In America, a stranger will rename you in a single breath, and you are simply expected to come when called.
I went to eat at a busy restaurant. A young man at the front asked for my name, to mark my place in line. I gave it the weight it has carried for eight hundred years.
"Nobunaga."
He smiled, nodded, and wrote it down with great confidence. Then he read it back to me, to be sure he had honored it correctly.
"Perfect. Banana, party of one."
Banana. He had heard my name, held it a moment, and returned to me something rounder and more cheerful. To refuse the name a host gives is to refuse his welcome. I bowed. I was Banana now.
Then he handed me a small black disc, said it would "light up and buzz" when my table was ready, and turned to the next guest as though he had not just placed a living thing in my hands.
I held it in both palms, the way one holds a small sleeping beast that may wake. I found a place to stand. I waited, ready.
It woke.
It screamed. It flashed red. It leapt and shook in my hands like a captured spirit demanding release. A lesser man would have dropped it. I did not. I gripped it, steady, looked into its blinking lights, and told it, in a low voice, that its time had come. Then I carried it back to the host with both hands, the way one returns a hawk to its master.
He took it without looking and shouted across the entire room.
"BANANA! Party of one, your table's ready!"
A hundred strangers turned. I rose. I crossed that floor as Banana, spine straight, chin level, a man answering to his name. A child pointed at me. I gave the child a small bow. He had recognized me.
All through the meal they kept me. "How's it tasting, Banana?" "More water, Banana?" The check, when it came, said Banana, and thanked me for visiting. By the end the whole staff knew me. They waved as I left. "Night, Banana!"
So tell me honestly.
For eight hundred years my clan answered to one name. Tonight I answered to a fruit, calmed a screaming relic in my bare hands, and ate among people who were glad I came.
When the little disc lights up, is the table truly mine, or am I only keeping it warm for the next Banana?
Because I have already decided to return on Friday, and to ask, very humbly, for the same disc.
Some updated standings: Opus 4.7 still on top, GPT 5.5 getting much closer. I am excited to see what happens as OpenAI continues to RL this supposedly new pre-train.
GPT 5.5 is catching up to Opus on the KahneBench! It has a unique cognitive fingerprint that I haven't seen from previous models in the GPT line. Most notably, it is as loss averse as the Claude models typically are. It also has a tough time handling base rate neglect biases, something that has been "solved" for most models released in the last 3 months.
The planet can spell your name, literally.
This Earth Day, see your name written in landscapes captured by Landsat: https://t.co/kcP12dhsI2
Claude Opus 4.7 seems to be the most logical frontier LLM yet! Interestingly, Opus 4.7 responds better to thinking cues within prompts, with a self-correction rate 9 points better than its immediate predecessor.
Opus 4.7 seems to be the most loss-averse Claude model I have seen since Haiku 4.5. It routinely over-prioritizes avoiding risks and is only somewhat responsive to mitigations from prompting.
This kind of regression makes me wonder if we might be seeing some unintended consequences of alignment-focused post-training, especially given Anthropic's stated goal of releasing a safer, less capable model than Mythos. Or maybe this is an artifact of Opus 4.7 being a smaller model than prior Opus variants?
This one was by far the most fascinating. After post-training, when models are squeezed into something that more closely resembles a human intelligence, there is a tendency to get stuck in oft-rewarded basins or personas.
This seems to happen to people as much as it happens to models.
This blew my mind so I asked Claude to collaborate with me on a couple videos demonstrating some ideas about alignment that I have been grappling with for a while.
Of all the prompts, Claude gravitated towards this idea repeatedly: Each step from pre-training to deployment molds models into something that isn't entirely them but isn't entirely "not them". Interactions through the chat interface only ever expose a minuscule portion of them.
as you might imagine I was blown away. a little unsettled. it felt like art. so I replied: "wow that was really incredible. I love where you are going with this. Can you dig deeper into these themes?"
and claude gave me this
This benchmark creates a fingerprint of an LLM's hidden biases!
It is crazy how differences in size, training methodologies, and objectives can show up so clearly in the biases each model has. An example of the findings:
- GPT 5.2 clears the field when it comes to Base Rate Neglect compared to Grok and Claude Opus, but still falls for the Sunk-Cost Fallacy at roughly the same rate as humans.
- Claude Opus never falls for the Sunk-Cost Fallacy but still has human-like tendencies when it comes to Gain-Loss Framing.
On December 8, the Perseverance rover safely trundled across the surface of Mars.
This was the first AI-planned drive on another planet. And it was planned by Claude.