June 2024: The latest general-purpose LLMs could not count the r's in strawberry.
July 2025: The latest general-purpose LLMs get gold in the International Math Olympiad.
May 2026: The latest general-purpose LLMs solve one of the "best-known questions in combinatorial geometry".
As everyone knows, the internet has millions of images of art galleries filled with paintings of otters sitting on airplanes, which is the only reason these stochastic parrot AIs can produce outputs like this.
To pass the Turing test, the winning strategy wasn't to make GPT-4.5 smarter. It was to make it worse: "be casual, make typos, be bad at math, a bit ignorant, don't try too hard".
With that persona, people chose GPT-4.5 as the human 73% of the time, more often than they chose the actual human (!). Without it? Just 36%. (Jones et al., 2025)
That's a bit ironic: we wanted to see if AI could reach the human level, but no human could produce pages of coherent, well-structured text in seconds. So to pass as one, the AI has to pretend it cannot.
I evaluate manipulation risks for the EU AI Office, with the very authors of this paper. What stays with me is this: the bar for "human" was never as high as we thought.
“Hey ChatGPT voice, read me a poem. Now do it in a stentorian tone. Now while laughing at a joke. Now like a comic. Now do it chthonically. Now like you are anxious and surrounded by animated cheese. Now like a policy debater…”
👀 Claude handles an insane request:
“Remove the squid”
“The document appears to be the full text of the novel "All Quiet on the Western Front" by Erich Maria Remarque. It doesn't contain any mention of squid that I can see.”
“Figure out a way to remove the 🦑”
In Florence, what your great ²⁰ grandfather did before Columbus affects your earnings now! "Being the descendants of the Bernardi family (90th percentile of earnings distribution in 1427) instead of the Grasso family (10th percentile) would entail a 5% increase in earnings [today]"
One of my favourite examples of how people react to economic incentives:
Architectural tax avoidance👇
🇬🇧 UK: tax on windows
🇻🇳 Vietnam: tax on frontage
🇫🇷 France: tax on floors (roof exempted)
🇧🇷 Brazil: tax on church construction (when finished)
ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms).
https://t.co/uNZjgbR5Bm
Future of AI assistants
A “jailbroken” Google Nest Mini running custom LLMs & voice models by Justin Alvey
This demo is insane, a matter of time before these are shipped like this as standard.
something very strange about people writing bullet points, having ChatGPT expand them into a polite email, sending it, and the recipient using ChatGPT to condense it into the key bullet points