For builders shipping agents into operator workflows:
What does your system do when it's about to act and the request is ambiguous?
If the answer is "it acts on its best guess," you know how this pilot ends.
https://t.co/6U8ifSgoiI
"The next agent will be better at this — just wait."
Reasonable. The next agent often picks better.
But being better at GUESSING what you wanted done is not the same as KNOWING what you wanted done.
90% reliability isn't reliability. It's a steady drip of incidents.
Operators see this version before individual users do.
The action lands in front of a real person. Their reaction becomes the next state of the world the system reads from. The next decision gets made on top of a reality the agent shaped without knowing it shaped it.
The agent didn't fabricate a fact.
It silently picked an interpretation of what to do next and acted on it.
Same move as last week's hallucination problem. Different surface. Different cost.
When the commitment is an action, you can't argue with it. It already happened.
A long-term-care nurse messages an operator's AI: a rider's daughter mentioned mom felt dizzy after Tuesday's trip.
The AI decides this is a driver complaint. Sends a link to the formal complaint form. Closes the case.
Confidently. Capably. Wrong.
The fall happens Friday.
Last week I argued the hallucination problem is a problem of meaning, not correctness.
This week: the same problem on a different surface, where the silent commitment is an action you can't take back.
Building AI into healthcare, legal, finance, or transportation? The honest diagnostic before your next pilot is one question:
What does your system do when the question is ambiguous?
If the answer is "the model guesses" — you know how it ends.
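A minimal sketch of the alternative to "the model guesses", under stated assumptions: the system can list its plausible readings of a request and knows whether the resulting action is reversible. The names `Decision`, `next_step`, and `readings` are hypothetical, and the two readings of the nurse's message are illustrative. This is not the method the thread alludes to, only a gate that refuses to pick an interpretation silently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Decision:
    kind: str            # "act", "confirm", or "ask"
    readings: List[str]  # the interpretation(s) still on the table


def next_step(readings: List[str], reversible: bool) -> Decision:
    """Decide what to do before committing to an action.

    `readings` is a hypothetical list of plausible interpretations of the
    request; how that list is produced is out of scope for this sketch.
    """
    if len(readings) != 1:
        # Zero or several plausible readings: do not pick one silently.
        return Decision("ask", list(readings))
    if not reversible:
        # One reading, but the action cannot be undone: confirm first.
        return Decision("confirm", list(readings))
    return Decision("act", list(readings))


# The nurse's message from the thread, with two illustrative readings.
nurse_message = next_step(
    ["driver complaint", "possible health concern about the rider"],
    reversible=False,
)
print(nurse_message.kind)  # prints "ask"
```

The design choice is that ambiguity routes to a question or a human rather than to a best guess, and that irreversible actions get a confirmation step even when only one reading survives.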
There's a cognitive move human experts make automatically, resolving meaning before speaking, that AI systems don't make.
It's what I've worked on for years, and a tweet thread isn't the place to unpack it.
But notice: the move is missing. From nearly every system shipping today.
Bigger models won't fix this.
Being better at GUESSING what you meant isn't the same as KNOWING what you meant.
A model that's right 9 times out of 10 will, at scale, still dispatch the wrong ambulance or misadvise a patient.
90% isn't reliability. It's a distribution of disasters.
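The arithmetic behind the 90% point can be sketched in a few lines. The volume of 1,000 actions is a made-up figure for illustration, and the calculation assumes errors are independent across actions, which real systems may not satisfy. The function names are hypothetical.

```python
def expected_errors(n_actions: int, accuracy: float) -> float:
    """Expected number of wrong actions out of n_actions."""
    return n_actions * (1.0 - accuracy)


def p_at_least_one_error(n_actions: int, accuracy: float) -> float:
    """Chance that at least one of n_actions is wrong (independence assumed)."""
    return 1.0 - accuracy ** n_actions


# Hypothetical volume: 1,000 actions at 90% per-action accuracy.
print(round(expected_errors(1000, 0.9)))        # prints 100
# Ten consecutive actions at the same accuracy.
print(round(p_at_least_one_error(10, 0.9), 2))  # prints 0.65
```

Under these assumptions, a system that is right nine times in ten gets about 100 actions wrong per 1,000, and a run of ten actions has roughly a two-in-three chance of containing at least one mistake.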
That's the hallucination problem in miniature.
The model didn't fail at correctness. It failed at meaning.
It silently picked one interpretation of an ambiguous question — the statistically average one — and answered THAT with full confidence.
Different question. Same answer.
A few years ago my chatbot told a user to "contact the platform administrator to open your account."
Statistically average across SaaS. Confidently wrong for our product, where employers open accounts, not users.
No fabricated facts. Just a different question answered.
Hot take: the AI industry has been chasing the wrong bug.
We call it "hallucination." We measure it on benchmarks. We bolt on "grounded modes." We celebrate every new factual-accuracy point.
None of it touches the actual problem.