MD, MPH. Peds Hem/Onc. Data nerd using technology to improve healthcare for kids around the world. Love languages: graphs, references, Bayesian statistics
The practical lesson is to build for model churn. Assume better general models will keep arriving, then build infrastructure that can use them safely and keep responsibility where it belongs.
https://t.co/oQZl0q7PAi
That is why it is OK for the bitter lesson to hold right now. It shifts the durable work toward the harness: governance, retrieval, citation standards, audit trails, escalation rules, input/output controls, and workflow design.
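A minimal sketch of what "durable work in the harness" could look like, assuming a model is nothing more than a function from a prompt to an answer with citations. Every name here (`Answer`, `Harness`, `model_v1`, `model_v2`) is hypothetical and for illustration only; the two models are stubs, not real APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Answer:
    text: str
    citations: List[str]


# Hypothetical interface: any model is a function from prompt to Answer.
Model = Callable[[str], Answer]


@dataclass
class Harness:
    model: Model
    model_name: str
    audit_log: List[dict] = field(default_factory=list)

    def ask(self, prompt: str) -> str:
        # Input control: refuse empty prompts before they reach the model.
        if not prompt.strip():
            raise ValueError("empty prompt")
        answer = self.model(prompt)
        # Citation standard: an uncited answer is escalated, not returned.
        escalate = len(answer.citations) == 0
        # Audit trail: record which model said what, and what we did with it.
        self.audit_log.append(
            {"model": self.model_name, "prompt": prompt,
             "citations": answer.citations, "escalated": escalate}
        )
        if escalate:
            return "ESCALATE: no citations, route to human review"
        return answer.text


# Stub models standing in for successive model generations (assumptions).
def model_v1(prompt: str) -> Answer:
    return Answer(text="Plausible but unsourced answer.", citations=[])


def model_v2(prompt: str) -> Answer:
    return Answer(text="Sourced answer.", citations=["guideline-section-3"])


harness = Harness(model=model_v1, model_name="v1")
first = harness.ask("What is the dosing interval?")

# Model churn: swap the model, keep the rules and the audit log.
harness.model, harness.model_name = model_v2, "v2"
second = harness.ask("What is the dosing interval?")
```

The rules and the log survive the swap from `model_v1` to `model_v2`; only the function behind `model` changes.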
A lot of clinicians are stuck at the same point with AI:
“I see the potential, but what do I actually do with it?”
My answer is small: pick one workflow where verification is faster than generation.
The hard requirement is that you have to know what correct looks like.
If you ask for “something good,” you will get something plausible.
If you know the standards, you can verify.
That is where the leverage starts.
I think that over the next five years we are likely to see both substantial progress toward something like 'weak AGI', i.e. systems that can do most cognitive tasks humans can do, and increasingly diminishing returns, in the economic sense, to raw frontier model improvement.
The point isn't that scaling stops working but rather that (a) achieving each additional increment of capability at the model level will require disproportionately greater expenditure of compute, data, engineering effort, and capital; and (b) 'weak AGI' will probably come from the combination of strong models with scaffolding, tools, memory, retrieval, planning, decomposition, verification, and other system-level affordances around them.
As a result, deployment design and scaffolding become more important over time, not less. The old view that wrappers are disposable because the next model jump will wash them away seems naive. If frontier gains become more input-intensive, then the question increasingly becomes how much capability you can extract, route, verify, and compose from a model within a given budget.
Recent developments point in this direction too. What seems to matter is whether a given pipeline is structured to exploit capabilities well: assigning subproblems appropriately, using division of labour intelligently, and compensating for weaknesses with tools and process. It seems quite plausible that there's more alpha on the harness side at the moment than in merely betting on scaling alone.
New high-effort article "Why Creativity Cannot Be Interpolated" co-written with Dr. Jeremy Michael Budd. Yes the name is a pun on the famous book by @kenneth0stanley!
The counterintuitive thesis (corollary of Kenneth's research):
- Intelligence and agency are orthogonal to creativity - and sometimes actively hostile to it.
- Genuine creativity is impossible without deep understanding, and creativity without understanding is "slop".
The strangest property of LLMs: within a single frame they seem to comprehend so deeply, yet they possess no perspective of their own. As in the parable of the blind men and the elephant, each report is accurate, yet none integrates. We call this "frame-dependent" understanding, and it will change how you think about AI creativity.
We started writing this 2 years ago, and this is our distilled understanding of AI creativity in 2026.
Apparently some researchers thought it was significant that early high performance and later high performance were negatively correlated
They didn't realize they were conditioning on a collider
Nor did the editors at Science, who published their claims anyway
Just incredible
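For anyone who wants to see the collider effect directly, here is a small simulation. It is a sketch under stated assumptions, not a reanalysis of the study: the data are synthetic, early and late performance are generated as independent by construction, and the selection rule (keep only people whose combined score clears a `threshold`) is a hypothetical stand-in for whatever selection produced the original sample.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, threshold=1.5, seed=0):
    rng = random.Random(seed)
    # Assumption: early and late performance are independent in the population.
    early = [rng.gauss(0, 1) for _ in range(n)]
    late = [rng.gauss(0, 1) for _ in range(n)]
    # Collider: being selected depends on BOTH early and late performance.
    selected = [(e, l) for e, l in zip(early, late) if e + l > threshold]
    sel_early = [e for e, _ in selected]
    sel_late = [l for _, l in selected]
    return pearson(early, late), pearson(sel_early, sel_late), len(selected)


full_corr, selected_corr, n_selected = simulate()
print(f"whole population: r = {full_corr:.3f}")
print(f"selected only (n = {n_selected}): r = {selected_corr:.3f}")
```

In the whole population the correlation sits near zero, as constructed. Among the selected, it turns clearly negative: anyone who got in with a weak early score must have had a strong late score, and vice versa, so the negative correlation is produced by the selection rule alone.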