If you've adopted AI at your company but haven't seen any tangible results, read this 1990 article: "The Dynamo and the Computer" by Paul David.
When electricity first arrived, factories that "adopted" it barely got faster. They just swapped the steam engine for an electric one and ran everything else exactly as before: same machine layout, same workflow, same management. Electricity in, no real gains out.
The most common mistake with any new technology is to drop it into the old organization and then declare the transformation done.
The real leap came decades later, when each machine got its own small motor. Suddenly machines no longer had to be lined up around one central drive shaft. They could be rearranged around the actual flow of work.
The productivity gains didn't come from electricity. They came from REDESIGNING THE ENTIRE FACTORY around it.
AI is the same. Bolting it onto your existing process gets you a faster steam engine. The payoff comes when you redesign the work itself.
(link to paper in comments)
turns out AI models cannot do math.. even grade school math. the kind a 10-year-old solves.
Apple published a devastating study that exposes a massive illusion at the core of artificial intelligence.
they took the standard math benchmark (GSM8K) that every AI company uses to brag about how smart their model is.
first, they just changed the names in the word problems.. the models' performance fluctuated for no reason.
then, they changed the numbers. the performance immediately dropped.
but then they ran the test that broke everything.
they added one single, completely irrelevant sentence to the word problem. something like: "By the way, 5 of the apples were green."
A human 10-year-old ignores the green apples and solves the underlying math.
the AI didn't.
across every state-of-the-art model, performance collapsed by up to 65%.
the AI blindly grabbed the irrelevant number and tried to shove it into the equation. it didn't know why it was doing the math. it just saw a number and assumed it was supposed to use it.
there is no genuine logical reasoning happening under the hood.
we are deploying these systems to run our finances, analyze our legal documents, and make complex strategic decisions.
but the models don't actually understand the logic they are spitting out.
they just know what a smart answer is supposed to look like.
Interesting, but I think the people who got laid off won’t just sit around and do nothing. They may found startups and get funded by the very companies that laid them off. Or new industries may show up. Or they may move to other industries where demand is high, like becoming electricians.
The world will change, but maybe in unexpected ways.
This paper argues that current LLM improvements on hallucinations come from expanding the model’s knowledge, not from any real ability to tell truth from non-truth.
To measure whether a model can truly discriminate truth from non-truth, it proposes a new metric: faithful uncertainty. It measures the alignment between the model’s linguistic uncertainty (how uncertain it claims to be in words) and its intrinsic uncertainty (how often it actually flips its answer when prompted).
Interesting idea. It is somewhat like measuring the model’s internal awareness. Too bad I am not seeing any data in the paper yet. Curious to see how this works out.
https://t.co/HJjFFluKsl
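To make the faithful-uncertainty idea concrete, here is a minimal sketch of how such an alignment could be scored. Everything in it is an assumption for illustration, not the paper's definition: the stated uncertainty is taken as a number in [0, 1] that someone has already extracted from the model's hedging words, the intrinsic uncertainty is approximated as the share of resampled answers that differ from the most common one, and `faithfulness_gap` is a hypothetical name for the mean absolute difference between the two.

```python
from collections import Counter

def intrinsic_uncertainty(samples):
    """Fraction of resampled answers that differ from the most common answer."""
    counts = Counter(samples)
    top_count = counts.most_common(1)[0][1]
    return 1.0 - top_count / len(samples)

def faithfulness_gap(items):
    """Mean absolute gap between stated and intrinsic uncertainty.

    `items` is a list of (stated_uncertainty, sampled_answers) pairs.
    A gap of 0 means the model's words match its behaviour (assumed scoring rule).
    """
    gaps = [abs(stated - intrinsic_uncertainty(samples)) for stated, samples in items]
    return sum(gaps) / len(gaps)

# Hypothetical data: the first answer is hedged about as much as it flips,
# the second is stated with full confidence but flips half the time.
items = [
    (0.1, ["Paris"] * 9 + ["Lyon"]),
    (0.0, ["1912", "1913", "1912", "1911"]),
]
print(faithfulness_gap(items))
```

A low gap would mean the model hedges about as often as it actually wavers; the second item above is the "confident but unstable" case the metric is meant to catch.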
I disagree with “humans are never too complex to predict”.
If you understand complex systems theory, you know that chaos can grow out of simple systems and become completely unpredictable. And society is an extremely chaotic system.
I have been thinking about this a lot. Sampling from simulation is an interesting idea for economics research, but simulation has intrinsic limitations: the results always depend on the initial conditions.
A small difference in the initial conditions can drastically change the result, and there are infinitely many possible initial conditions. This makes most complex simulation results hard to trust, because you can change the conclusion simply by changing the initial conditions.
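A quick way to see this sensitivity is the logistic map, a textbook one-line system that is chaotic at r = 4. The choice of this map, the starting point 0.2, and the perturbation of 1e-10 are all assumptions for illustration; nothing here models an economy or a society.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) starting from x0 and return all visited values."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def gaps(x0, delta, steps):
    """Absolute distance, step by step, between two runs that start delta apart."""
    a = logistic_trajectory(x0, steps)
    b = logistic_trajectory(x0 + delta, steps)
    return [abs(p - q) for p, q in zip(a, b)]

g = gaps(0.2, 1e-10, 60)
for step in (0, 10, 20, 30, 40, 50, 60):
    # The gap starts near 1e-10 and grows until the two runs are unrelated.
    print(step, g[step])
```

The two runs are indistinguishable at first and then separate completely, which is why a conclusion drawn from a single simulated trajectory says little unless it holds across a wide range of starting conditions.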
A critical initialization for biological neural networks
Spontaneous brain activity is often treated as noise: the background hum of a nervous system waiting for a task. But large-scale recordings in mice have shown something more structured. Even in darkness, without explicit stimuli, thousands of neurons display coordinated activity patterns that extend across the brain and persist far longer than the fast biophysical timescales of individual neurons.
Marius Pachitariu and coauthors ask a simple question: could this macroscopic structure emerge from a simple kind of network initialization?
Their answer connects neuroscience, random matrix theory and machine learning. They model spontaneous neural activity as linear dynamics governed by a random connectivity matrix, stabilized by a global inhibitory-like normalization. When this matrix is symmetric and critically normalized, with its largest eigenvalue very close to one, the network naturally produces high-dimensional activity modes with a power-law covariance spectrum.
This is not just a mathematical curiosity. The same spectral structure appears in large-scale mouse recordings from cortex and brainwide Neuropixels data, with power-law exponents around 0.7–0.85. Hippocampal CA1 is the striking exception: its activity looks less correlated, closer to an efficient, high-capacity code for information storage.
The ML perspective is especially interesting. In artificial neural networks, initialization is often treated as a technical detail: Xavier, He, orthogonal schemes, and so on. But this paper reframes initialization as a computational substrate. A critically initialized recurrent system can generate slow, global, high-dimensional modes before task-specific learning. In simulations, these dynamics support time-dependent computations, including zero-shot working memory tasks.
The biological implication is powerful: spontaneous activity may not be random noise, but a preconfigured dynamical scaffold on which learning and computation can operate. The brain may start from an initialization already close to useful temporal memory, with learning then shaping readouts or task-specific pathways.
For R&D teams building ML systems in drug discovery, materials development, energy research or biotechnology, the lesson is broader than neuroscience. Initialization, architecture and dynamics define what kinds of scientific signals a model can preserve, combine and retrieve before training. In applied research pipelines where data are scarce, noisy and time-dependent, designing the right dynamical substrate may be as important as choosing the loss function.
Source: Pachitariu et al., Nature (2026) — CC BY 4.0 | https://t.co/oE37FfYmKc
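For readers who want to see what "symmetric and critically normalized" can look like in practice, here is a rough sketch. It is not the paper's model: the matrix size, the Gaussian entries, the target eigenvalue of 0.99, and all function names are assumptions, and the global inhibitory-like normalization is left out. It only shows rescaling a symmetric random matrix so its largest eigenvalue sits just below one, and the standard result that under linear rate dynamics tau * dx/dt = -x + W x, a mode with eigenvalue lam relaxes with time constant tau / (1 - lam).

```python
import math
import random

def symmetric_random_matrix(n, rng):
    """Symmetric matrix with Gaussian entries (a stand-in for random connectivity)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def largest_eigenvalue(a, iters=2000):
    """Estimate the largest (algebraic) eigenvalue by shifted power iteration."""
    n = len(a)
    # Gershgorin bound: adding shift * I makes every eigenvalue non-negative,
    # so power iteration converges towards the top eigenvector of `a`.
    shift = max(sum(abs(x) for x in row) for row in a)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [x + shift * y for x, y in zip(matvec(a, v), v)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = matvec(a, v)
    return sum(x * y for x, y in zip(v, av))  # Rayleigh quotient, v has unit norm

def critically_normalize(a, target=0.99):
    """Rescale `a` so its estimated largest eigenvalue equals `target`."""
    lam = largest_eigenvalue(a)
    return [[x * target / lam for x in row] for row in a]

def effective_timescale(lam, tau=1.0):
    """Relaxation time of a mode with eigenvalue lam under tau * dx/dt = -x + W x."""
    return tau / (1.0 - lam)

rng = random.Random(0)
w = critically_normalize(symmetric_random_matrix(20, rng), target=0.99)
print(largest_eigenvalue(w), effective_timescale(0.99))
```

The point of the sketch is the last line: pushing the top eigenvalue from, say, 0.5 to 0.99 stretches the slowest mode's time constant from 2 tau to roughly 100 tau, which is one simple way a network can hold activity far longer than the timescale of its individual units.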
The Metacognition Revolution
1
🧵 Your AI is confident. Your AI is wrong. And the solution isn't what the industry thinks.
Here's why the next breakthrough in LLMs isn't about teaching them MORE facts—it's about teaching them to know what they DON'T know.
A thread on metacognition 👇
2
Current stat that should scare you:
Even the best models can only distinguish their correct answers from wrong ones with ~0.79 AUROC.
Translation: They're guessing about their guessing. And post-training makes this WORSE, not better.
3
The industry's approach: "Let's make models know everything!"
The Google paper's approach: "Let's make models HONEST about their uncertainty."
One is impossible. One is actually achievable. Guess which gets 10x the funding?
4
Here's the trap:
To cut hallucinations to near-zero, models have to say "I don't know" so often they become useless.
The tradeoff chart (Fig 2) shows it clearly—you can't win by just pushing harder on factuality.
5
Killer insight from the paper:
"An error communicated with appropriate hedging is not a hallucination; it is a hypothesis offered for consideration."
Stop treating every error as a hallucination. Start treating confident errors as the real enemy.
6
Why this matters for agents:
Right now, AI agents overuse tools because they can't tell when they actually need help.
Give them faithful uncertainty signals → they know when to search, when to trust themselves, when to ask humans.
7
Three leverage points:
Dynamic uncertainty labeling (not static "IDK" responses)
Confidence attribution heads (separate "I'm not sure" from "this is ambiguous")
Use internal confidence as an RL reward signal
Small architectural adds, huge trust gains.
8
Hot take from the paper:
"Perfect factuality is a luxury belief; honest uncertainty is table stakes."
The safest AI isn't the one that's always right. It's the one that knows when it might be wrong.
9
Imagine:
Models that say "I'm 60% confident, you should verify"
Agents that only call tools when truly uncertain
Users who trust AI MORE because it admits doubt
That's the faithful uncertainty future.
10
The shift:
From "How do we make AI know everything?"
To "How do we make AI communicate what it truly knows?"
The second question is solvable. And it might be enough.
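For anyone wondering what the ~0.79 AUROC in tweet 2 measures: AUROC here is the probability that a randomly chosen correct answer received a higher confidence score than a randomly chosen wrong one (0.5 is coin-flipping, 1.0 is perfect separation). A minimal sketch, with made-up confidence values that are purely hypothetical:

```python
def auroc(confidences, is_correct):
    """Probability that a correct answer outranks a wrong one by confidence.

    Ties count as half a win. Requires at least one correct and one wrong answer.
    """
    pos = [c for c, ok in zip(confidences, is_correct) if ok]
    neg = [c for c, ok in zip(confidences, is_correct) if not ok]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model outputs: confidence per answer and whether it was right.
confidences = [0.9, 0.8, 0.7, 0.6, 0.4]
is_correct = [True, True, False, True, False]
print(auroc(confidences, is_correct))
```

In this toy data, five of the six correct-versus-wrong pairs are ranked the right way round, so the score is 5/6.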
@JackAdlerAI I agree, but I do feel Anthropic’s constitution is aiming at teaching the model consistent values.
That is a good attempt at least. I wish other AI companies would also give it a try.
Defining values for the model is important for humanity. We need to know what we are building.
Interesting. And as a data scientist, I am not surprised.
I don’t really believe in metrics.
Each metric is a very narrow view of the world. And each reward is a single direction that we are moving the model toward.
The problem is, we have no idea what that direction is or what the model is really doing when it optimizes rewards.
Learning directly from text through pre-training is a lot more complex and intuitively makes more sense in some ways, because the real world is complex.
This research confirmed that automated model research is possible.
But what’s more interesting is the conclusion (which I completely agree with): the core bottleneck in model alignment research is evaluation.
Evaluation is the ultimate guardrail for human values.
https://t.co/17pXDzsSb4
Actually, after I started reading books in phone apps, my reading ability (I feel) increased. Because the pages are shorter on the phone, complex books no longer feel scary to me, and I am able to read far more complex and philosophical books than I used to manage on paper. The mental barrier is lower. I still like reading paper books, so I do that occasionally, and it confirms that I am actually reading better.
But I think the type of reading I do has shifted since I was young. I read more fiction when I was young, but now I read more non-fiction. I did lose patience with long descriptions of a scene or a character.
Instead of just showing the model “what is the right thing to do”, Anthropic found it is more effective to train the model on “why these are the right things to do”.
I believe this is essential for model consistency.
If you only ever train the model at the behavior level, it can always find ways to hack your metric system.
Asking the model to explain “why” enforces logical consistency one layer deeper.
https://t.co/CvROqjKTbp
Our paper is the 5th most read paper in PNAS Nexus of the last year 🎉.
The article makes a simple point:
Generative AI will produce a socioeconomic earthquake.
Not because all inequalities will increase. This would be too easy.
Some inequalities will increase.
Others will decrease.
And the result is that the socioeconomic landscape will barely be recognizable.
In the information domain, generative AI can democratize content creation and access, but also dramatically expand the production and spread of misinformation.
In the workplace, it can boost productivity and create new jobs, but the benefits will likely be distributed very unevenly.
In education, it can enable personalized learning, but also widen the digital divide.
In healthcare, it can improve diagnostics and accessibility, but also deepen pre-existing inequalities.
This is why we need to stop asking only whether AI is "good" or "bad".
The real question is:
For whom?
In which domain?
Under which institutional conditions?
*
Full paper in the first comment.
Thanks, once again, to all collaborators without whom this work would not have been possible: @AustinLentsch @DAcemogluMIT @SelinAkgun9 Aisel Akhmedova @EBilancini @JFBonnefon @BehSnaps @lu_butera @Karen_Douglas @JimACEverett Gerd Gigerenzer @chrisgreenhow @Laparoscopes @PCASOLab @jholtlunstad @jetten_j @baselinescene @werkunz @longoni_chiara Pete Lunn @simone_natale Stefanie Paluch @iyadrahwan Neil Selwyn @viveksinghmed @ssuri Jennifer Sutcliffe @JoePTomlinson @Sander_vdLinden @PaulvanLange @FriederikeWall @jayvanbavel Riccardo Viale