1/x Our new method, the Inter-Animal Transform Class (IATC), is a principled way to compare neural network models to the brain. It's the first to ensure both accurate brain activity predictions and specific identification of neural mechanisms.
Preprint: https://t.co/hPqo5PrZoc
🚀 Excited to share our CVPR 2026 Highlight paper! We introduce a new class of visual world models trained purely from visual experience, with a powerful physical prompting mechanism native to vision. This interface enables direct reasoning about the effects of physical interactions, emerging as a strong alternative to natural language-based prompting.
We show that from a single world model emerges a diverse range of zero-shot object understanding capabilities, including object discovery, 3D manipulation, articulated part understanding, and reasoning about physical influences between objects.
If you're attending CVPR, come by our poster on Friday, June 5 at 9:45 AM MST in ExHall A–F (#239)!
📖 Blog post: https://t.co/Ga1k0AxPk5
🤗 Code: https://t.co/UStyTlLfyy
🎬 Results gallery: https://t.co/QiduckqIRp
GPIC should be the new standard benchmark for generative modeling. Training 1 epoch on GPIC is the same cost as 100 epochs on ImageNet, but is a much better proxy for real-world problems. If you work in generative modeling, try GPIC for your next project!
Excited to see LRAS out at ICLR! A single autoregressive model that efficiently handles diverse conditional prediction tasks — and I think the potential applications could extend well beyond computer vision, including brain modeling.
Excited to present our ICLR 2026 paper tomorrow: Unified 3D Scene Understanding Through Physical World Modeling.
Joint work with @KlemenKotar @Rahul_Venkatesh @jwhooglee @honglin_c @khai_loong_aw @dyamins.
We introduce 3WM, a foundation model for 3D understanding that treats depth, novel view synthesis, object motion, and geometric reasoning as different prompts to the same physical world model.
If you are at ICLR, come by our poster: Poster Session 6, Pavilion 4, Sat 3:15 PM.
1/9 How does the brain encode what we see? 👁️
We introduce "dual feature selectivity" – many 🐒 🐭 cortical neurons encode a continuum b/w activating & suppressing features
Blog ✨ https://t.co/b4LCq8mg8N
🙏 co-lead Nikos Karantzas & @AToliasLab @naturecomputes @sinzlab & more
Today's best AI needs orders of magnitude more data than a human child to achieve visual competence.
We introduce the Zero-shot World Model (ZWM), an approach that substantially narrows this gap. Even when trained on the first-person experience of a single child, BabyZWM matches state-of-the-art models on diverse visual-cognitive tasks – with no task-specific training, i.e., zero-shot. 🧵
This is the strongest ephys evidence so far for a generative model in the brain that I know of.
Congratulations @WadiaVarun! Wonderful collaboration with @UeliRutishauser on science that could only be done in humans.
And please check out Fig. 5FG. This is new since bioRxiv and really surprised me: the mean response to imagery and viewing is actually the same & there are many cells that respond only during imagery, challenging the idea that signal strength is what distinguishes reality from imagination.
1/ Most model-brain comparisons only ask: can the model predict the brain? — without also checking the reverse direction. When you map in both directions, differences between models emerge that were previously invisible. In prior work, we showed there's a deeper principle behind bidirectional mapping: we should compare models to brains the same way we compare real brains to each other 🧵
The missing half of the neural network–brain comparison
For a decade, the standard benchmark for artificial neural networks as models of the brain has been forward predictivity: learn a linear mapping from model activations to neural recordings and measure explained variance. Top models of the macaque inferior temporal (IT) cortex—central to object recognition—have plateaued near 50% regardless of architecture.
Muzellec and Kar argue this plateau hides something important. Two models can score identically on forward predictivity while relying on fundamentally different internal strategies. One may have many units tightly coupled to IT responses; the other may reach the same score with a smaller aligned subset while carrying a large pool of biologically inaccessible dimensions.
To expose this, they introduce reverse predictivity: instead of asking how well model features predict neurons, they ask how well IT neurons predict individual model units. A truly brain-like model should be bidirectionally predictable—just as two monkeys' IT populations predict each other symmetrically, which the authors confirm as their empirical baseline.
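A minimal sketch of the two directions, for intuition only. This is not the authors' pipeline: the ridge regression, the single train/test split, the function names (`ridge_fit`, `predictivity`), and the synthetic "neurons" and "model units" are all assumptions made for illustration. The toy model has a few units that share structure with the neurons plus a larger pool of units the neurons cannot reach, so it scores well forward and poorly in reverse.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights mapping centered X to centered Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def predictivity(source, target, alpha=1.0, train_frac=0.8, seed=0):
    """Held-out explained variance (R^2) for each target dimension."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(source.shape[0])
    n_train = int(train_frac * len(idx))
    tr, te = idx[:n_train], idx[n_train:]
    # Center using training statistics only, so the test set stays held out.
    mu_s, mu_t = source[tr].mean(axis=0), target[tr].mean(axis=0)
    W = ridge_fit(source[tr] - mu_s, target[tr] - mu_t, alpha)
    pred = (source[te] - mu_s) @ W + mu_t
    ss_res = ((target[te] - pred) ** 2).sum(axis=0)
    ss_tot = ((target[te] - target[te].mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins (hypothetical data, not real recordings).
rng = np.random.default_rng(0)
n_images, n_latent = 400, 5
shared = rng.normal(size=(n_images, n_latent))

# 20 "neurons" driven by the shared latents plus a little noise.
neurons = shared @ rng.normal(size=(n_latent, 20)) + 0.1 * rng.normal(size=(n_images, 20))

# Toy model: 10 units aligned with the latents, 30 units unrelated to the neurons.
aligned_units = shared @ rng.normal(size=(n_latent, 10)) + 0.1 * rng.normal(size=(n_images, 10))
extra_units = rng.normal(size=(n_images, 30))
model = np.hstack([aligned_units, extra_units])

forward_scores = predictivity(model, neurons)   # model units -> neurons
reverse_scores = predictivity(neurons, model)   # neurons -> model units

forward = float(np.median(forward_scores))
reverse = float(np.median(reverse_scores))
print(f"forward predictivity (median R^2): {forward:.2f}")
print(f"reverse predictivity (median R^2): {reverse:.2f}")
```

The per-unit values in `reverse_scores` are what separate the two groups in this toy setup: the aligned units are well predicted from the neurons, while the extra units are not.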
Across 39 architectures—CNNs, transformers, self-supervised and robust models—reverse predictivity is consistently lower than forward predictivity and the two metrics are uncorrelated. Strikingly, higher ImageNet accuracy predicts lower reverse predictivity. Adversarial training helps; higher dimensionality hurts. The "common" units identified this way predict primate behavior more consistently across species and models than the "unique" ones inaccessible from neural activity.
For AI in drug discovery, neurotechnology, or computational biology, this has a direct implication: forward accuracy alone does not guarantee that a model's internal representations are embedded in the biological system it claims to describe. When those representations guide mechanistic interpretations or experimental decisions, the mismatch can mislead.
Paper: Muzellec et al., Nature Machine Intelligence (2026) | https://t.co/19Rp0sJOXz
4/ Bidirectionality is just half of the question of how to compare models to brains. The other is choosing the right mapping class — and the IATC answers both. For fMRI, the IATC turns out to be approximately linear. But for e-phys data, the IATC is shaped by the nonlinear activation function.
I don't see why prediction has to be framed as necessarily at odds with "understanding". The two naturally go hand-in-hand. Prediction is the *minimal* scientific prereq for anything you want to further investigate. We didn't even have successfully predictive systems of large-scale neural population responses in the neurosciences until ML started working.
Furthermore, "understanding" isn't an objective measure -- it's aesthetically in the eye of the beholder. So it's not clear there's a well-defined global notion here to begin with, besides prediction alone. If you ask 10 scientists what they mean by "understanding", you'll get > 10 different answers 🙂
Not to mention, causal manipulations are naturally supported in ANNs because they're mechanistic models by construction: you have the entire network graph available to you to perturb as you choose.
As the saying goes:
“Everything should be as simple as it can be, but not simpler.”
And it's quite clear there isn't anything simpler than ANNs without losing tons of predictive power. Why bother "understanding" a system that doesn't even predict the scientific phenomenon at hand?
Super excited to share that my Master’s project, “Convolutional architectures are cortex-aligned de novo,” has been published in Nature Machine Intelligence! https://t.co/Zmy1XwymFB w/ @EricElmoznino @michaelfbonner
@AllenInstitute 10/X Overall, our work provides a principled framework for evaluating brain models, improving on previous approaches and contextualizing prior findings. A huge thanks to my incredible co-authors on this work! @jvrsgsty @aran_nayebi @_jacobprince_ @luosha @dyamins