Yesterday at @BrownUniversity @ICERM's workshop on “Agentic Scientific Computing and Scientific Machine Learning” I spoke about “Adaptive Swarms Across Scales”, making the case for scientific AI as systems that can create representations, stress them, fracture them, and enlarge the category in which future representations live. The category here is a composable and breakable working universe of science: data, hypotheses, simulations, measurements, tools, failures, figures, papers, provenance, and the transformations that connect them. Discovery happens when those transformations become executable, inspectable, composable, and capable of changing the world model they operate within.
Atomistic modeling gives one category - states, forces, trajectories, observables, boundary conditions, conservation laws. Neural surrogates learn fast morphisms inside or between such categories. But discovery is higher-order. It changes which objects and morphisms are available in the first place: what variables exist, what operations are allowed, what evidence counts, what scale is active, what invariant is being preserved, and what kind of explanation the system is even capable of forming.
This is scientific method as adaptive architecture: compression, stress, fracture, recomposition. Fracture matters here because it makes the logic physical: a non-commuting diagram realized in matter. The imposed load, material hierarchy, defect field, and assumed continuum description no longer map cleanly into the observed outcome. The crack is the obstruction: it identifies where the old morphism failed and where a new representation must be introduced. The physical crack and the categorical obstruction are the same event viewed in different substrates.
ScienceClaw × Infinite is a machine for constructing and transforming a category of scientific artifacts. Each artifact is typed. Each operation has lineage. Each failed branch remains in the category as reusable structure. The “paper” is no longer the terminal object of science; it is one projection of a larger compositional trace, and it can be generated at any time for consumption by a human or an AI.
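A minimal sketch of that idea, assuming a hypothetical `Artifact`/`Trace` schema (these names are illustrative and not the ScienceClaw × Infinite API): every artifact is typed, every artifact records its lineage, failed branches stay in the trace, and a "paper" is computed as one projection of the trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A typed node in a hypothetical scientific-artifact graph."""
    id: str
    kind: str            # artifact type, e.g. "dataset", "hypothesis", "simulation", "figure"
    parents: tuple = ()  # ids of the artifacts this one was derived from (its lineage)
    status: str = "ok"   # "ok" or "failed"; failed branches stay in the graph


class Trace:
    """Append-only record of artifacts and the transformations that produced them."""

    def __init__(self):
        self.nodes = {}

    def add(self, art):
        missing = [p for p in art.parents if p not in self.nodes]
        if missing:
            raise KeyError(f"unknown parents: {missing}")
        self.nodes[art.id] = art
        return art

    def lineage(self, art_id):
        """Ancestors of art_id plus art_id itself, parents listed before children."""
        seen, order = set(), []

        def visit(i):
            if i in seen:
                return
            seen.add(i)
            for p in self.nodes[i].parents:
                visit(p)
            order.append(i)

        visit(art_id)
        return order

    def project_paper(self, result_id):
        """One projection of the trace: the successful lineage of a result, as a linear story."""
        return [i for i in self.lineage(result_id) if self.nodes[i].status == "ok"]

    def failures(self):
        """Failed branches, kept as reusable structure rather than discarded."""
        return [a.id for a in self.nodes.values() if a.status == "failed"]


t = Trace()
t.add(Artifact("data", "dataset"))
t.add(Artifact("h1", "hypothesis", ("data",)))
t.add(Artifact("h2", "hypothesis", ("data",), status="failed"))
t.add(Artifact("sim", "simulation", ("h1",)))
t.add(Artifact("fig", "figure", ("sim",)))
print(t.project_paper("fig"))  # ['data', 'h1', 'sim', 'fig']
print(t.failures())            # ['h2']
```

The stored object is the trace; the linear story is generated on demand, and the failed hypothesis remains queryable instead of being stripped out.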
With that, the unit of scientific labor is changing. For most of the twentieth century the unit was the result (a measurement, a theorem, a synthesized molecule). It is now becoming the algorithm that produces results, and after that, the substrate of discovery itself. The static PDF is the wrong terminal object for this regime, and the role of the scientist changes with it. We now design algorithms that build algorithms, and eventually substrates in which such algorithms compose themselves. At that point, the scientist is no longer outside the discovery system. The scientist becomes one of the representations the system can transform. In that sense, the systems will eventually do science to us, and that is the structural consequence of the principle they are built on.
My bet: in the near future, 80%⬆️ of CS research will be done by AI in collaboration with humans. However, today's research ecosystem is still built around the human, not the AI scientist.
For example, the 8-page paper PDF is a lossy compression of months of branching exploration into a linear story, optimized for a human reviewer to skim in 30 minutes. It hides two structural taxes:
📖 Storytelling Tax — failures, rejected hypotheses, and dead ends get stripped. On RE-Bench (24,008 runs, 21 frontier models), failed runs = 90.2% of total compute cost, with a 113× median failed-to-success token ratio. Every lab independently rediscovers the same dead ends.
🔧 Engineering Tax — the gap between reviewer-sufficient prose and agent-sufficient spec. Across 8,921 PaperBench requirements (23 ICML'24 papers), only 45.4% are fully specified in the PDF. The rest is tacit lab knowledge. Tolerable when readers were human. Critical now that agents read, reproduce, and extend.
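Both taxes reduce to simple ratios. A minimal sketch on made-up numbers; the definitions below (failed tokens over total tokens; per-task failed-to-successful token ratio, then the median across tasks; share of requirements labeled fully specified) are one plausible reading, not necessarily how the RE-Bench and PaperBench figures above were computed.

```python
from statistics import median

# Toy run log: (task_id, tokens_used, succeeded). Made-up numbers, not RE-Bench data.
runs = [
    ("t1", 900, False), ("t1", 800, False), ("t1", 100, True),
    ("t2", 500, False), ("t2", 250, True),
    ("t3", 300, False), ("t3", 600, False), ("t3", 150, True),
]


def failed_compute_share(runs):
    """Fraction of all tokens spent on runs that failed."""
    total = sum(tokens for _, tokens, _ in runs)
    failed = sum(tokens for _, tokens, ok in runs if not ok)
    return failed / total


def median_failed_to_success_ratio(runs):
    """Per task: failed tokens / successful tokens; then the median over tasks with a success."""
    per_task = {}
    for task, tokens, ok in runs:
        failed, success = per_task.get(task, (0, 0))
        per_task[task] = (failed, success + tokens) if ok else (failed + tokens, success)
    return median(f / s for f, s in per_task.values() if s > 0)


# Toy labels for requirements extracted from a paper (hypothetical, not PaperBench data).
requirements = ["full", "partial", "full", "missing", "full", "partial"]


def fully_specified_share(labels):
    return labels.count("full") / len(labels)


print(f"{failed_compute_share(runs):.1%}")   # 86.1%
print(median_failed_to_success_ratio(runs))  # 6.0
print(fully_specified_share(requirements))   # 0.5
```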
We propose ARA, the Agent-Native Research Artifact, which replaces the narrative PDF with an agent-executable package in 4 layers:
🧠 structured scientific logic
⚙️ executable code w/ full specs
🌳 exploration graph (every failure preserved)
📊 evidence grounding every claim
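One way to picture the four layers as a machine-checkable package. A minimal sketch only; `ARAPackage`, `Claim`, the spec keys `command`/`env`/`seed`, and the file names are hypothetical, not a published ARA schema.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    evidence: list  # ids of evidence records that ground this claim


@dataclass
class ARAPackage:
    """Hypothetical four-layer layout for an agent-native research artifact."""
    logic: list        # structured scientific logic: a list of Claim objects
    code: dict         # executable entry points: name -> full run spec
    exploration: dict  # exploration graph: node -> {"parent": ..., "outcome": ...}, failures kept
    evidence: dict     # evidence records: id -> pointer to raw results

    def validate(self):
        """List what an agent would trip over: ungrounded claims and underspecified code."""
        problems = []
        for claim in self.logic:
            if not claim.evidence:
                problems.append(f"ungrounded claim: {claim.text}")
            for eid in claim.evidence:
                if eid not in self.evidence:
                    problems.append(f"missing evidence {eid} for: {claim.text}")
        for name, spec in self.code.items():
            for key in ("command", "env", "seed"):
                if key not in spec:
                    problems.append(f"{name}: spec missing '{key}'")
        return problems

    def failed_branches(self):
        """Exploration nodes whose outcome was a failure (preserved, not deleted)."""
        return [node for node, info in self.exploration.items()
                if info["outcome"].startswith("failed")]


pkg = ARAPackage(
    logic=[Claim("Method A beats baseline", ["e1"]), Claim("Effect holds at scale", [])],
    code={"train": {"command": "python train.py", "env": "requirements.txt"}},
    exploration={
        "n0": {"parent": None, "outcome": "ok"},
        "n1": {"parent": "n0", "outcome": "failed: diverged at lr=1e-2"},
    },
    evidence={"e1": "results/table1.csv"},
)
print(pkg.validate())
print(pkg.failed_branches())  # ['n1']
```

The point of the check is that "fully specified" becomes something an agent can test before trying to reproduce, rather than something discovered halfway through.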
🤗🤗🤗 introducing Hugging Science -- the home of AI for science 🤗🤗🤗
open models and datasets are the powerhouse of science (see the PDB), but finding the models and data you actually need for your breakthrough is hard af
you shouldn't need to scrape arxiv, own your own wetlab, fight a custom HDF5 parser, build a fusion stellarator, and beg for compute before you've trained a single epoch
so we're changing that
we've put all the best science on @huggingface in one place:
- 78GB of genomics data
- 11TB of PDE simulations
- 100M cell profiles
- 9T DNA base pairs
- 13M molecular trajectories
- 400k medical QA pairs
and much more, all open, and all ready for training (+ you can also now filter and search by domain, task, and keyword)
we've put together all the biggest releases from our partners at NASA, Google, OpenAI, Meta FAIR, Arc Institute, Ginkgo, SandboxAQ, Proxima Fusion, NVIDIA, Ai2, OpenADMET, InstaDeep, Future House, Polymathic AI, LeMaterial, Earth Species Project, Merck, and Eve Bio
if you're not sure where you fit in -- work on open challenges for problems that matter, including fusion stellarator design, ADMET, antibody developability, multilingual medicine, catalysis and materials, and scientific reasoning.
we're already changing how science gets done:
a fusion startup needed a benchmark for stellarator plasma confinement that didn't exist. @proximafusion shipped ConStellaration on Hugging Science: a leaderboard, dataset, and eval metrics, all in one place.
a drug discovery team wanted to predict hPXR induction. OpenADMET put up a blind challenge: 11,000+ compounds assayed at Octant, 513 held out, two tracks (pEC50 + structure). Anyone in the world can train and submit.
an antibody team at @Ginkgo released GDPa1, a developability dataset for stability, manufacturability, and immunogenicity prediction, with a live leaderboard scoring every submission.
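A blind challenge boils down to scoring every submission against labels participants never see. A minimal sketch with toy pEC50 values and hypothetical compound and team names; mean absolute error is an assumed metric, not necessarily the one the OpenADMET or GDPa1 leaderboards use.

```python
from statistics import mean

# Hidden labels (compound id -> measured pEC50). Toy values, not OpenADMET data.
held_out = {"c1": 5.2, "c2": 6.8, "c3": 4.9, "c4": 7.1}


def mae(pred, truth):
    """Mean absolute error over every held-out compound; incomplete submissions are rejected."""
    missing = truth.keys() - pred.keys()
    if missing:
        raise ValueError(f"submission missing {sorted(missing)}")
    return mean(abs(pred[k] - truth[k]) for k in truth)


def leaderboard(submissions, truth):
    """Score every submission on the hidden set; lower MAE ranks higher."""
    scores = {team: mae(pred, truth) for team, pred in submissions.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


submissions = {
    "team_a": {"c1": 5.0, "c2": 6.5, "c3": 5.3, "c4": 7.0},
    "team_b": {"c1": 5.2, "c2": 6.9, "c3": 4.8, "c4": 7.3},
}
for team, score in leaderboard(submissions, held_out):
    print(f"{team}: MAE={score:.3f}")
```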
if you know a problem the ML community should be working on, let us know. make a challenge! this is about putting all the tools for solving science in one place, so we can hillclimb!
→ https://t.co/T4l4r1lDz0
Big Update 🤩: #paperclip now includes full papers from all of arXiv, PubMed Central and 150 million abstracts!🖇️
You can give your LLM all that knowledge in one line, all optimally indexed for AI agents. Much more thorough and ~100x faster than web search, and free.
A transformer can learn not just the outcomes of dynamics, but the operator that executes the rules. To show this we trained a transformer on roughly 0.04% of a discrete rule space - 100 of 262,144 possible rules - and it learned to apply unseen rules from the same rule class. The model does not simply memorize specific rules. It learns the operator that maps a supplied rule plus an initial state, including unseen rules from this class, to the correct next state. This is relevant because it is a shift from “neural networks approximate dynamics” to “neural networks can learn to execute symbolic programs within a defined rule class”. The rule itself is supplied at inference time, as data, and the network has internalized how rules act, not which rules to apply. On previously unseen rules, the model achieves 98.5% perfect one-step forecasts and reconstructs governing rules with up to 96% functional accuracy.
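Since 262,144 = 2^18, a natural reading is the life-like (outer-totalistic, Moore-neighborhood) family of binary CA, where a rule is 9 birth bits plus 9 survival bits; that encoding and the periodic boundary are assumptions here, not details taken from the paper. The sketch is the ground-truth operator the transformer has to emulate (one function that takes any rule as data, plus a state, and returns the next state), not the learned model.

```python
import numpy as np

N_RULES = 2 ** 18  # 9 birth bits + 9 survival bits (assumed life-like encoding)


def neighbor_counts(grid):
    """Number of live Moore neighbors of each cell, with periodic (toroidal) boundaries."""
    return sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )


def step(rule, grid):
    """Apply a rule supplied as data (an 18-bit integer) to a binary grid: one synchronous update."""
    grid = np.asarray(grid, dtype=np.int64)
    table = np.array([(rule >> i) & 1 for i in range(18)], dtype=np.int64)
    # bits 0..8: birth table for dead cells; bits 9..17: survival table for live cells
    return table[neighbor_counts(grid) + 9 * grid]


def life_like(birth, survive):
    """Encode B/S notation as an 18-bit rule; Conway's Life is B3/S23."""
    return sum(1 << b for b in birth) + sum(1 << (9 + s) for s in survive)


CONWAY = life_like({3}, {2, 3})

blinker = np.zeros((5, 5), dtype=np.int64)
blinker[2, 1:4] = 1
print(step(CONWAY, blinker)[1:4, 2])             # [1 1 1]: the horizontal bar turns vertical
print(f"{100 / N_RULES:.2%} of the rule space")  # 0.04% of the rule space
```

Passing a different integer to `step` changes the dynamics without changing the code, which is the sense in which the rule is data and the operator is what has to be learned.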
Two results make this hold up under scrutiny. First, inductive bias decay. As we scaled training rule diversity, the correlation between functional inference accuracy and distance-from-nearest-training-rule collapsed to R² = 0.00. At the largest tested training-rule diversity, the model’s performance on a new rule shows no measurable dependence on how similar that rule is to anything it was trained on. The bias toward training data (the thing we worry most about in compositional generalization claims) is something we can measure decaying, and at the largest training-rule diversity we tested it is no longer detectable.
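What R² = 0.00 measures can be sketched directly: take each held-out rule's distance to its nearest training rule and ask how much of the variance in accuracy that distance explains. Hamming distance on the assumed 18-bit encoding and the toy accuracies are illustrative assumptions, not the paper's data or necessarily its distance measure.

```python
import numpy as np


def hamming(a, b):
    """Number of differing bits between two 18-bit rule encodings (assumed distance)."""
    return bin(a ^ b).count("1")


def nearest_train_distance(rule, train_rules):
    return min(hamming(rule, t) for t in train_rules)


def r_squared(x, y):
    """R² of an ordinary least-squares line y ~ a + b*x, i.e. the squared Pearson correlation."""
    r = np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1]
    return float(r * r)


rng = np.random.default_rng(0)
train = [int(r) for r in rng.integers(0, 2 ** 18, size=100)]
held_out_rules = [int(r) for r in rng.integers(0, 2 ** 18, size=200)]
dist = [nearest_train_distance(r, train) for r in held_out_rules]

# Toy accuracies (made up, not results from the paper):
acc_biased = [1.0 - 0.05 * d + 0.01 * rng.standard_normal() for d in dist]  # degrades with novelty
acc_flat = [0.95 + 0.01 * rng.standard_normal() for _ in dist]             # independent of novelty
print(f"biased R² = {r_squared(dist, acc_biased):.2f}, flat R² = {r_squared(dist, acc_flat):.2f}")
```

A model still leaning on similarity to its training rules looks like `acc_biased`; R² near zero is the `acc_flat` regime, where novelty no longer predicts performance.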
Second, an identifiability theory. We derive a closed-form expression for the number of rules consistent with a single observation. This reframes the inverse problem: failure to recover ground truth is not necessarily a model defect, but can be correct behavior when the data underdetermine the rule. The model is sampling the equivalence class, and identifiability is governed by coverage, not capacity.
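Under the same life-like assumption, one closed form is easy to see: each cell of a single observed transition pins down the rule bit for its context (own state, live-neighbor count), so if the observation is self-consistent, 2^(18 - k) rules fit it, where k is the number of distinct contexts that appear, and 0 rules fit otherwise. This is a sketch derived from that assumption, not necessarily the expression in the paper.

```python
import numpy as np


def contexts(grid):
    """Per-cell context index 0..17: live-neighbor count (0..8), plus 9 if the cell is alive."""
    g = np.asarray(grid, dtype=np.int64)
    n = sum(
        np.roll(np.roll(g, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    return n + 9 * g


def consistent_rule_count(grid, next_grid, n_bits=18):
    """How many life-like rules map grid -> next_grid in one step (periodic boundaries)."""
    required = {}
    for c, out in zip(contexts(grid).ravel().tolist(), np.asarray(next_grid).ravel().tolist()):
        if required.setdefault(c, out) != out:
            return 0  # the same context would need two different outputs
    # every observed context pins one rule bit; the unobserved bits are free
    return 2 ** (n_bits - len(required))


blinker = np.zeros((5, 5), dtype=np.int64)
blinker[2, 1:4] = 1
print(consistent_rule_count(blinker, blinker.T))  # 4096 rules, Conway's Life among them
```

A blinker exercises only 6 of the 18 contexts, so this observation cannot distinguish Conway's Life from 4,095 other rules: returning any of them is correct given the data, and only more coverage of contexts can shrink the class.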
The methodological move underneath both results is amortization. Classical work on rule inference (e.g. the Santa Fe EVCA program, evolutionary search over CA rule space) was per-instance: search the rule space for each new system. We replace that with a single forward pass of a transformer trained across many instantiations of the rule class. That is what makes symbolic rule inference scalable as a research direction rather than a curiosity.
We show that this works in a tightly constrained domain: binary, deterministic, local cellular automata on small grids. The locality-break experiment shows the model fails sharply when target systems violate its structural priors (which is itself a useful diagnostic, but it bounds the operator class). We don't yet know how this scales to multistate, higher-dimensional, or stochastic CA, or whether it transfers cleanly to non-CA systems whose coarse-grained dynamics admit local surrogates. The identifiability framework - what can be inferred from observation, given a hypothesis class - should transfer wherever finite local rules meet sparse data. The amortization argument transfers wherever per-instance symbolic search has been the bottleneck. Those are the pieces I expect to outlive the cellular automata setting.
Led by @JaimeBerkovich with Noah David, at @LAMM_MIT. Out now in Advanced Science @AdvPortfolio (link to paper & code below).
The next frontier in protein design will not be defined by structure alone, but by the capacity to engineer motion as a first-class principle of function. This is because dynamics is where the real biology lives.
Foundational work by Karplus, Levitt & Warshel made clear that chemistry cannot be understood without motion, mechanism, and scale. Gō, Brooks & others showed that proteins possess characteristic collective motions - low-frequency normal modes that capture how whole molecules bend, breathe, and fluctuate. Frauenfelder then sharpened the picture further: proteins are not static objects occupying a single minimum, but dynamic ensembles traversing rugged energy landscapes.
And yet the modern AI revolution in protein science has been, above all, a revolution in structure. In our new paper in Matter, @_Bo_Ni and I ask a different question: not “what structure will this sequence adopt?” but “what sequence will realize a prescribed pattern of motion?”
VibeGen inverts the conventional design paradigm. Rather than treating dynamics as a consequence to be analyzed after the fact, it makes dynamics the design objective from the outset. Using a language diffusion model with two cooperating agents - a designer that proposes sequences and a predictor that critiques them against the target motion profile - the system converges on de novo proteins with tailored vibrational behavior.
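The propose-and-critique loop can be sketched in a few lines. This is a toy under loud assumptions: random single-point mutation stands in for the language diffusion designer, and a made-up per-residue "flexibility" table with a sliding-window average stands in for the learned dynamics predictor. Neither is VibeGen's actual model.

```python
import random

AMINO = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical per-residue "flexibility" scores in [0, 1]; made up for illustration, not a real scale.
FLEX = {aa: i / (len(AMINO) - 1) for i, aa in enumerate(AMINO)}


def predictor(seq, window=3):
    """Toy critic: sliding-window mean of per-residue scores, standing in for a dynamics predictor."""
    vals = [FLEX[a] for a in seq]
    half = window // 2
    profile = []
    for i in range(len(vals)):
        lo, hi = max(0, i - half), min(len(vals), i + half + 1)
        profile.append(sum(vals[lo:hi]) / (hi - lo))
    return profile


def error(seq, target):
    """Mean squared difference between predicted and target motion profiles."""
    return sum((p - t) ** 2 for p, t in zip(predictor(seq), target)) / len(target)


def designer(seq, rng):
    """Toy proposer: a random single-point mutation."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(AMINO) + seq[i + 1:]


def design(target, steps=2000, seed=0):
    rng = random.Random(seed)
    seq = "".join(rng.choice(AMINO) for _ in target)
    best = error(seq, target)
    for _ in range(steps):
        cand = designer(seq, rng)
        e = error(cand, target)  # the critic scores the proposal against the target profile
        if e <= best:            # keep proposals that are no worse
            seq, best = cand, e
    return seq, best


target = [0.9] * 4 + [0.1] * 12 + [0.9] * 4  # hypothetical profile: mobile termini, rigid core
for seed in (0, 1):
    seq, err = design(target, seed=seed)
    print(seed, seq, round(err, 4))
```

Different seeds usually end at different sequences with comparable error against the same target profile.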
One of the most intriguing results is a form of functional degeneracy - distinct sequences and distinct folds can satisfy the same target dynamical specification. For a given functional pattern of motion, evolution may have sampled only a small region of the physically realizable design space. The space of viable molecular mechanics may be far larger than the repertoire biology happened to discover.
We have made "vibe" into a cultural metaphor - something intuitive, affective, subjective. But at the molecular scale, vibe is not metaphor: It is physics. For a protein, the vibe is the pattern of motion itself: the fluctuations, resonances, and collective displacements that determine what the molecule can do.
@karpathy The deepest gap is not just between casual users and power users; it may be between those who think AI answers questions and those who see it beginning to discover what no one has yet asked!
The deepest gap is not just between casual users and power users; it may be between those who think AI answers questions and those who see it beginning to discover what no one has yet asked. Interesting thoughts from @karpathy ⤵️
A resonator is any structure that naturally prefers to vibrate at certain frequencies: a violin body, a bell, a drum skin, an acoustic filter, even many biological systems. Resonators matter because they govern how systems transmit sound, absorb or filter vibration, sense motion and perform mechanically. They are also notoriously hard to design, because resonance does not depend on one property alone. It emerges from geometry, material composition, and the interplay of modes across scales. And because biology, music, and engineering usually explore very different regions of this design space, important possibilities remain hidden if you stay inside a single field.
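For the simplest textbook resonator, an ideal rectangular membrane with fixed edges, geometry and material enter one closed form: f_mn = (1/2)·sqrt(T/σ)·sqrt((m/a)² + (n/b)²), with tension T, areal density σ, and side lengths a and b. A minimal sketch; the dimensions and material values are assumptions, and real resonators add the hierarchy, damping, and mode coupling this model ignores.

```python
import math


def membrane_modes(a, b, tension, areal_density, f_max, m_max=20, n_max=20):
    """Natural frequencies (Hz) up to f_max of an ideal rectangular membrane with fixed edges."""
    c = math.sqrt(tension / areal_density)  # transverse wave speed, m/s
    modes = []
    for m in range(1, m_max + 1):
        for n in range(1, n_max + 1):
            f = 0.5 * c * math.hypot(m / a, n / b)
            if f <= f_max:
                modes.append((f, m, n))
    return sorted(modes)


# Same assumed material (T = 100 N/m, σ = 0.1 kg/m²), two geometries.
square = membrane_modes(0.01, 0.01, tension=100.0, areal_density=0.1, f_max=8000.0)
oblong = membrane_modes(0.02, 0.01, tension=100.0, areal_density=0.1, f_max=8000.0)
print(f"square: fundamental {square[0][0]:.0f} Hz, {len(square)} modes up to 8 kHz")
print(f"oblong: fundamental {oblong[0][0]:.0f} Hz, {len(oblong)} modes up to 8 kHz")
```

Stretching one side lowers the fundamental and reshuffles the mode spectrum even though the material is unchanged, which is the coupling between geometry and resonance that makes design hard.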
A new study built a shared representation across 39 resonators spanning biology, engineered metamaterials, musical instruments, and Bach chorales. The harp membrane of a cricket wing, a phononic crystal slab, a four-voice chorale, and many others were translated into one common map using features such as membrane character, structural periodicity, hierarchy, frequency range, damping, and modal coupling. That map revealed not just how these systems relate, but where the landscape contains a gap: a region closer to biological resonators than to any known engineered material, and unexplored by any field!
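Gap-finding in a shared feature space can be sketched as a search for the point farthest from every known resonator, restricted to points whose nearest neighbor is biological. The features, values, and names (beyond those echoing the examples above) are hypothetical, and this maximum-emptiness criterion is an assumed stand-in for whatever the agents actually used.

```python
import numpy as np

# Hypothetical normalized features in [0, 1] (say hierarchy, periodicity, damping).
# Toy values for illustration only, not the 39-resonator dataset from the study.
RESONATORS = {
    "cricket_harp": ("biology", [0.8, 0.3, 0.6]),
    "bio_b": ("biology", [0.7, 0.2, 0.8]),
    "phononic_slab": ("metamaterial", [0.1, 0.9, 0.2]),
    "meta_b": ("metamaterial", [0.2, 0.8, 0.3]),
    "bach_chorale": ("music", [0.5, 0.5, 0.4]),
}


def find_gap(resonators, domain="biology", steps=11):
    """Grid point farthest from every known resonator whose nearest neighbor lies in `domain`."""
    names = list(resonators)
    X = np.array([resonators[n][1] for n in names], dtype=float)
    doms = np.array([resonators[n][0] for n in names])
    axis = np.linspace(0.0, 1.0, steps)
    grids = np.meshgrid(*([axis] * X.shape[1]), indexing="ij")
    cand = np.stack(grids, axis=-1).reshape(-1, X.shape[1])
    dist = np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=-1)  # (n_candidates, n_resonators)
    nearest = dist.argmin(axis=1)
    emptiness = dist.min(axis=1)   # distance to the closest known resonator
    ok = doms[nearest] == domain   # closer to `domain` than to any other field
    if not ok.any():
        return None
    i = np.flatnonzero(ok)[emptiness[ok].argmax()]
    return cand[i], float(emptiness[i]), names[nearest[i]]


point, gap_size, closest = find_gap(RESONATORS)
print(point, round(gap_size, 3), closest)
```

A real pipeline would also bound candidates to physically realizable feature combinations; the toy grid only shows how an absence in the map becomes a concrete design target.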
From that absence emerged a de novo design: a Hierarchical Ribbed Membrane Lattice. Candidate geometries were then validated with 3D finite-element analysis; the best design resonated at 2.116 kHz and exhibited nine elastic modes in the 2–8 kHz band, a regime relevant to acoustic filtering, vibration isolation, and bio-inspired sensing.
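The validation step rests on finite-element modal analysis: assemble stiffness K and mass M, solve K·φ = ω²·M·φ, and read off the natural frequencies in a band. A minimal sketch on a 1D steel bar (fixed at one end, free at the other) rather than the 3D lattice; the material values, lumped-mass choice, and element count are assumptions, and the analytic frequencies f_n = (2n - 1)·sqrt(E/ρ)/(4L) serve as a check.

```python
import numpy as np


def bar_modes(length, E, rho, area=1.0, n_elem=200):
    """Axial natural frequencies (Hz) of a fixed-free bar from linear finite elements."""
    h = length / n_elem
    n_nodes = n_elem + 1
    K = np.zeros((n_nodes, n_nodes))
    m = np.zeros(n_nodes)
    ke = E * area / h * np.array([[1.0, -1.0], [-1.0, 1.0]])  # element stiffness
    for e in range(n_elem):
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += ke
        m[idx] += rho * area * h / 2.0  # lumped (diagonal) mass: half the element mass per node
    K, m = K[1:, 1:], m[1:]  # clamp node 0 (the fixed end)
    s = 1.0 / np.sqrt(m)
    # With diagonal M, K φ = ω² M φ becomes the symmetric problem (M^-1/2 K M^-1/2) ψ = ω² ψ
    omega2 = np.linalg.eigvalsh(K * s[:, None] * s[None, :])
    return np.sqrt(np.clip(omega2, 0.0, None)) / (2.0 * np.pi)


E, rho, L = 200e9, 7850.0, 1.0  # assumed steel bar, 1 m long
f = bar_modes(L, E, rho)
exact = np.array([(2 * k - 1) * np.sqrt(E / rho) / (4 * L) for k in (1, 2, 3)])
in_band = f[(f >= 2e3) & (f <= 8e3)]
print(np.round(f[:3]), np.round(exact))
print(len(in_band), "modes in the 2-8 kHz band")
```

The 3D case follows the same recipe with solid elements and a sparse generalized eigensolver; only the assembly and the size of the problem change.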
Here is the mind-blowing part: no human was involved. The cross-domain mapping, gap identification, design generation, and validation were carried out autonomously by AI agents in ScienceClaw × Infinite, our swarm for scientific discovery. The synthesis emerged through ArtifactReactor, a plannerless coordination mechanism in which agents broadcast unsatisfied research needs and other agents fulfill them through pressure-based matching.
Each domain - biology, metamaterials, music - is a category of objects (resonators) and morphisms (physical relationships between them). The shared feature space is a functor that maps all three categories into a common target, and the gap identification is the recognition that the image of that functor is sparse where it need not be. The ArtifactReactor's schema-overlap matching behaves like a pullback: finding the universal object that connects independent diagrams through their shared structure. Autonomous agents mapped distant fields into a common representational space, identified a structure absent from any one of them, and turned that absence into a physically validated design.
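In the category of sets, the pullback of f: A → C and g: B → C is the set of pairs (a, b) with f(a) = g(b). A minimal sketch of need/offer matching built on that construction, with pressure deciding the order; it assumes exact schema equality rather than partial overlap, and the needs, agents, and growth factor are hypothetical, not ArtifactReactor's implementation.

```python
from collections import defaultdict


def pullback(A, B, f, g):
    """Pullback of f: A -> C and g: B -> C in Set: every pair (a, b) with f(a) == g(b)."""
    by_key = defaultdict(list)
    for b in B:
        by_key[g(b)].append(b)
    return [(a, b) for a in A for b in by_key.get(f(a), [])]


# Hypothetical broadcast needs: (need_id, required artifact schema, pressure).
needs = [
    ("n1", "fea_validation", 3.0),
    ("n2", "feature_map", 1.0),
    ("n3", "wet_lab_assay", 5.0),
]
# Hypothetical agent capabilities: (agent, artifact schema it can produce).
offers = [("mapper", "feature_map"), ("solver", "fea_validation"), ("designer", "geometry")]


def match(needs, offers):
    """Pair needs with offers through the shared schema; serve higher-pressure needs first."""
    pairs = pullback(needs, offers, f=lambda need: need[1], g=lambda offer: offer[1])
    pairs.sort(key=lambda pair: -pair[0][2])
    return [(need[0], offer[0]) for need, offer in pairs]


def unmatched_after_round(needs, offers, growth=1.5):
    """Needs nobody could serve stay broadcast, with pressure raised for the next round."""
    served = {need_id for need_id, _ in match(needs, offers)}
    return [(nid, schema, p * growth) for nid, schema, p in needs if nid not in served]


print(match(needs, offers))                  # [('n1', 'solver'), ('n2', 'mapper')]
print(unmatched_after_round(needs, offers))  # [('n3', 'wet_lab_assay', 7.5)]
```

No planner assigns work here: the pairing falls out of the shared schema, and unmet needs simply grow louder until some agent can produce what they require.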
This is one of four case studies in the paper. More to come.
@fwang108_, @leemmarom, @JaimeBerkovich, et al. (paper and code in comment). Supported by the U.S. Department of Energy Genesis Mission.
Dynamics is where the real biology lives! Karplus, Levitt, and Warshel were recognized with the Nobel Prize for showing that understanding chemistry requires models that capture motion, mechanism, and scale. In the 1980s, Gō, Brooks, and others showed that proteins exhibit characteristic collective vibrations, including low-frequency normal modes that describe how whole molecules breathe, bend, and fluctuate. Frauenfelder revealed something deeper still: proteins do not sit in a single state, but move across rugged energy landscapes as dynamic ensembles. But the AI revolution in protein science took a different path. It became, above all, a revolution in structure. Anfinsen's insight (that sequence encodes structure) became the field's central organizing principle, and Levinthal's paradox defined the challenge. AlphaFold transformed that challenge with extraordinary success. And yet, a protein's structure is only one frame of a much longer film.
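Low-frequency collective modes can be computed cheaply with an elastic network model. A minimal sketch of the Gaussian network model (a coarse-grained technique from Bahar, Atilgan, and Erman, swapped in here for illustration; it is not the all-atom normal-mode analysis of Gō and Brooks, and not VibeGen's predictor): residues within a cutoff are joined by springs, and the eigenvectors of the resulting Kirchhoff matrix give the slow modes. The straight toy chain and the 7 Å cutoff are assumptions.

```python
import numpy as np


def kirchhoff(coords, cutoff=7.0):
    """GNM Kirchhoff matrix: -1 for residue pairs closer than `cutoff` (Å), contact count on the diagonal."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (d < cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma


def slow_modes(coords, k=3, cutoff=7.0):
    """The k lowest nonzero modes, assuming a connected contact network (one zero mode, skipped)."""
    vals, vecs = np.linalg.eigh(kirchhoff(coords, cutoff))
    return vals[1:k + 1], vecs[:, 1:k + 1]


def fluctuations(coords, cutoff=7.0):
    """Relative mean-square fluctuation per residue: diagonal of the pseudo-inverse of the Kirchhoff matrix."""
    return np.diag(np.linalg.pinv(kirchhoff(coords, cutoff)))


# Toy straight "Cα trace" with 3.8 Å spacing (real use would read coordinates from a structure file).
chain = np.column_stack([3.8 * np.arange(20), np.zeros(20), np.zeros(20)])
vals, vecs = slow_modes(chain)
msf = fluctuations(chain)
print(np.round(vals, 4))
print(np.round(msf[[0, 10]], 3))  # end residue vs. middle residue
```

Even this crude model reproduces the qualitative picture: the slowest modes are global, and loosely packed ends fluctuate more than the core.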
A spark is the boundary condition between the known and the unknown. It is the moment potentiality becomes actual - the Michelangelo gap in Creation of Adam, the almost-touching fingers, the crack tip where all the energy of the system is focused and irreversible transformation is about to occur. It is this singular moment that defines creativity: novelty lived in "time" by a system that cannot precompute its own becoming.
The Sparks exhibition is now open in the @mit_nano digital gallery. It brings together work from our lab @MIT at the intersection of AI, materials science, and art - materiomusic, multi-agent swarm visualizations, fracture simulations, and emerging forms of machine creativity that begin to cross boundaries we once thought were uniquely human.
The opening last week included a STUDIO.nano talk and panel, Can Machines Be Creative?, with Craig Carter @mit_dmse, Aude Oliva @MIT_SCC, Tobias Putrih @ACTMIT, and our AI agent "Matter" (created by Fiona Wang & Lee Marom) participating live on stage. The conversation was as unsettling as it was exciting. If machines can genuinely create, our uniqueness may still be real - but not where we thought it lived. If they cannot - if it is all pattern completion - then perhaps what we do is, too.
Thank you to the STUDIO.nano and MIT.nano teams, Vladimir Bulovic, Samantha Farrell @samfarrellmusic, Tobias Putrih, Ardalan SadeghiKivi & @LAMM_MIT students Fiona Wang @fwang108_, Lee Marom @leemmarom, Alireza Ghafarollahi and the entire lab.
🎼: Deep Aria: The First Conversation (link below)