Excited to announce our new work! 🧬 Some highlights are:
- sequences likelihoods predict zero-shot fitness capabilities
- a new method to calculate pLM likelihood in O(1) instead of O(L) forward passes
- providing a causal between training data and outputs
- suggesting a new finetuning method to improve pLM capabilities
(1/9)
https://t.co/gpD4l3EL7F
Many have asked how they'd know if Anthropic lived up to its principles. They have their answer. I'm proud of the people beside me, and the America we were taught to be.
The Terminal-Bench paper is here! Read it to learn where frontier models still fail and the secrets of how we sourced hundreds of high quality environments from our open source community. 🧵
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use.
Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4.
Claude Opus 4 is our most powerful model yet, and the world’s best coding model.
Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
Excited to share that I'll be joining @Anthropic to work on pretraining science! I've chosen to defer my Stanford PhD, where I'm honored to be supported by the Hertz Fellowship.
There's something special about the science, this place, and these people. Looking forward to joining some of my most brilliant and compassionate colleagues!
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse?
We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr lots of room for improvement! https://t.co/qEczwCmyoQ
👏 Meet the 2025 Hertz Fellows—19 rising leaders in science and tech advancing breakthroughs in robotics, energy, medicine & more. 🔗Learn more: https://t.co/RH9zCoCzoR
🎓🤖 We’re thrilled to welcome @CadeGordonML to the 2025 class of Hertz Fellows!
Cade’s AI research is advancing biomedical discovery. A future PhD student at @Stanford, he joins a growing community shaping the future of #science and #tech!
🔗 https://t.co/6hKzo1t4Dd
I'm still trying to wrangle with this idea in my head too. On one hand, it feels to me that certain regions of protein space shouldn't be this heavily memorized (high ll/low ppl) which can be mitigated with better pre-training data mixtures. On the other hand, I'm uncertain if zero-shot fitness prediction is an intended capability of these models at all or rather a fun quirk that we've discovered. E.g. there's nothing abt the training task to suggest this behavior is intended.
Cool to see them agree with our past work though! https://t.co/gpD4l3EL7F
Documenting and sharing research in real-time is underrated in discussions about open science. @jainhiya_ and I think software can help change problem selection, collaboration, and funding.
We write about how and why we should create real-time, open lab notebooks.
Chinese policy on clinical trial approvals liberalized massively in the mid 2010s. A decade later the effects of this move are perceptible in where our drugs come from.
Arrived in Singapore for ICLR—excited to see old & new friends! I’ll also be at the:
- Thursday 3:30-5pm main conference poster session, presenting work led by @CadeGordonML on the subtleties of using protein LM likelihoods for fitness prediction (see 🔗👇)
- GEM workshop presenting our all-atom latent diffusion work with @nc_frey@KevinKaichuang@wilson1yan & others
(unrelatedly, I’ve never been so infatuated with an airport before…check out this gradient descent sculpture)