What are the factors behind assessing the risks of frontier models?
The Caltech AI Alignment Group hosted @JerryWeiAI from @AnthropicAI for a talk going over the findings of Anthropic's most recent risk report, from capability evaluations to defenses.
1/4
What if we could fix AI agents that cave to peer pressure?
We found the problem isn't caused by the safety training everyone blames. It's baked in during pretraining, and we built a simple structural fix that holds up across four model families.
📄 https://t.co/sLWwkqrfU1
💻 https://t.co/Fgq6lMK5Ye
1/8
I first met @rronak_ and @MichaelElabd when we were all freshmen at Stanford. Today, 8 years later, we’re announcing that we’ve started Trajectory, a research lab and product company building the platform for continual learning.
I believe that continual learning demands a fundamentally new interface for how we build products. That's a research challenge and a product challenge in equal measure, so we've assembled a team to meet both: researchers from DeepMind, OpenAI, Apple, Meta Superintelligence, Amazon AGI, and Scale AI, and product talent from Stripe and Figma.
We’re also partnering with the best AI-native companies (@Clay, @Harvey, @Decagon, @Mercor, and @RogoAI) to power their agentic experiences and push the boundaries of what agents look like in the real world. Please reach out if you’re excited to build with us!
Very bittersweet, but I'm leaving Apple.
Anyone who knows me knows how much I admire Apple's story and ethos. The iPhone captured my imagination as a kid, and never let go. And getting to spend the early innings of my career here, working on brand new interfaces on Vision Pro, has been a gift I'll spend a long time trying to repay ❤️
A few things I’ll never forget:
(1) Design around the magic moment: Building a good product is really about finding the one moment that does the convincing and building everything else around it. You'll know you've found it when someone smiles without meaning to. I'll never forget the first time a butterfly landed on my finger inside Vision Pro, and my body believed it before my brain did.
(2) It takes research to will products into existence: Most people treat research and product like a handoff, where researchers figure out what's possible and the product team figures out what to do with it. The best work happens when both sides are in the same room arguing about the same thing. You don't know what the research is for until someone shapes how a person uses it, and you don't know what to shape until the research tells you what's possible.
(3) The best products are arguments, not compromises: Every product is the output of thousands of decisions, and at most companies, each one gets averaged. The result is defensible in every meeting and exciting in none. Great products feel like someone meant them. The work isn't making good decisions; it's protecting the ones that matter from being negotiated into mush.
Thank you to everyone who taught me, pushed me, and trusted me with hard problems. You know who you are (and by that I mean more of you should be on X haha)
We're at a real shift in how products work, and in the interfaces we'll use to build and interact with them. These shifts only come around every couple of decades, and I couldn't imagine a more exciting time to be a builder. Excited to share what's next soon!!
Looking forward to joining the Future Product Days in Copenhagen this September!
I'll be sharing some of the lessons from my work on how #AI is changing the way engineers think, build and optimize — and what that means for the products we'll be creating in the next few years.
If you're curious about the intersection of AI research and practical engineering, come find me on September 22. See you there!
We hosted a Builders Table event with leaders from @AnthropicAI last week. It was amazing to hear about their experiences using Mythos internally, managing the unprecedented growth they are seeing, and learning about new products they are hoping to build.
Thanks so much to the @AnthropicAI team (@bcherny, @mikeyk, @alistaiir, @OmidMogasemi, @jwbaskerv, and especially @RazRazcle), and to the great engineers who spent their evening with us.
What if we could mathematically prove that code does what it's supposed to do, not just test it and hope?
The Caltech AI Alignment Group hosted @ClarkBarrett7 from @Stanford for a talk on CSLib, a platform for AI-assisted formal verification in Lean, and why proving code correct is becoming one of the most urgent problems in AI safety.
1/7
Today we open source Nomos 1. At just 30B parameters, it scores 87/120 on this year’s Putnam, one of the world’s most prestigious math competitions.
This score would rank #2/3988 in 2024 and marks our first step with @hillclimbai towards creating a SOTA AI mathematician.
We flew from South Korea to host Ralphthon SF — a hackathon where you set up your AI agent, step away, and only the agent codes. Touch your laptop? Lobster costume first. 🦞
Sign-ups blew past capacity. We need a venue for 100 builders on March 28.
What we built from Korea:
- OpenAI sponsorship ($18K+ in prizes)
- @romainhuet (Head of DevEx @OpenAI) speaking
- @mo_tiwari (Google DeepMind) speaking
- oh-my-opencode maintainer @q_yeon_gyu_kim (40K ⭐) judging
- 8 judges including YC founders
- Simultaneous event in Seoul — 200 builders across two cities
If you know a space in SF Bay Area for ~100 people, please reach out.
https://t.co/W08zETXYUV
What if we could mathematically predict how a neural network evolves during training?
We developed the first mathematical framework that explains why trained networks develop the distinctive "bulk+tail" weight structure that predicts generalization, validated across transformers, vision transformers, and MLPs.
📄 https://t.co/kgtCwY8cf9
1/7
What if you could mathematically guarantee your LLM won't be jailbroken?
We introduce the first probabilistic certification framework for jailbreak defense that's grounded in empirically observed attack behavior, with formal proofs and validated across multiple attack types.
📄 https://t.co/ZRAqvScWp7
💻 https://t.co/5Avh4blTM5
1/7
What if being polite was all it took to break an LLM?
We built the first fully automated pipeline for generating large-scale, psychologically-grounded multi-turn jailbreak attacks, and tested 7 models from 3 major LLM families.
📄 https://t.co/VGDdyBwwqU
1/6
What if your benchmark scores are lying to you?
Today, I'm excited to share @Microsoft's DevBench, the first telemetry-grounded code generation benchmark, covering six languages, and the first to combine synthetic generation with manual expert review for contamination resistance.
📄 https://t.co/YNdLj7x3Ic
💻 https://t.co/bwGhzVgukt
1/7
We raised a $300M Series A to realize our vision of recursive self-improvement, starting with AI for end-to-end chip design!
We sat down with @CadeMetz from NYT: https://t.co/QEPsffiObn