a masterclass in coding agents from the head of anthropic.
there’s still a tonne of leverage in knowing how to use these systems optimally and this is the best i’ve seen.
make sure to bookmark so you can watch again and again, chat
there are many types of intelligence. which type do we want models to have?
yesterday i was analyzing agent trajectories in our rl envs and digging into tool-calling performance (how models use functions):
> gpt-5 made 10x fewer errors than every other model!
> claude made 10x more errors – but it reflected on its errors and fixed them. and so even though its initial tool calls were much worse, its final performance was close to gpt-5’s.
this shouldn't be possible under the standard paradigm. in rl, we're taught that only outcomes matter: the reward signal, the final state, the destination.
but that’s why i love digging into individual trajectories to understand what’s going on.
gpt-5 embodies precision intelligence: flawless execution, doesn't make the mistake in the first place.
claude embodies adaptive intelligence. it makes errors… but possesses something possibly rarer? the wisdom to notice and correct them?
it’s like that friend who shows up perfectly dressed, says exactly the right thing, never spills their drink vs. the one who trips walking in, knocks over a plant, makes a joke, and everyone laughs.
both are intelligent. which is better?
i don't know. but claude's error-recovery patterns seem closer to metacognition. it's monitoring its execution and thinking about its thinking.
gpt-5 may not need this layer right now. its first-order thinking is so accurate that it doesn’t need to reflect. maybe that's fine enough right now, when problems are straightforward enough to one-shot.
but what about when they're not?
(note: i don't know if gpt-5 is equally good at error-recovery when it needs to be. maybe it is! but i've noticed claude's recovery capabilities in the past.)
in an increasingly complex world, where problems get harder and harder, i wonder if resilience will matter more than perfection?
do you want the model that never falls, or the model that knows how to stand back up?
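a minimal sketch of how the two numbers above (how often tool calls fail vs. how often a run still succeeds after failing) could be pulled out of trajectories. the `Trajectory` record, its fields, and the toy data are all assumptions made up for illustration, not our actual env format or real results:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    # hypothetical record: one bool per tool call (True = the call succeeded)
    call_ok: List[bool]
    # whether the episode reached its goal in the end
    solved: bool


def tool_error_rate(trajs: List[Trajectory]) -> float:
    """Failed tool calls divided by all tool calls (0.0 if there are none)."""
    total = sum(len(t.call_ok) for t in trajs)
    failed = sum(t.call_ok.count(False) for t in trajs)
    return failed / total if total else 0.0


def recovery_rate(trajs: List[Trajectory]) -> float:
    """Among trajectories with at least one failed call, the share that still solved the task."""
    stumbled = [t for t in trajs if not all(t.call_ok)]
    if not stumbled:
        return 0.0
    return sum(t.solved for t in stumbled) / len(stumbled)


# made-up toy trajectories, not real measurements
toy = [
    Trajectory(call_ok=[False, True], solved=True),
    Trajectory(call_ok=[True], solved=True),
    Trajectory(call_ok=[False, False], solved=False),
]
print(tool_error_rate(toy), recovery_rate(toy))
```

a model with a low `tool_error_rate` looks like the "precision" profile; a model with a high error rate but a high `recovery_rate` looks like the "adaptive" one. outcome-only reward would not distinguish them.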
Wow!
Grok analyzed the 1890 Thomas Edison lightbulb patent. Determined a better filament design and lit up the light.
This emergent intelligence is found in no other AI model.
It is fascinating and portends the ability not only to change education but also to let robots build!
"Claude Code is not just for coding. It's an AI agent that can do whatever you want it to do."
Here's my new tutorial with @alexfinn where he showed me exactly how to build a life OS with Claude Code in 25 minutes.
He saves 2+ hours daily using:
✅ /researcher to analyze competitors
✅ /daily_brief to get AI news every morning
✅ /brain_dump to find patterns from notes
His setup uses slash commands and sub-agents and honestly blew my mind 🔥
Watch the full tutorial to learn how to set up a life operating system for yourself.
📌 Watch now: https://t.co/L3EBX8MFLk
Also available on:
Spotify:
https://t.co/ZeSadKKns6
Newsletter:
https://t.co/mNAsR6kHGd
Since DeepSeek R1's release, AWS, Azure, Fireworks AI, Groq, Hugging Face, SambaNova and Together AI have all quickly started to host R1 variants. Which model is "best" changes frequently, so developers often want to try out new ones. The aisuite package helps developers do this quickly with minimal code changes.
Thanks Rohit Prasad & team for working with me on this!
https://t.co/gwz9oKTCFx
For many, DeepSeek's rise was unexpected. But what can we learn from prior internet waves about what might happen next?
@martin_casado and @stevesi joined the a16z Podcast to discuss what drove the DeepSeek frenzy and more importantly, what we should take away, through the lens of Internet history.
Full episode here: https://t.co/6QW5l96pgF
Understanding new .cursor/rules in 0.45
I've seen so many people struggling with .cursorrules and with the new .cursor/rules directory.
Here is a Guide for you! 🧵
Announcing new open-source Python package: aisuite!
This makes it easy for developers to use large language models from multiple providers. When building applications I found it a hassle to integrate with multiple providers. Aisuite lets you pick a "provider:model" just by changing one string, like openai:gpt-4o, anthropic:claude-3-5-sonnet-20241022, ollama:llama3.1:8b, etc.
pip install aisuite
Open-source code with instructions: https://t.co/gwz9oKTCFx
Thanks to Rohit Prasad, Kevin Solorio, @standsleeping, Jeff Tang and @Johnsanterre for helping build this!
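To make the "provider:model" convention concrete, here is a small sketch of how such a string can be split into its two parts. The helper `split_model_id` is hypothetical and written only for illustration; it is not aisuite's own code. It assumes the provider is everything before the first colon, so that a model name containing a colon (as in ollama:llama3.1:8b) stays intact.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'provider:model' on the FIRST colon only.

    Hypothetical helper for illustration; not part of aisuite.
    """
    provider, sep, model = model_id.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {model_id!r}")
    return provider, model


# The three example strings from the post above
for model_id in (
    "openai:gpt-4o",
    "anthropic:claude-3-5-sonnet-20241022",
    "ollama:llama3.1:8b",
):
    provider, model = split_model_id(model_id)
    print(f"provider={provider} model={model}")
```

In aisuite itself, as far as its README describes, the full string is passed as the `model` argument of `client.chat.completions.create(...)` on an `aisuite.Client()`, so switching providers means changing only that one string.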
At Chroma - we are hiring our first product engineer.
You will work directly with me and @MadelineNotes.
You must have a great eye for functional, high-performance, and beautiful products.
In-person in SF, DM me to learn more
@subnetmarco @martin_casado @ayyar It's confusing - how to interpret market share of Bedrock, Azure, others - they offer Llama, Claude, Mistral, OpenAI.
Downloaded the PDF report from Kong but it mentions nothing about how it's computed.
Super useful if it can be broken down per model & per provider.
1/ When starting a company, can you get to *5 customers*? Who are they? Why will they trust YOU?
As a VC, these are questions I always try to ask (selling B2B). I’ll tell my story of how my company got our first 5, and why 5 seems like a good heuristic of “you’ve got something”
I've now been asked multiple times for my take on Elon's offer for Twitter.
So fine, this is what I think about that. I will assume the takeover succeeds, and he takes Twitter private. (I have little knowledge/insight into how actual takeover battles work or play out)
(long 🧵)
1/6) India is on fire. 7 people will die/min unless we set up hospital beds+oxygen. Sending concentrators is not sufficient. Hospital beds need that and more! So, #PoojaMalik, #NeerajGarg and I decided to set up https://t.co/mx8SzcPxRl. I will donate $20/re-tweet up to $100K