A useful agent should not just remember facts. It should learn from what happened before.
I’ve wrapped up my 3-part series on agent memory, from why context is not enough, to modern memory architectures, to the frontier of reflection, consolidation, and memory-guided action.
Part 1: https://t.co/CxtFmAoEn5
Part 2: https://t.co/nauSzWmomg
Part 3: https://t.co/dG0AkKOooP
@eladgil BS.
Attention was born in Montréal
PyTorch in NYC.
AlphaGo in London
AlphaFold in London
ESMFold in NYC
Llama 1 in Paris.
Llama 2 in Paris+NYC+SV
DeepSeek in Hangzhou
Plus:
DINO in Paris
JEPA in Montréal+Paris+NYC
SV is 3 mos ahead on topics SV is singularly obsessed with.
Effective today, we are:
1) Doubling Claude Code’s 5-hour rate limits for Pro, Max, and Team plans;
2) Removing the peak hours limit reduction on Claude Code for Pro and Max plans; and
3) Substantially raising our API rate limits for Opus models.
Wrote two posts on inference engineering.
Part 1 (capacity): one formula, the MoE sizing trap, why embedding fleets want a different shape.
Part 2 (throughput): decode is bandwidth-bound, not compute-bound. Two H200s often outserve four H100s on chat workloads, even when raw FLOPs look similar.
Part 1: https://t.co/aAkWLlKV35
Part 2: https://t.co/XUq3dfKNJ5
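A rough back-of-envelope sketch of the "bandwidth-bound, not compute-bound" point from Part 2. It assumes a hypothetical 8B-parameter dense model with 16-bit weights, ignores KV-cache traffic and multi-GPU communication, and uses approximate spec-sheet figures (H100 SXM: ~3.35 TB/s HBM, ~989 dense BF16 TFLOPS; H200: ~4.8 TB/s, same compute). All names and numbers are illustrative assumptions, not taken from the posts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    hbm_bytes_per_s: float  # approximate memory bandwidth (assumption)
    peak_flops: float       # approximate dense BF16 peak (assumption)


# Approximate public figures; treat them as assumptions, not measurements.
H100 = Gpu("H100 SXM", 3.35e12, 989e12)
H200 = Gpu("H200", 4.8e12, 989e12)


def decode_step_times(gpu, n_params, bytes_per_param=2.0, batch=1):
    """Return (memory_time, compute_time) in seconds for one decode step.

    Every step reads all weights once, regardless of batch size, and
    spends roughly 2 FLOPs per parameter per sequence in the batch.
    """
    memory_time = n_params * bytes_per_param / gpu.hbm_bytes_per_s
    compute_time = 2.0 * n_params * batch / gpu.peak_flops
    return memory_time, compute_time


def break_even_batch(gpu, bytes_per_param=2.0):
    """Batch size at which compute time catches up with weight-read time."""
    return bytes_per_param * gpu.peak_flops / (2.0 * gpu.hbm_bytes_per_s)


if __name__ == "__main__":
    n_params = 8e9  # hypothetical 8B dense model
    for gpu in (H100, H200):
        mem, cmp_ = decode_step_times(gpu, n_params)
        print(f"{gpu.name}: weights {mem * 1e3:.2f} ms, "
              f"math {cmp_ * 1e3:.3f} ms, "
              f"break-even batch ~{break_even_batch(gpu):.0f}")
```

Under these assumptions the weight-read time dwarfs the matmul time at batch 1, which is why extra bandwidth per GPU tends to matter more than extra FLOPs for decode. The sketch does not by itself reproduce the two-H200-versus-four-H100 comparison, which also depends on memory capacity and interconnect overhead.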
Two events, four weeks, same argument: Claude Code's source leaked on March 31, and Anthropic's quality postmortem landed yesterday. A 25-word system prompt line cost 3% on coding evals. Zero model weight changes.
The harness is the product.
https://t.co/3S7iDf1Jtu
Better models alone will not solve agent reliability.
If an agent cannot remember, verify, recover, or be inspected properly, the problem is often the harness, not the model.
Wrote up my thinking on harness engineering:
https://t.co/ET3qEt1XNh
I finally got through the 3 hr+ @maxsbennett interview on @MLStreetTalk (link at the end). It took me over two weeks to finish it. But it sharpened something I was already thinking while reading Packy McCormick’s Not Boring essay on world models, co-written with @PimDeWitte.
I think we are still too loose with the phrase “world model”.
Current LLMs obviously have models. You do not get that level of performance without some internal structure that captures a surprising amount about language and the world. But Bennett’s distinction is more demanding than that. A world model, in the stronger sense, is about interventions and causality: I think this will happen if I do X, I do X, and then I update from the gap between what I expected and what actually happened.
That is not the same thing as learning from a fixed corpus.
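A toy illustration of that predict, act, update loop. This is a sketch only: the one-parameter "world", the learning rate, and every name here are hypothetical, and it is not Bennett's formulation.

```python
def run_intervention_loop(true_effect, steps=50, learning_rate=0.5):
    """Learn the effect of an action by acting and correcting the error.

    The agent holds one belief: "doing X changes the outcome by
    estimated_effect". Each step it predicts, acts, observes, and
    updates from the gap between prediction and observation.
    """
    estimated_effect = 0.0  # the agent starts with no idea what X does
    action = 1.0            # "do X" with unit strength

    for _ in range(steps):
        predicted = estimated_effect * action   # what I think will happen
        observed = true_effect * action         # what actually happens
        gap = observed - predicted              # the surprise
        estimated_effect += learning_rate * gap * action  # revise belief

    return estimated_effect


if __name__ == "__main__":
    # The true effect is hidden from the agent; it only sees outcomes.
    print(run_intervention_loop(true_effect=3.0))
```

The point of the toy is the shape of the loop: the learning signal comes from the agent's own intervention, not from a fixed dataset someone else collected.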
What also stayed with me is his point about language. Language was not just a better way to communicate observations. It let humans share simulations, refine them together, and build on them across generations. That feels like a deeper explanation for why human knowledge compounds the way it does.
Read through that lens, world models start to look less like a side branch of robotics and more like a serious attempt to move beyond systems that are very good at describing the world but cannot really test themselves against it.
I still think this area is easy to overstate, and the term gets used too casually. But I do think the direction matters.
Maybe the next step after LLMs is not just better text generation, but systems that can form hypotheses, act, and revise.
MLST interview: https://t.co/dEbblABiFO
NotBoring article on World Models: https://t.co/uui35k2DwK
Introducing the new dev-browser cli.
The fastest way for an agent to use a browser is to let it write code.
Just `npm i -g dev-browser` and tell your agent to "use dev-browser"
HN is asking the question everyone's avoiding:
When AI makes your devs 2x more productive, do you fire half of them or build twice as much?
The answers in this thread are more honest than any earnings call.
https://t.co/slQZYNqg6A
New paper: GPT-5.2 and Claude Opus 4.6 independently produce identical refusals for certain prompts.
"Deterministic silence" points to correlated failure modes across competing labs.
Alignment monoculture may be a bigger risk than we thought.
https://t.co/5s1Y6Sdnb9
If you're trying to standardise how your team uses AI across the full software development lifecycle, this repo template is worth a look.
https://t.co/2rV97b71cF
Claude Code as a self-managing AI team; one repo, multiple specialised agents (PM, architect, engineer, reviewer) coordinating autonomously. The agentic SDLC is here.
https://t.co/esqlVrz8Vd
Today, we’re introducing Forge, a system for enterprises to build frontier-grade AI models grounded in their proprietary knowledge.
🌎 Forge bridges the gap between generic AI and enterprise-specific needs. Instead of relying on broad, public data, organizations can train models that understand their internal context embedded within systems, workflows, and policies, aligning AI with their unique operations.
We have already partnered with world-leading organizations, like ASML, DSO National Laboratories Singapore, Ericsson, European Space Agency, Home Team Science and Technology Agency (HTX) Singapore and Reply to train models on the proprietary data that powers their most complex systems and future-defining technologies.