cuda frameworks that JIT and autotune are so painful. i don't want to have magic happen during the start of training. i want to precompile my kernels, test them once, and be happy.
What will the role of AI compilers be in the age of AI agents and frontier kernel programming? We believe agents should have access to a predictable DSL that offers maximum expressiveness, paired with a minimal compiler they can directly open up, build toolings, and improve for specialized optimizations. TIRx is our effort on this front. We've had a great experience using it in our latest mega-kernel compiler research and teaching Blackwell programming in our ML systems course at CMU. Check it out:
NEW: Inside Cursor's wild rise. Lots of great new details:
• CEO Michael Truell didn't pay himself for years
• Cursor once made up 40-50% of Anthropic's revenue
• Anthropic told Cursor that Claude Code was just a 'research effort' (lol)
• Cursor's unpaid "work trials"
The most comprehensive RL overview I've ever seen.
Kevin Murphy from Google DeepMind, who has over 128k citations, wrote this.
What makes this different from other RL resources:
→ It bridges classical RL with the modern LLM era:
There's an entire chapter dedicated to "LLMs and RL" covering:
- RLHF, RLAIF, and reward modeling
- PPO, GRPO, DPO, RLOO, REINFORCE++
- Training reasoning models
- Multi-turn RL for agents
- Test-time compute scaling
→ The fundamentals are crystal clear
Every major algorithm, like value-based methods, policy gradients, and actor-critic are explained with mathematical rigor.
→ Model-based RL and world models get proper coverage
Covers Dreamer, MuZero, MCTS, and beyond, which is exactly where the field is heading.
→ Multi-agent RL section
Game theory, Nash equilibrium, and MARL for LLM agents.
I have shared the arXiv paper in the replies!
Claude Code fully dissected!
Researchers from UCL reverse-engineered the leaked Claude source. What they found changes how you should think about agent design.
Only 1.6% of the codebase is AI decision logic.
The other 98.4% is operational infrastructure. Permission gates, tool routing, context compaction, recovery logic, session persistence. The model reasons. The harness does everything else.
This is the opposite of what most agent frameworks do today.
LangGraph routes model outputs through explicit state machines. Devin bolts heavy planners onto operational scaffolding. Claude Code gives the model maximum decision latitude inside a rich deterministic harness, and invests all its engineering effort in that harness.
The core loop is a simple while-true. Call model, run tools, repeat.
But the systems around that loop are where the real design lives:
A permission system with 7 modes and an ML classifier. Users approve 93% of prompts anyway, so the architecture compensates with automated layers instead of adding more warnings.
A 5-layer context compaction pipeline. Each layer runs only when cheaper ones fail. Budget reduction, snip, microcompact, context collapse, auto-compact.
Four extension mechanisms ordered by context cost. Hooks (zero), skills (low), plugins (medium), MCP (high). Each answers a different integration problem.
Subagents return only summary text to the parent. Their full transcripts live in sidechain files. Agent teams still cost roughly 7x the tokens of a standard session.
Resume does not restore session-scoped permissions. Trust is re-established every session. That friction is the point.
The bet behind all of this is simple. As frontier models converge on raw coding ability, the quality of the harness becomes the differentiator, not the model.
Paper: Dive into Claude Code (arXiv:2604.14228)
In the next tweet, I've shared an article I wrote on Agent Harness and what every big company is building. Do check.
If you're an AI/agent builder, it's so important that you don't overbuild and overcommit on a specific toolset and infrastructure.
Frontier labs are shipping not just the models, but the harnesses and surrounding tooling such that your existing stack might be obsolete next week.
* e.g. if you had a super complex RAG stack, you may need to rip it out in favor of agents + sandboxes
* e.g. if you spent a lot of time building the sandbox and serving layer, you may not need to anymore if you can just bootstrap the product with Claude Managed Agents
The tradeoff is completely dependent on how good out-of-the-box these proprietary agent wrappers get. Back when the OpenAI Agent SDK came out, most people did not switch from frameworks because they were simply more powerful. Nowadays tools like the Claude Agent SDK + managed agent services are getting way better.
New on the Engineering Blog:
Building Managed Agents—our hosted service for long-running agents—meant solving an old problem in computing: how to design a system for “programs as yet unthought of.”
Read more: https://t.co/YYaEub2QGV
Ultrafast Trading Systems in C++ by David Gross
"While low-latency programming is sometimes seen under the umbrella of 'code optimization', the truth is that most of the work needed to achieve such latency is done upfront, at the design phase."
https://t.co/FYv8Iml9aM
The Codex app server was such a brilliant stroke of foresight that really doesn't get enough love
Not only are you allowed to use your chatgpt account with any harness, but you can build your own apps directly on top of theirs.
They just make building on and with codex such a great experience
To demonstrate this utility, I want to highlight the kitty litter app, made by @SIGKITTEN.
Instead of having to build the entire harness, and all the infrastructure, he's plugged into the app server for a unified experience between mobile and dev machine.
When I create a session on my computer, it's automatically available on my phone. All of the chats you see in this video automatically populated when we connected to the app server.
All my skills. My agents. My sessions. My folders. My prompts. They're all ready to use - automatically.
Because they're exposed by the app server, along with many other endpoints.
It's a great ux/dx that really deserves some love.
It's almost like they want you to build on top of their products ;)
Btw Litter is great 👍