ANTHROPIC PAYS $750,000 A YEAR FOR ENGINEERS WHO UNDERSTAND WHY AI WORKS.
STANFORD JUST PUT THE SAME KNOWLEDGE ON YOUTUBE FOR FREE.
WATCH IT THIS WEEKEND. NOT EVENTUALLY. THIS WEEKEND.
NVIDIA just dropped a paper that might solve the biggest trade-off in LLMs.
Speed vs. Quality.
Autoregressive models (like GPT) are smart but slow - they generate one token at a time, leaving most of your GPU sitting idle.
Diffusion models are fast but often produce incoherent outputs.
TiDAR gets you both in a single forward pass.
Here's the genius part:
Modern GPUs can process far more tokens per forward pass than one-token-at-a-time decoding actually uses. TiDAR exploits these "free slots" by:
1. Drafting multiple tokens at once using diffusion (the "thinking" phase)
2. Verifying them using autoregression (the "talking" phase)
Both happen simultaneously using smart attention masks - bidirectional for drafting, causal for verification.
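To make the mask idea concrete, here is a minimal sketch, assuming a sequence split into an already-generated prefix followed by one block of draft slots. The function names and the two-region layout are illustrative assumptions, not taken from the paper, whose actual layout may differ.

```python
def hybrid_attention_mask(prefix_len: int, draft_len: int) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    Assumed layout (illustrative only): positions [0, prefix_len) are the
    causal region, positions [prefix_len, prefix_len + draft_len) are the
    draft block.
    """
    total = prefix_len + draft_len
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if q < prefix_len:
                # Causal region: a token sees itself and earlier tokens only.
                mask[q][k] = k <= q
            else:
                # Draft block: sees the whole prefix and every draft slot,
                # in both directions.
                mask[q][k] = True
    return mask


def render(mask: list[list[bool]]) -> str:
    """Draw the mask as text: 'x' means attention allowed, '.' means blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    print(render(hybrid_attention_mask(prefix_len=3, draft_len=2)))
```

With a prefix of 3 and a draft block of 2, the top three rows form the familiar causal triangle and the bottom two rows are fully filled in, which is the "causal for verification, bidirectional for drafting" split in one matrix.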
The results:
↳ 4.71x faster at 1.5B parameters with zero quality loss
↳ Nearly 6x faster at 8B parameters
↳ First architecture to outperform speculative decoding (EAGLE-3)
↳ Works with standard KV caching, unlike pure diffusion models
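For context on the speculative decoding comparison, here is a rough sketch of generic greedy draft-then-verify acceptance. It is not TiDAR's own acceptance rule; `accept_drafts` and the toy verifier are hypothetical names, and a real system would score all drafts in one batched forward pass rather than calling the verifier in a loop.

```python
from typing import Callable


def accept_drafts(
    context: list[int],
    drafts: list[int],
    next_token: Callable[[list[int]], int],
) -> list[int]:
    """Keep drafted tokens while they match the verifier's own choice.

    On the first mismatch, substitute the verifier's token and stop, so
    every call yields at least one verified token when drafts is non-empty.
    """
    accepted: list[int] = []
    for tok in drafts:
        expected = next_token(context + accepted)
        if tok != expected:
            accepted.append(expected)  # verifier's correction replaces the bad draft
            return accepted
        accepted.append(tok)
    return accepted


def toy_verifier(tokens: list[int]) -> int:
    """Stand-in for an autoregressive model: always predicts last token + 1, mod 10."""
    return (tokens[-1] + 1) % 10


if __name__ == "__main__":
    # The first two drafts match the verifier, the third does not.
    print(accept_drafts([1, 2, 3], [4, 5, 9, 7], toy_verifier))
```

The speedup comes from how many drafted tokens survive verification per step: the more that match, the fewer sequential passes are needed.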
The training trick is clever too - instead of masking a random subset of the tokens to be denoised, they mask all of them. This gives a stronger learning signal and enables efficient single-step drafting.
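A toy comparison of the two masking schemes, as a sketch only. The `<mask>` token string and the helper names are assumptions for illustration, and the paper's training setup has more moving parts than this.

```python
import random

MASK = "<mask>"  # hypothetical mask token, for illustration only


def random_mask(tokens: list[str], rate: float, rng: random.Random) -> list[str]:
    """Conventional corruption: each token is masked independently with probability `rate`."""
    return [MASK if rng.random() < rate else tok for tok in tokens]


def full_mask(tokens: list[str]) -> list[str]:
    """The 'mask everything' variant: every position becomes a mask token."""
    return [MASK for _ in tokens]


def supervised_positions(corrupted: list[str]) -> list[int]:
    """Positions that would contribute to a denoising loss (the masked ones)."""
    return [i for i, tok in enumerate(corrupted) if tok == MASK]


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    partly = random_mask(tokens, rate=0.5, rng=random.Random(0))
    print(len(supervised_positions(partly)), "of", len(tokens), "positions supervised")
    print(len(supervised_positions(full_mask(tokens))), "of", len(tokens), "positions supervised")
```

One way to read the "stronger learning signal" claim: with full masking every position is supervised on every example, and the training input looks like a fresh, fully masked draft block at inference time.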
If you're building real-time AI agents where latency kills the experience, this architecture is worth paying attention to.
Link to the paper in the next tweet.
You gotta love how Tobi, a $10B+ founder of Shopify, took time out of his busy day to build one of the neatest, most useful command line tools ever, “try”.
When engineers stay true to their roots despite all their success, it’s beautiful to see.
MCP, when used correctly with AI agents, is extremely high-leverage.
To make MCP more approachable, I just launched our first course on the topic.
As the name implies, anyone can take this short course and find it useful. Check it out here: https://t.co/yMwUEnZB0a
Anthropic just posted another banger guide.
This one is on building more efficient agents that can handle more tools while using fewer tokens.
This is a must-read for AI devs!
(bookmark it)
It helps with three major issues in AI agent tool calling: token costs, latency, and tool composition.
How? It combines code execution with MCP, turning MCP servers into code APIs rather than direct tool calls.
Here is all you need to know:
1. Token Efficiency Problem: Loading all MCP tool definitions upfront and passing intermediate results through the context window creates massive token overhead, sometimes 150,000+ tokens for complex multi-tool workflows.
2. Code-as-API Approach: Instead of direct tool calls, present MCP servers as code APIs (e.g., TypeScript modules) that agents can import and call programmatically, reducing the example workflow from 150k to 2k tokens (98.7% savings).
3. Progressive Tool Discovery: Use filesystem exploration or search_tools functions to load only the tool definitions needed for the current task, rather than loading everything upfront into context. This solves so many context rot and token overload problems.
4. In-Environment Data Processing: Filter, transform, and aggregate data within the code execution environment before passing results to the model. E.g., filter 10,000 spreadsheet rows down to 5 relevant ones.
5. Better Control Flow: Implement loops, conditionals, and error handling with native code constructs rather than chaining individual tool calls through the agent, reducing latency and token consumption.
6. Privacy: Sensitive data can flow through workflows without entering the model's context; only explicitly logged/returned values are visible, with optional automatic PII tokenization.
7. State Persistence: Agents can save intermediate results to files and resume work later, enabling long-running tasks and incremental progress tracking.
8. Reusable Skills: Agents can save working code as reusable functions (with SKILL.md documentation), building a library of higher-level capabilities over time.
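Points 2 to 4 can be sketched in a few lines. This is a rough illustration in Python rather than the TypeScript modules the guide mentions. The in-memory `TOOLS` registry, the tool names, and the synthetic rows are all hypothetical stand-ins; no real MCP client is involved.

```python
from typing import Callable

# Hypothetical registry standing in for MCP servers exposed as code:
# tool name -> (one-line description, callable)
TOOLS: dict[str, tuple[str, Callable[..., object]]] = {}


def tool(name: str, description: str):
    """Decorator that registers a function as a discoverable tool."""
    def register(fn):
        TOOLS[name] = (description, fn)
        return fn
    return register


@tool("sheets.get_rows", "Return all rows of a spreadsheet as dicts")
def get_rows(sheet_id: str) -> list[dict]:
    # Synthetic data instead of a real server call: 10,000 rows, 5 of them pending.
    return [{"id": i, "status": "pending" if i % 2000 == 0 else "done"}
            for i in range(10_000)]


@tool("crm.update_record", "Update one CRM record")
def update_record(record_id: int, fields: dict) -> bool:
    return True  # placeholder, does nothing


def search_tools(query: str) -> list[str]:
    """Progressive discovery: return only the definitions that match the query."""
    q = query.lower()
    return [f"{name}: {desc}" for name, (desc, _) in TOOLS.items()
            if q in name.lower() or q in desc.lower()]


def pending_rows(sheet_id: str, limit: int = 5) -> list[dict]:
    """Filter inside the execution environment; only the small result is returned."""
    rows = TOOLS["sheets.get_rows"][1](sheet_id)
    return [r for r in rows if r["status"] == "pending"][:limit]


if __name__ == "__main__":
    print(search_tools("sheet"))      # one matching definition, not the whole catalog
    print(pending_rows("demo-sheet"))  # 5 rows reach the model, not 10,000
```

The agent's context only ever sees the output of `search_tools` and `pending_rows`. The full tool catalog and the 10,000 raw rows stay inside the execution environment, which is where the token savings come from.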
This approach is complex and it's not perfect, but it should enhance the efficiency and accuracy of your AI agents across the board.
anthropic.com/engineering/code-execution-with-mcp
The James Webb Space Telescope unveils a breathtaking view of M51, the Whirlpool Galaxy. Its radiant spiral arms shimmer with starlight and intricate patterns of dust, captured in vivid clarity as a dazzling cosmic dance that reveals the universe’s grandeur in a single, awe-inspiring frame.
NASA