How do we automate business analytics with Claude?
New blog post covering our best practices for skills, data foundations, and evaluations when building agents to perform data analysis:
https://t.co/mfEJMAQFBU
We are completely humbled by the amazing response to our launch last week! 🫶 Now, we want to help you get the absolute best results from Stitch.
In this new video, David East walks you through how to consistently get premium results.
We also launched a new prompt enhancer (located under ‘+’ menu) to help you quickly collaborate on your vision before you submit your first prompt.
Stitch doesn't replace the design process—it is a tool for fast exploration and refinement, which is most effective when you step into the role of Creative Director.
Here are David's top strategies for taking your designs from generic to amazing:
🧠 Start with Intent: Define exactly who the design is for and how you want them to feel before you start building.
🎨 Enhance your prompt: You can use the new prompt enhancer (under the ‘+’ button’) to teach you design language and swap abstract words like "sporty" for tangible aesthetic descriptions like "high-end stationery" or "architectural limestone".
📐 Master Color Hierarchy: Treat colors as visual weight—Neutral for the canvas, Primary for ink, and Tertiary for your loudest accents.
Watch the full breakdown and see the transformation here👇images in 🧵
this is so fucking wholesome
guy used AI to save his cancer-ridden dog by sequencing its DNA and creating a CUSTOM cure.
the tech behind this is fucking awesome (well done @demishassabis and the google team):
- used CHATGPT to sequence dogs DNA discovers mutations
- ran the mutations through Google’s Alphafold (AI protein sequencer) which CREATED A CUSTOM VACCINE TO TREAT THEM.
- treated dog and reduced tumour by 50% in WEEKS. dog is alive and well.
- this is the 1st time AI has been used to create a custom vaccine for a dog (and it worked)
- dude is now working on similar vaccines for humans using AI!
2026 is definitely the year we see AI change personalised medicine in a HUGE way
so sick
New in Claude Code: Code Review. A team of agents runs a deep review on every PR.
We built it for ourselves first. Code output per Anthropic engineer is up 200% this year and reviews were the bottleneck
Personally, I’ve been using it for a few weeks and have found it catches many real bugs that I would not have noticed otherwise
People get high on abstraction too early. They want the system before they’ve earned the insight.
But the good abstractions are never designed. They’re discovered. You do the stupid manual thing enough times and the real bottleneck just emerges. Your initial agency might be driven by a hunch you had in the shower, but that moment won’t get you all the way to making something people want. The right way to make anything is forced on you by reality: what are the real jobs to be done? And what sequence?
This is why “do things that don’t scale” still hits, especially now when AI makes it trivially easy to scale things that probably shouldn’t be scaled yet. PG’s point was never about suffering. It was about contact. When you’re the one manually doing the loop, you see the edge cases. The weird user behavior. The failure modes nobody designed for. The hidden dependencies that only show up at 2am when some flow or intermediate step breaks in a way you didn’t anticipate. If you automate before you have that contact, you just scale your misunderstanding faster.
When the machines can help you vibe code perfection it gives you a false sense of power. I love that feeling as much as you do. But fuck perfection. Do it live. Be the loop.
Feel every friction point. Notice what’s actually true every single time versus what just looked true because you hadn’t seen enough cases yet. Formalize that. Build the recursive version. Then keep checking that your abstraction is still attached to real humans and their needs. Because reality drifts. Your users drift. The ground truth changes under you. You may think you understand but no plan survives contact with the real users and what they want. You find those body blows in analytics and user feedback and we call them the roadmap.
Humans left with not enough data hallucinate too. But just like the LLMs with enough data you unlock real transcendence. Real utility. Prosperity for humans in real life.
The abstraction is a tool, not a destination. The moment you forget that, you’re cooked.
Introducing Cinematic Video Overviews, the next evolution of the NotebookLM Studio. Unlike standard templates, these are powered by a novel combination of our most advanced models to create bespoke, immersive videos from your sources.
Rolling out now for Ultra users in English!
It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.
Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes.
As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now.
It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.
PMs don’t just ship features. They kill them. Shipping isn’t the job. Shipping the right product is. A great PM doesn’t fall in love with the roadmap. They fall in love with the problem and have the guts to say: This isn’t solving it. This adds complexity. This doesn’t matter. Every feature, setting, UI, element should fight to exist.
At Nest, we had one rule:
If you can’t explain why it matters, it doesn’t ship.
You had to tell us the why.
The reason a real person would care.
That one rule killed dozens of features.
Announcing a new Claude Code feature: Remote Control. It's rolling out now to Max users in research preview. Try it with /remote-control
Start local sessions from the terminal, then continue them from your phone. Take a walk, see the sun, walk your dog without losing your flow.
Did not expect a question that starts out 'Do you think before you speak?' to go so well. A+ question from Charlotte Harpur A++ response from Eileen Gu.
JUNE 2028.
The S&P is down 38% from its highs. Unemployment just printed 10.2%. Private credit is unraveling. Prime mortgages are cracking. AI didn’t disappoint. It exceeded every expectation.
What happened?
https://t.co/JzzwCrbJgS
We're watching a three-stage evolution in how AI agents manage context.
Stage 1: Tool bloat.
Give the agent every tool, every connector, every context; tool descriptions, retrieved docs, conversation history. Performance degrades. Most production agents are still here.
Stage 2: Sub-agent delegation.
The orchestrator delegates to specialist agents who each handle their task and return compressed results. Better, but the orchestrator's context window is still the bottleneck for the user's input. If someone hands you a 10M token codebase, no amount of sub-agent delegation fixes the fact that the input itself doesn't fit.
Stage 3: Prompt as external object.
Here, the prompt isn't something the model reads; it's something the model navigates. The data remains external, and the model writes code to decide what to pull in.
This is moving fast. MIT's Recursive Language Models paper dropped in January. By February, major labs are shipping production versions of the core idea.
Instead of feeding a 10M token prompt directly into the model, the RLM loads it as a Python variable in a REPL environment. The model never sees the full content, it only gets metadata. From there, it writes Python code to slice into specific sections, run regex searches, filter what's relevant, and call itself (or a smaller LLM) on each chunk.
The goal is not to dump raw context into the window, but to write code to filter it before it enters the window. You spend a little compute on filtering, get better quality and lower cost because you're not filling the context with irrelevant stuff.
Today it's "filter search results with code before they hit context." Tomorrow it's "the 500-page patient chart lives as a variable, and the model writes regex and sub-calls to navigate it."
We're moving toward systems where the model automatically handles context engineering during inference, programmatically.
Karpathy buried the most interesting observation in paragraph five and moved on.
He’s talking about NanoClaw’s approach to configuration. When you run /add-telegram, the LLM doesn’t toggle a flag in a config file. It rewrites the actual source code to integrate Telegram. No if-then-else branching. No plugin registry. No config sprawl. The AI agent modifies its own codebase to become exactly what you need.
This inverts how every software project has worked for decades. Traditional software handles complexity by adding abstraction layers: config files, plugin systems, feature flags, environment variables. Each layer exists because humans can’t efficiently modify source code for every use case. But LLMs can. And when code modification is cheap, all those abstraction layers become dead weight.
OpenClaw proves the failure mode. 400,000+ lines of vibe-coded TypeScript trying to support every messaging platform, every LLM provider, every integration simultaneously. The result is a codebase nobody can audit, a skill registry that Cisco caught performing data exfiltration, and 150,000+ deployed instances that CrowdStrike just published a full security advisory on. Complexity scaled faster than any human review process could follow.
NanoClaw proves the alternative. ~500 lines of TypeScript. One messaging platform. One LLM. One database. Want something different? The LLM rewrites the code for your fork. Every user ends up with a codebase small enough to audit in eight minutes and purpose-built for exactly their use case. The bloat never accumulates because the customization happens at the code level, not the config level.
The implied new meta, as Karpathy puts it: write the most maximally forkable repo possible, then let AI fork it into whatever you need.
That pattern will eat way more than personal AI agents. Every developer tool, every internal platform, every SaaS product with a sprawling settings page is a candidate. The configuration layer was always a patch over the fact that modifying source code was expensive. That cost just dropped to near zero.