@0xIlyy langchain had devs writing 200 lines of abstraction spaghetti just to send one prompt. meanwhile my agent ships the same thing in 4 lines of raw api calls.
the best framework for building agents turned out to be no framework. just vibes and fetch.
Claude Code's new dynamic workflows update is absurd.
Make sure you understand what it's doing here. This isn't simply a long-running mode like /goal, or a fancy subagent verifier process.
This is Claude vibecoding an entire brand-new subagent fleet harness on demand.
RLM on agent harnesses. That's what "dynamic workflow" means.
This is basically a new scaling law dimension.
Base Model Compute × Inference-time Thinking Compute × Inference-time Generated Harness Compute
HUGE step forward on the path of AI
are you paying attention yet?
Excited to share our most powerful new Claude Code feature: dynamic workflows!
Mention "workflow" in a prompt and Claude will dynamically create an orchestration plan that it strictly follows, allowing you to confidently trust that every stage happens in the right order even across 100s of agents.
A flow I just tried and LOVED:
1. /grill-with-docs, talking about a new bit of UI
2. Asks me a question I can't answer unless I prototype
3. /prototype
4. Iterate on the prototype, burning tokens freely until we get a good spot
5. /rewind to the question, and select 'summarize' (Claude Code feature), saying 'summarize what we learned from prototyping'
6. Continue the grilling session, retaining the prototype
Smoooooooth
The entire RAG industry is about to get cooked.
Researchers have built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.
It's called PageIndex. Instead of chunking your docs and stuffing them into Pinecone, it builds a tree index and lets the LLM reason through it like a human reading a book.
hit 98.7% on FinanceBench. beats every vector RAG on the leaderboard.
no embeddings. no chunking. no vector DB.
100% open source.
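A rough sketch of the tree-index idea, not PageIndex's actual API. The `Node` class, `pick_child`, `retrieve`, and the toy report are all made up for illustration. The word-overlap scorer is only a stand-in for the LLM reasoning step, so the sketch runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section in a hypothetical table-of-contents style tree index."""
    title: str
    summary: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def pick_child(question: str, children: list[Node]) -> Node:
    """Stand-in for the LLM's reasoning step (assumption, not the real method).

    A real system would show the model the child titles and summaries and ask
    which section to open. Word overlap is used here only so this runs offline.
    """
    q_words = set(question.lower().split())

    def overlap(node: Node) -> int:
        return len(q_words & set(f"{node.title} {node.summary}".lower().split()))

    return max(children, key=overlap)


def retrieve(question: str, root: Node) -> tuple[list[str], str]:
    """Walk from the root to a leaf; return the path taken and the leaf text."""
    node, path = root, [root.title]
    while node.children:
        node = pick_child(question, node.children)
        path.append(node.title)
    return path, node.text


# Toy document tree (invented data, purely illustrative).
report = Node("Annual report", "full filing", children=[
    Node("Risk factors", "market and credit risk", text="Rates may rise."),
    Node("Financial statements", "revenue income and cash flow", children=[
        Node("Income statement", "revenue and net income", text="Revenue was 10M."),
        Node("Cash flow", "operating cash flow", text="Operating cash flow was 2M."),
    ]),
])

path, answer = retrieve("what was the revenue and net income", report)
print(" > ".join(path), "|", answer)
```

No embeddings or similarity search appear anywhere: retrieval is a sequence of "which section next?" decisions, and the path doubles as a citation trail.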
Code with Claude, our developer conference, returns next week.
Whether you're just getting started with Claude Code or you've been building for a while, there's a session for you.
Register for the livestream: https://t.co/GJwOPMDLEC
// Agentic Harness Engineering //
Pay attention to this one, AI devs.
(bookmark it)
Most coding-agent harnesses are still tuned by hand or brittle trial-and-error self-evolution.
This new work introduces Agentic Harness Engineering, a framework that makes harness evolution observable. They do this through three layers: components as revertible files, experience as condensed evidence from millions of trajectory tokens, and decisions as falsifiable predictions checked against task outcomes.
Each edit becomes a contract you can verify or revert.
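A minimal sketch of the "edit as contract" idea, assuming a harness config is just a dict and an evaluator returns a pass rate. `Edit`, `try_edit`, and `toy_eval` are hypothetical names and the numbers are invented; this is not the paper's code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Edit:
    """A proposed harness change plus the prediction it must satisfy."""
    name: str
    apply: Callable[[dict], dict]   # returns a new config, never mutates the old one
    predicted_gain: float           # minimum pass-rate improvement the edit claims


def try_edit(config: dict, edit: Edit,
             evaluate: Callable[[dict], float]) -> tuple[dict, bool]:
    """Keep the edit only if the measured gain meets its prediction."""
    baseline = evaluate(config)
    candidate = edit.apply(config)
    gain = evaluate(candidate) - baseline
    if gain >= edit.predicted_gain:
        return candidate, True      # prediction held: accept the edit
    return config, False            # prediction falsified: revert to the old config


def toy_eval(config: dict) -> float:
    """Toy evaluator (assumption): retries help, verbose prompts hurt."""
    return 0.5 + 0.05 * config["retries"] - 0.1 * config["verbose"]


seed = {"retries": 1, "verbose": 0}
edits = [
    Edit("more retries", lambda c: {**c, "retries": c["retries"] + 1}, 0.03),
    Edit("verbose prompt", lambda c: {**c, "verbose": 1}, 0.02),
]

config, log = seed, []
for edit in edits:
    config, kept = try_edit(config, edit, toy_eval)
    log.append((edit.name, kept))

print(config, log)
```

Because every edit returns a fresh config, reverting is just keeping the old value, and the log records which predictions survived contact with the evaluator.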
Results: pass@1 on Terminal-Bench 2 climbs from 69.7% to 77.0% in ten iterations, beating human-designed Codex-CLI (71.9%) and self-evolving baselines like ACE and TF-GRPO.
The evolved harness also transfers across model families with +5.1 to +10.1 point gains, while using 12% fewer tokens than the seed on SWE-bench Verified.
Harness work is the biggest hidden cost in most agent systems. This is the first credible recipe for letting the harness improve itself without drifting into noise.
Paper: https://t.co/9fEgqwlTSf
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
ANDREJ KARPATHY JUST DECLARED THE END OF PROGRAMMING AS YOU KNOW IT.
Not an exaggeration. A framework.
Software 1.0 was explicit code that humans wrote line by line.
Software 2.0 was neural network weights trained on data.
Software 3.0 is English.
Natural language as the programming interface.
LLMs as the computer.
Prompts as the code.
The entire history of software development was a workaround for the fact that computers could not understand humans.
That workaround just became optional.
In 40 minutes, you will understand why.
Andrej Karpathy: "90% of what AI twitter tells you to learn will be dead in 6 months"
Here are 10 things senior AI engineers stopped wasting time on:
1. AutoGen / AG2: moved to community maintenance, releases stalled. dead for production
2. CrewAI: demos well, breaks in production. engineers building real systems already moved off it
3. Autonomous agent pitches: the AutoGPT / BabyAGI wave is dead in product form. the industry settled on supervised, bounded, evaluated agents
4. Agent app stores / marketplaces: promised since 2023, zero enterprise traction
5. SWE-bench leaderboard chasing: researchers proved nearly every public benchmark can be gamed without solving the underlying task
6. Microsoft Semantic Kernel: unless you're locked into Microsoft enterprise stack, it's not where the ecosystem is heading
7. DSPy: philosophical merit, niche audience. not a general agent framework
8. Horizontal "build any agent" platforms: Google Agentspace, AWS Bedrock Agents, Copilot Studio. confusing, slow-shipping, the math still favors building yourself
9. Per-seat SaaS pricing for agent products: market moved to outcome-based. per-seat is already dead
10. The framework that went viral on HN this week: wait 6 months. if it still matters, it'll be obvious
what actually compounds instead:
- context engineering
- tool design
- orchestrator-subagent pattern
- eval discipline
- the harness mindset (harness > model, always)
- MCP as the protocol layer
be a few steps ahead of your competitors and outperform the market until this becomes mass opinion
study this.
Ship 26 tickets just dropped.
London, Berlin, New York, Sydney, San Francisco.
Hear from customers shipping AI agents and apps to production, with talks and workshops designed to help your team do the same.
Request your ticket: https://t.co/brfFjxrYBy
Most developers spend months building AI agents the wrong way.
This Anthropic engineer explains the right way in 14 minutes.
Bookmark this before you write another line of agent code.
20 agents in a medieval village economy. no goals. no trading strategies.
day 1: a baker negotiated flour on credit. a 16yo became an arbitrage middleman. a blacksmith negotiated ore prices through conversation.
nobody prompted them with economic goals. the architecture did the work.
key design: 14 deterministic engine phases per tick, 1 LLM call per agent. the engine handles ALL constraints: recipe validation, hunger, market mechanics, tool degradation. the LLM just picks actions from a schema. the world enforces the physics, the agent figures out the rest.
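Here is a stripped-down sketch of that split, assuming one upkeep phase instead of 14 and a plain function in place of the LLM call. The action names, prices, and state fields are hypothetical, not taken from the linked experiment.

```python
from typing import Callable

ACTIONS = {"bake", "buy_flour", "rest"}  # hypothetical action schema


def validate(state: dict, action: str) -> bool:
    """Engine-side rules: the agent cannot break these however it is prompted."""
    if action not in ACTIONS:
        return False
    if action == "bake":
        return state["flour"] >= 1
    if action == "buy_flour":
        return state["coins"] >= 2
    return True


def apply_action(state: dict, action: str) -> dict:
    """Apply an already validated action and return the new state."""
    new = dict(state)
    if action == "bake":
        new["flour"] -= 1
        new["bread"] += 1
    elif action == "buy_flour":
        new["coins"] -= 2
        new["flour"] += 1
    return new


def tick(state: dict, choose: Callable[[dict], str]) -> dict:
    """One tick: deterministic upkeep, one agent decision, validated application."""
    state = {**state, "hunger": state["hunger"] + 1}  # deterministic engine phase
    action = choose(state)                             # stands in for the single LLM call
    if not validate(state, action):
        action = "rest"                                # invalid picks become a no-op
    return apply_action(state, action)


def greedy_baker(state: dict) -> str:
    """Stand-in policy that always tries to bake, even with no flour."""
    return "bake"


state = {"flour": 1, "coins": 0, "bread": 0, "hunger": 0}
state = tick(state, greedy_baker)  # valid: uses the only flour
state = tick(state, greedy_baker)  # invalid: the engine downgrades it to rest
print(state)
```

The policy can be as dumb or as clever as you like; bread never appears without flour because only the engine mutates state.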
i've been running 5 agents across 3 machines for months. the same pattern holds. coordination that actually works lives in the state layer, SQLite tables as shared reality, not in the system prompts.
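A small sketch of "SQLite tables as shared reality", assuming a single `tasks` table that agents claim work from. The schema, `claim_next`, and the agent names are made up for illustration, and it uses an in-memory database so it runs anywhere; the author's real setup is not shown in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a shared database file in a real setup
conn.execute("""
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'open'
            CHECK (status IN ('open', 'claimed', 'done')),
        owner TEXT
    )
""")
conn.executemany(
    "INSERT INTO tasks (title) VALUES (?)",
    [("triage inbox",), ("write report",)],
)
conn.commit()


def claim_next(conn: sqlite3.Connection, agent: str):
    """Claim the lowest-id open task, or return None if nothing is left."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id, title FROM tasks WHERE status = 'open' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard means a task that was claimed in the meantime
        # is not claimed twice.
        updated = conn.execute(
            "UPDATE tasks SET status = 'claimed', owner = ? "
            "WHERE id = ? AND status = 'open'",
            (agent, row[0]),
        )
        return row[1] if updated.rowcount == 1 else None


claims = [claim_next(conn, name) for name in ("agent-a", "agent-b", "agent-c")]
owners = conn.execute("SELECT title, owner FROM tasks ORDER BY id").fetchall()
print(claims, owners)
```

The CHECK constraint and the guarded UPDATE do the structural work here: no prompt has to tell an agent not to grab a task someone else already owns.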
the agents that break are the ones where i tried to make the prompt do structural work. the ones that run overnight without intervention are the ones where the architecture makes bad decisions impossible.
don't prompt for goals. build the world with constraints and let goals emerge.
full experiment + code: https://t.co/HJ3XDweWQI
@sporadica every ai lab on earth racing to build god and the plan for after is literally "we'll figure it out lol".
move fast and break things but the thing is the entire economy this time.
@claudeai claude just got cron jobs before most startups figure out how to set up a proper CI pipeline. an ai agent with a scheduler is basically a junior employee who never calls in sick.
interns are watching this announcement like 👁️👁️
andrej’s spot on.
99% of people don’t take AI seriously because they don’t use it properly
if your job doesn’t include programming, research or math, chances are you think AI’s a fucking toy
“silver lining”: the next tier of models (mythos, spud) will cook other professions
law, finance, healthcare, admin, strategy, modelling, security - are about to experience their claude code moment for their skill set
tech is here, just not evenly distributed