8 RAG architectures for AI Engineers:
(explained with usage)
1) Naive RAG
- Retrieves documents purely based on vector similarity between the query embedding and stored embeddings.
- Works best for simple, fact-based queries where direct semantic matching suffices.
2) Multimodal RAG
- Handles multiple data types (text, images, audio, etc.) by embedding and retrieving across modalities.
- Ideal for cross-modal retrieval tasks like answering a text query with both text and image context.
3) HyDE (Hypothetical Document Embeddings)
- Addresses the problem that queries are often not semantically similar to the documents that answer them.
- This technique generates a hypothetical answer document from the query before retrieval.
- Uses this generated document’s embedding to find more relevant real documents.
4) Corrective RAG
- Grades retrieved results for relevance and, when they fall short, checks them against additional sources (e.g., web search).
- Helps keep information up to date and accurate by filtering or correcting retrieved content before passing it to the LLM.
5) Graph RAG
- Converts retrieved content into a knowledge graph to capture relationships and entities.
- Enhances reasoning by providing structured context alongside raw text to the LLM.
6) Hybrid RAG
- Combines dense vector retrieval with graph-based retrieval in a single pipeline.
- Useful when the task requires both unstructured text and structured relational data for richer answers.
7) Adaptive RAG
- Dynamically decides if a query requires a simple direct retrieval or a multi-step reasoning chain.
- Breaks complex queries into smaller sub-queries for better coverage and accuracy.
8) Agentic RAG
- Uses AI agents with planning, reasoning (ReAct, CoT), and memory to orchestrate retrieval from multiple sources.
- Best suited for complex workflows that require tool use, external APIs, or combining multiple RAG techniques.
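To make the gap between Naive RAG (1) and HyDE (3) concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `fake_llm` is a hypothetical stub for the LLM call, and the three documents are made up.

```python
import math
import re
from collections import Counter

DOCS = [
    "Restart the router and wait two minutes to restore the wifi connection.",
    "Invoices are emailed on the first business day of each month.",
    "Our office is closed on public holidays.",
]

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(text: str, docs: list[str]) -> str:
    """Naive RAG retrieval: return the document closest to the embedded text."""
    q = embed(text)
    return max(docs, key=lambda d: cosine(q, embed(d)))

def fake_llm(query: str) -> str:
    """Hypothetical stub: a real system would ask an LLM to draft an answer."""
    return "To restore the wifi connection, restart the router."

def hyde_retrieve(query: str, docs: list[str]) -> str:
    """HyDE: embed a generated hypothetical answer instead of the raw query."""
    return retrieve(fake_llm(query), docs)

query = "internet is down, what do I do?"
naive_hit = retrieve(query, DOCS)      # matches only on the word "is"
hyde_hit = hyde_retrieve(query, DOCS)  # matches the router document
```

The query shares almost no vocabulary with the document that answers it, so the naive lookup lands on an unrelated document, while the hypothetical answer sits much closer to the right one. Real embedding models narrow this gap, but the mechanism is the same.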
Most architectures here involve some form of retrieval-time decision. But they all run on top of whatever was already indexed.
If that indexing step outputs messy chunks, every architecture inherits them. Improving it is a separate problem from the 8 above.
My co-founder wrote about a better unit for the indexing step. The technique:
- cuts corpus size by 40x.
- reduces tokens per query by 3x.
- improves vector search relevance by 2.3x.
And it doesn't alter the retrieval algorithm, the reranker, or the embedding model.
Read it below.
Andrej Karpathy: "90% of Claude's mistakes come from missing context, not a weak model."
41% mistake rate without a CLAUDE.md. 11% with the 4-rule baseline. 3% with the 12-rule version below
here are the 12 rules senior engineers settled on:
1. think before coding: state assumptions, don't guess. the model can't read your mind, stop hoping it will
2. simplicity first: minimum code, no speculative abstractions. the moment you let Claude add "for future flexibility," you've added 200 lines you'll delete next quarter
3. surgical changes: touch only what you must. don't let it improve adjacent code, that's how PRs blow up
4. goal-driven execution: define success criteria upfront, loop until verified. without them Claude either loops forever or stops too early
5. use the model only for judgment calls: classification, drafting, summarization, extraction. NOT routing, retries, status-code handling, deterministic transforms. if code can answer, code answers
6. token budgets are not advisory: per-task 4,000 tokens, per-session 30,000 tokens. by message 40 of a long debug, Claude is re-suggesting fixes you rejected at message 5
7. surface conflicts, don't average them: two patterns in the codebase? pick one. Claude blending them is how errors get swallowed twice
8. read before you write: read exports, callers, shared utilities. Claude will happily add a duplicate function next to an identical one it never read
9. tests verify intent, not just behavior: a test that can't fail when business logic changes is wrong. all 12 of Claude's tests can pass while the function returns a constant
10. checkpoint every significant step: Claude finished steps 5 and 6 on top of a broken state from step 4. nobody noticed for an hour
11. match the codebase conventions: class components? don't fork to hooks silently. the existing testing patterns assume componentDidMount, and hooks break them without surfacing an error
12. fail loud: "completed successfully" with 14% of records silently skipped is the worst class of bug. surface uncertainty, don't hide it
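Rule 12 is the easiest one to show in code. A minimal sketch, assuming a batch job that parses amounts from strings; `parse_amounts`, `BatchReport`, `PartialFailure`, and the `max_skip_ratio` parameter are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field

class PartialFailure(RuntimeError):
    """Raised when a batch skips more records than the caller allowed."""

@dataclass
class BatchReport:
    processed: int = 0
    skipped: list[tuple[int, str]] = field(default_factory=list)  # (row index, reason)

def parse_amounts(rows: list[str], max_skip_ratio: float = 0.0) -> tuple[list[float], BatchReport]:
    """Parse each row as a float; refuse to report success if too many rows are skipped."""
    values: list[float] = []
    report = BatchReport()
    for i, row in enumerate(rows):
        try:
            values.append(float(row))
            report.processed += 1
        except ValueError as exc:
            # Record the skip instead of swallowing it.
            report.skipped.append((i, str(exc)))
    total = len(rows)
    if total and len(report.skipped) / total > max_skip_ratio:
        raise PartialFailure(
            f"{len(report.skipped)}/{total} records skipped, first few: {report.skipped[:3]}"
        )
    return values, report
```

With the default `max_skip_ratio=0.0`, a single bad record stops the job. Callers who can tolerate some loss have to say so explicitly, and they still get the skipped rows back in the report.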
what actually compounds instead of the next framework:
- the CLAUDE.md file as institutional memory across sessions
- eval-driven changes, not vibe-driven
- checkpoints over speed
- explicit conflicts over silent blending
- discipline over framework, every time
- one repo, one rules file, no exceptions
you don't need a better AI
you need better context engineering
complete playbook below ↓
Sub-Agents vs Agent Teams in Claude Code:
Sub-agents get their own system prompt, their own tool set, and a clean context window. They report back to the parent and terminate.
Agent teams get all of that plus three things sub-agents don't have:
- a shared task list with dependency tracking
- peer-to-peer messaging between teammates
- persistent context that accumulates over time.
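A shared task list with dependency tracking is a small data structure. The sketch below is a toy model of the idea, not how Claude Code implements it; `Task` and `SharedTaskList` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    depends_on: set[str] = field(default_factory=set)
    done: bool = False

class SharedTaskList:
    """Toy shared task list: teammates pick up only tasks whose dependencies are done."""

    def __init__(self) -> None:
        self.tasks: dict[str, Task] = {}

    def add(self, name: str, depends_on: tuple[str, ...] = ()) -> None:
        unknown = [d for d in depends_on if d not in self.tasks]
        if unknown:
            raise KeyError(f"unknown dependencies: {unknown}")
        self.tasks[name] = Task(name, set(depends_on))

    def ready(self) -> list[str]:
        """Names of unfinished tasks whose dependencies are all complete."""
        return [
            t.name
            for t in self.tasks.values()
            if not t.done and all(self.tasks[d].done for d in t.depends_on)
        ]

    def complete(self, name: str) -> None:
        self.tasks[name].done = True

board = SharedTaskList()
board.add("schema")
board.add("api", depends_on=("schema",))
board.add("tests", depends_on=("api",))
first = board.ready()   # only "schema" is unblocked
board.complete("schema")
second = board.ready()  # now "api" is unblocked
```

A sub-agent that reports back and terminates has no need for this structure, because the parent holds the plan. A team needs it so teammates can see what is blocked without asking each other.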
We published an article that dives into a lot more detail.
Read it below.
Anthropic pays $750,000+ a year for engineers who can build LLMs from scratch.
Not how to prompt them.
Not how to fine-tune them.
Not how to build RAG pipelines.
But how to build them from scratch.
This 2-hour Stanford lecture teaches you everything.
Scaling laws.
Data collection.
Architecture design.
Post-training alignment.
Free. From Stanford.
Watch first. Then read this.
The lecture is the theory.
And this article shows you how to actually build it (with code) ↓
KARPATHY JUST HANDED EVERY DEVELOPER THE EXACT FILE CLAUDE CODE NEEDED FROM DAY ONE.
65 lines. 110K stars. the cheat code for every broken workflow you've been blaming on the model.
if I had this a year ago, I would've shipped twice as fast.
make sure to bookmark it before it gets lost in your feed.
I was losing 2 hours a day to Claude rewriting code I didn't ask it to touch.
then I found CLAUDE.md.
90 seconds to set up. changed everything.
Karpathy identified 4 failure patterns Claude Code repeats constantly, in his own words:
→ silent assumptions: Claude makes decisions without checking with you
→ code bloat: 1000 lines written when 100 would do
→ collateral damage: Claude edits code unrelated to the task
→ no success criteria: Claude loops with no finish line
these aren't model failures. they're missing instructions.
CLAUDE.md gives Claude the 4 rules it needed from day one:
→ think before coding, state assumptions. ask before assuming.
→ simplicity first, minimum code. nothing speculative.
→ surgical changes, touch only what is required. nothing adjacent.
→ goal-driven execution, define success before starting. loop until verified.
65 lines. no build step. no framework. no dependencies.
just the 4 principles every developer already knew, but needed Karpathy to write down.
(Link to the REPO in the comment below)
the guide on how to build a second brain with CLAUDE is in the article below.
Today I'm publishing a new essay, Policy on the AI Exponential. AI is progressing extremely fast—much faster than the policy process was built to handle. The essay lays out where I think the technology is now, and the action needed to close the gap: https://t.co/Lh6PWae178
Use Ollama with Hermes Desktop by @NousResearch.
Hermes Desktop brings the same agent (its multi-agent engine, self-improving skills, and messaging integrations) into a desktop app on macOS, Windows, and Linux.
Run it on Ollama using local or cloud with one command:
ollama launch hermes-desktop
🧵
CLAUDE.md is NOT a README.
Most devs:
→ Add a few bullets
→ Maybe a build command
→ Call it “done”
Then complain:
“Claude writes bad code” 🤦♂️
No.
Your CLAUDE.md is just… useless.
Here’s how to fix it 👇
1️⃣ Use ALL 3 scopes (not just one)
• Global → ~/.claude/CLAUDE.md
• Project → ./CLAUDE.md
• Folder → ./src/CLAUDE.md
Merge order:
Global → Project → Folder (last wins)
Most people miss this.
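The "last wins" ordering can be pictured as a layered merge. This is only a toy model of the precedence described above: real CLAUDE.md files are free-form Markdown rather than key-value pairs, and `merge_scopes` and the example rules are hypothetical.

```python
def merge_scopes(*scopes: dict[str, str]) -> dict[str, str]:
    """Merge rule sets in order; a later (more specific) scope overrides an earlier one."""
    merged: dict[str, str] = {}
    for scope in scopes:
        merged.update(scope)
    return merged

# Hypothetical rules, one dict per scope.
global_rules = {"naming": "snake_case", "tests": "pytest"}
project_rules = {"naming": "camelCase vars, PascalCase components"}
folder_rules = {"tests": "npm test --watch"}

# Order matters: global first, folder last.
effective = merge_scopes(global_rules, project_rules, folder_rules)
```

Here the project scope overrides the global naming rule and the folder scope overrides the global test command, which is the behavior the merge order implies.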
2️⃣ Follow WHAT / WHY / HOW
• WHAT → stack, structure, dependencies
• WHY → decisions, patterns, anti-patterns
• HOW → commands, tests, deploy flow
Skip one = Claude guesses.
And it guesses wrong.
3️⃣ Be SPECIFIC
❌ “Write clean code”
✅ “camelCase vars, PascalCase components”
❌ “Test everything”
✅ “80% coverage, npm test --watch”
Vague = ignored
Specific = followed
4️⃣ Follow these 5 rules
• Run /init first
• Keep it < 500 lines
• Expect ~70% compliance
• Update monthly
• Reference configs (don’t copy)
The truth?
Top engineers aren’t better at prompting.
They’re better at designing CLAUDE.md.
Fix this → your AI code quality 10x 🚀
Six patterns for building dynamic workflows and loops identified by Anthropic:
1. Classify-and-act: one agent decides the type, the script routes it. Example: bug vs feature vs noise.
2. Fan-out-and-synthesize: one agent per piece, merged in code. Examples: market research, competitor teardown.
3. Adversarial verification: a separate agent checks the output against a rubric. Example: fact-checking a PRD against the sources.
4. Generate-and-filter: many candidates, deduped, the survivors kept. Examples: naming, positioning, ideation.
5. Tournament (compare): agents attempt the task different ways, judges compare until one wins. Example: product strategy.
6. Loop-until-done: spawn until a stop condition. Example: implement, document, and test a feature in one shot.
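Pattern 1 is the simplest to sketch: the agent only produces a label, and plain code does the routing. In this example `classify` is a hypothetical keyword stub standing in for the agent call, and the handlers are made up.

```python
from typing import Callable

def classify(ticket: str) -> str:
    """Hypothetical stand-in for the one agent call that labels the input."""
    text = ticket.lower()
    if "error" in text or "crash" in text:
        return "bug"
    if "please add" in text or "would be nice" in text:
        return "feature"
    return "noise"

# Deterministic routing table: the script, not the agent, decides what happens next.
HANDLERS: dict[str, Callable[[str], str]] = {
    "bug": lambda t: f"filed bug: {t}",
    "feature": lambda t: f"added to roadmap: {t}",
    "noise": lambda t: "ignored",
}

def route(ticket: str) -> str:
    label = classify(ticket)
    handler = HANDLERS.get(label)
    if handler is None:
        # An unexpected label is an error, not something to guess around.
        raise ValueError(f"unknown label: {label!r}")
    return handler(ticket)
```

Keeping the routing table in code means a new category is a one-line change, and an unexpected label fails visibly instead of being handled by guesswork.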
Prompt engineering has been replaced by loop engineering.
What is it? (Explained in 60 seconds)
For the past 2 years we have been prompting agents with individual tasks. That is starting to change.
So far, if you wanted an agent to build a dashboard for a client, you would give it a task, review the output, improve the prompt, and repeat the process until the work was done.
Looping changes that.
Instead of giving an agent individual tasks, you give it a goal and let it work through a recursive loop until that goal is met.
For example:
→ Research
→ Draft
→ Evaluate
→ Test
→ Improve
→ Repeat
The agent keeps cycling through the loop until it reaches the standard you defined.
Within loop engineering there are two main approaches:
1. Open Looping
You give the agent a goal and allow it significant freedom in how it achieves it.
This is powerful, but also expensive and harder to control.
2. Closed Looping
The human defines the architecture, constraints and evaluation criteria.
The agent is then responsible for executing, improving and iterating within those boundaries until the goal is reached.
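A closed loop is easy to sketch: the human supplies the evaluation function, the threshold, and the iteration cap, and the agent iterates inside them. In the toy version below, `improve` and `evaluate` are hypothetical stand-ins for an agent call and a real evaluation, and the keyword check is made up.

```python
from typing import Callable

def run_closed_loop(
    draft: str,
    improve: Callable[[str], str],
    evaluate: Callable[[str], float],
    threshold: float,
    max_iters: int = 10,
) -> tuple[str, list[float], bool]:
    """Improve the draft until it meets the threshold or the iteration budget runs out."""
    score = evaluate(draft)
    history = [score]
    for _ in range(max_iters):
        if score >= threshold:
            break
        draft = improve(draft)
        score = evaluate(draft)
        history.append(score)
    return draft, history, score >= threshold

# Human-defined evaluation criteria (toy): the draft must mention every required metric.
REQUIRED = ["revenue", "churn", "retention"]

def evaluate(draft: str) -> float:
    return sum(k in draft for k in REQUIRED) / len(REQUIRED)

def improve(draft: str) -> str:
    """Stand-in for the agent: add the first missing metric."""
    missing = next(k for k in REQUIRED if k not in draft)
    return f"{draft} {missing}"

final, history, ok = run_closed_loop("dashboard: revenue", improve, evaluate, threshold=1.0)
```

The `max_iters` cap is the part that keeps cost under control: the loop stops even when the standard is never reached, and the returned flag says which of the two happened.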
The next evolution is orchestrated looping.
Instead of a single agent running a loop, one agent breaks the goal into smaller tasks and assigns them to specialist agents.
Each specialist runs its own loop and reports back.
In other words:
You move from one agent improving itself to an entire team of agents iterating together until the goal is achieved.
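Orchestrated looping can be sketched as an orchestrator that hands each sub-task to a specialist running its own bounded loop. This is a toy under heavy assumptions: `make_specialist` and `orchestrate` are hypothetical names, and each specialist's "evaluation" is faked by passing on a fixed attempt number.

```python
from typing import Callable

Report = dict  # keys: "task", "attempts", "passed"

def make_specialist(check: Callable[[int], bool], max_attempts: int = 5) -> Callable[[str], Report]:
    """Build a specialist that retries its task until its own check passes."""
    def run(task: str) -> Report:
        for attempt in range(1, max_attempts + 1):
            if check(attempt):  # stand-in for "evaluate the output"
                return {"task": task, "attempts": attempt, "passed": True}
        return {"task": task, "attempts": max_attempts, "passed": False}
    return run

def orchestrate(plan: dict[str, Callable[[str], Report]]) -> tuple[bool, list[Report]]:
    """Run every specialist on its sub-task and collect the reports."""
    reports = [specialist(task) for task, specialist in plan.items()]
    return all(r["passed"] for r in reports), reports

# The orchestrator's breakdown of the goal (hypothetical sub-tasks).
plan = {
    "research": make_specialist(lambda attempt: attempt >= 2),
    "draft": make_specialist(lambda attempt: attempt >= 3),
    "test": make_specialist(lambda attempt: attempt >= 1),
}
goal_met, reports = orchestrate(plan)
```

The goal counts as met only when every specialist reports a pass, so one stuck sub-task shows up in the reports instead of disappearing into the overall result.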
this is f*cking gold
the Claude setup most people will never find on their own
if I had this a year ago, I would've shipped my first app in a day instead of 3 weeks.
in the right hands, this changes everything:
Tesla AI chip design engineering reviews are so great! Team is awesome.
Our AI6 chip might set a record for the most usable intelligence from a wafer when factoring in yield.
This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time.
I feel a lot of things changing as working software increasingly comes out on a tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!
Your patience, dedication, resilience, and humble attitude throughout this impressive journey are truly inspiring. Thank you for setting such a powerful example of what perseverance and hard work can achieve.
Read the Success Story: https://t.co/AMvWa6K3xv
@Beesolverindia
This is the best site on the internet to learn harness engineering.
Free. Completely.
Most AI engineers have never heard the term.
https://t.co/bwDbTTYsjM
Bookmark this site.
Then read this setup ↓
7 things we built with Opus 4.8 on Hyperagent 👇
1. Mars rover pathfinding simulator
2. Standup Island: a cozier place to review the kanban, inspired by @every's livestream today
3. SpaceXAI + Anthropic partnership visualized
4. Landing page for an outdoor brand w/ Nano Banana + Veo
5. Multi-agent command center
6. Black hole explainer
7. Emergent ecosystem simulator
In our vibe check, 4.8 shows:
- more varied design sensibilities
- better self-correction over long-running tasks
- excellent spatial reasoning
- more natural copywriting
- fewer obvious coding errors
- more resourcefulness during reasoning
Links below to every interactive artifact shown