I've been researching Agents for the past 6 months and collected 40+ materials on the most capable architectures & implementations. The intent was to publish a comprehensive overview, like I did on RAG techniques, but I've been too busy with https://t.co/KlojywayNY, so I'm sharing it here.
There are some great intro lectures by Andrew Ng to start with.
The following types of Agentic architectures are covered:
🤖 Chain of thought (Plan & Execute agent)
🤖 Tooling operators (An agent upon a set of tools, routing to them) - good for connecting external data storage & APIs, pretty fast and robust
🤖 ReAct (Thought - Action - Observation) - capable of iteratively executing complex tasks or answering complex queries
🤖 Self-Reflection (Action - Observation / Evaluation - Reflection - Planning) - adds some quality and reasoning clarity compared to the ReAct scheme, but might be slower
🤖 Agent upon agents (A multiagent scheme) - quite a complex setup, slow, but capable of executing very complex multistep tasks; not super robust, as loops are a frequent issue.
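To make the ReAct scheme from the list above concrete, here is a minimal sketch of the Thought - Action - Observation loop. Everything in it is an assumption for illustration: the `TOOLS` registry, the `scripted_policy` function (a stand-in for a real LLM call), and the `max_steps` cap, which is one simple guard against the looping issue mentioned above. It is not taken from any of the listed projects.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: names and behaviour are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "lookup": lambda key: {"capital of France": "Paris"}.get(key, "unknown"),
}

Step = Tuple[str, str, str]  # (thought, action, observation)


def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call: returns (thought, action, action_input)."""
    if not history:
        return ("I need to add the numbers.", "calculator", question)
    # After one observation, finish and return it as the answer.
    return ("I have the result.", "finish", history[-1][2])


def react(question: str, policy, tools, max_steps: int = 5) -> str:
    """Run the Thought - Action - Observation loop until 'finish' or the step cap."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, action_input = policy(question, history)
        if action == "finish":
            return action_input
        tool = tools.get(action)
        observation = tool(action_input) if tool else f"unknown tool: {action}"
        history.append((thought, action, observation))
    return "stopped: step limit reached"


print(react("2+3", scripted_policy, TOOLS))  # → 5
```

In a real agent the policy would be an LLM prompted with the question and the history so far; the loop itself stays the same.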
Most successful projects: @Auto_GPT, @AgentGPT, @MemGPT, GPT-Researcher, @crewAIInc, @MetaGPT_.
There are also some arXiv papers & blog posts on the most important architectures.
🔗 All the materials are here: https://t.co/HUGYoAJEhw
🧠 The best part is there is a co-pilot to chat with all this knowledge!
If you’d like to add some valuable publications on Agents to this collection - just share a link in the comments 👇
AI as a thinking partner.
Most AI discourse is about execution — code, emails, automation. The more interesting shift comes when you start thinking with AI – more context, more precision, wider range than your biological hardware alone allows. Strategy, decisions, clarity on things you could only feel before.
I pointed Cursor (an IDE) at a decade of my Apple Notes and Google Docs with plans, ideas, and reflections, plus fifteen years of financial spreadsheets to ground anything business- or money-related in real numbers. No manual exports: I wrote integrations with both apps in Cursor itself (scripts to pull notes, parse formats, and handle two languages, adapting as it went) and saved everything in .md format. It took me about an hour to build this second brain project.
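The last step, saving everything as .md, could look roughly like the sketch below. It is an illustration only, not the actual scripts: the note fields (`title`, `date`, `body`), the file naming scheme, and the helper names are assumptions, and pulling notes out of Apple Notes or Google Docs is left out entirely.

```python
import re
from pathlib import Path
from typing import Dict, List


def slugify(title: str) -> str:
    """Turn a note title into a file-name-safe slug (Unicode letters are kept)."""
    slug = re.sub(r"[^\w]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def save_notes_as_markdown(notes: List[Dict[str, str]], out_dir: Path) -> List[Path]:
    """Write each note to '<date>-<slug>.md' with the title as a heading."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for note in notes:
        path = out_dir / f"{note['date']}-{slugify(note['title'])}.md"
        content = f"# {note['title']}\n\n{note['body'].strip()}\n"
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written
```

Putting the date and a readable slug in the file name keeps the notes sortable and easy for an agent to find by name.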
And then comes the cool part – I asked an Opus-powered agent to analyse the core topics in these notes. It found sixteen intellectual threads I'd been developing without realizing they formed a system. Then I asked it to enrich these threads with related context over the years. Cursor has an excellent RAG system, so I ended up with a document outlining my thought arc across a decade of ideas. I could observe the evolution of each stream – motivation, actions taken, outcomes, learnings, how my thinking transformed. Intuitions I'd been carrying for years were decomposed into concrete components. This brought insights, further ideas, questions, and, most importantly, decisions. A vague sense that "there is some potential in that one project" became a structured analysis with timelines, contributions needed, and three outcome scenarios with numbers attached, grounded in current market analysis.
Each idea thread that feels important today deserves iterating over – fetch some current context from the web if it helps, spiral into questions, look for contradictions, get to a deeper understanding. Save it. This way you organise your context system, making it rich yet easily discoverable for an agent. Clear naming helps; tagging particular docs as context for a new query is even better.
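Here is a small sketch of what "tagging docs as context for a new query" can mean in practice. It is a toy under stated assumptions, not how Cursor selects context: the `Note` structure, the tag sets, and the overlap-based ranking are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Note:
    name: str
    text: str
    tags: Set[str] = field(default_factory=set)


def select_context(notes: List[Note], wanted_tags: Set[str], limit: int = 3) -> List[Note]:
    """Return notes sharing at least one tag with the query, most overlap first."""
    scored = [(len(n.tags & wanted_tags), n) for n in notes]
    hits = [(score, n) for score, n in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [n for _, n in hits[:limit]]


def build_prompt(query: str, context: List[Note]) -> str:
    """Concatenate the selected notes and the query into one prompt string."""
    parts = [f"## {n.name}\n{n.text}" for n in context]
    return "\n\n".join(parts + [f"## Query\n{query}"])
```

Clear names matter here too: they end up as headings in the prompt, so the model can tell which note each piece of context came from.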
Then, new actionable context keeps flowing in – opportunities, decision points, technical papers, plans, and, surprisingly, new ideas. Reasoning over it with an LLM grounded in my thoughts, actions, and reflections produces decision quality beyond any memory-enabled ChatGPT answer. A complex PDF deal contract gets sliced against my legal status, tax obligations, financial situation, and current goals; edits come with the reasoning behind them and wording to communicate them back. A new tech paper is read against the architecture of my project, returning concrete implementation options.
This is real personalisation: very few people I know would give better advice, and I wouldn't bother any of them with diving deeper into the thought spirals I often chase with LLMs. Looks like we're getting close to AGI. It's not magic, just curiosity, context management, great RAG, and the amazing Claude Opus.
@karpathy We started building https://t.co/GEm1xMTljg – a digital library for professionals & teams – 4 years ago.
An LLM is the interface to your knowledge, and we parse URLs, videos, and of course, all text formats.
Fancy agentic RAG is under the hood.
📟 Meet SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents!
We at Nebius AI R&D are releasing the biggest open dataset of RL environments for training coding agents. We built an automated pipeline to extract real-world tasks at scale, and now we are sharing everything with the community. This release is designed for large-scale RL training.
What’s inside:
> 32,000+ executable tasks — every task is based on a real-world issue and comes with a pre-built Docker env.
> 20 programming languages — moving beyond Python-only datasets (including less-represented ones like Lua, Clojure, etc.).
> 120,000+ extra tasks derived from real pull requests.
> High quality — tasks are filtered and labeled using an LLM ensemble. They are also enriched with metadata and tested interfaces to ensure solvability.
We are also dropping a technical report with all the details on our extraction pipeline and model evaluations.
📄 Paper and dataset: https://t.co/dMQ0kLSqbi
👾 Discord (we are online there for any feedback/issues): https://t.co/2rJoX8pp16
We are open to research collaborations — feel free to reach out!
🔁 If you find this useful, please help us spread the word by sharing
@karpathy Of course, RAG and agentic patterns like ReAct have been mainstream for 2 years, and they are all about context, not just prompting.
Sounds a bit trivial TBH
@karpathy You can rephrase it another way - a system prompt is the one thing that is not learning now, the one parameter that is not updated after the agent gets more experience interacting with the environment. Actually, having a tool to update the system prompt after some findings would work as a v0
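A rough sketch of that v0, assuming the agent is given a tool it can call to append findings to its own system prompt. The `Agent` class, the `update_system_prompt` name, and the cap on stored findings are all hypothetical choices for illustration, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    base_prompt: str
    learnings: List[str] = field(default_factory=list)
    max_learnings: int = 20  # assumed cap so the prompt stays bounded

    @property
    def system_prompt(self) -> str:
        """Base prompt plus whatever the agent has learned so far."""
        if not self.learnings:
            return self.base_prompt
        bullets = "\n".join(f"- {item}" for item in self.learnings)
        return f"{self.base_prompt}\n\nLearned from experience:\n{bullets}"

    def update_system_prompt(self, finding: str) -> str:
        """Tool the agent can call to persist a finding for future runs."""
        finding = finding.strip()
        if finding and finding not in self.learnings:
            self.learnings.append(finding)
            # Keep only the most recent findings.
            self.learnings = self.learnings[-self.max_learnings:]
        return self.system_prompt
```

The system prompt then behaves like a small, human-readable parameter that changes with experience, while the model weights stay fixed.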
My guess is that the MCP approach is just the beginning – LLM APIs will emerge to plug in services as tools instead of the traditional front-end of your webpage.
Consumer LLMs like ChatGPT and Claude are becoming the new gateways to the Internet, replacing traditional search engines.
This will reshape the whole structure of the Internet, rewiring web traffic patterns – we'll get information, shop, and even use other services through chats.
🚀 March has been quite a month for https://t.co/Jc50mgRq9T 🥳
📈 x2 MRR and growing with $0 marketing spend this month
🧠 The agent in collections now takes time to think and produces a highly precise and comprehensive result, perfect for automating analytical tasks. We’ve also introduced thinking step visualization, so you can see exactly how the AI reasons through your queries. Watching it reflect on your question is captivating!
✨ A ton of UI updates like a draggable co-pilot separator, code snippet highlighting, markup support in notes, and many more small things that make the product enjoyable.
🥰 Most importantly, we're getting lots of positive feedback from our users, not on the idea, but on the product itself. That makes the whole startup journey and grind worth it.
@kevinrose @JustinMezzell @alexisohanian @digg Not an expert here, but to me it looks like we’ve got all the tech to do human / non-human verification right now with 99% accuracy.
The other thing the post touches on is the credibility of human claims - is there anything better than relying on reputation and expertise?
⚡️https://t.co/Jc50mgRq9T product video is organically going viral on YouTube ⚡️
- 12.5k views and growing
- +2k users in the app
- coming to 500 likes
- almost 100 comments
- +400 subscribers on the channel
Curious how? Ask @TheMaxOr 🧙‍♂️
check it out: https://t.co/8LScBVS06j
@SergeiLavrukhin Hi Sergei! You can add a list of links at once, you just need to export them from wherever they are stored. We’ll add YT playlist processing, that’s a great idea
🚀 A big day for us – IKI 2.0 goes live on Product Hunt now https://t.co/a8lYLVFAAV 🚀
🎁 There is a nice discount - check it out 👀
Since our first launch 9 months ago, we’ve reimagined IKI as an LLM-native space for professional knowledge, gained thousands of paying customers, polished and scaled the platform, and are now happy to invite you to IKI 2.0.
Our mission has not changed though - we’re here to equip knowledge workers with the best LLM-powered digital library—a true second brain, period.
And we’ve done a lot to materialize this vision - in IKI 2.0 you will enjoy:
⭐ URLs, PDFs, YouTube, txt, docx, md, images & notes support
⭐ Spaces to collaborate with your teammates
⭐ OS-like navigation with a hierarchy, drag & drop and multi-select
⭐ SOTA reasoning LLMs, including Claude 3.7 Sonnet, o3-mini, and Grok-2
⭐ Cutting-edge web search powered by Perplexity Sonar
⭐ Smart editor to write with IKI assistant
⭐ Deep comparative analysis & reasoning over collections – shipping this month!
⭐ Cloud integrations with Google Drive, Notion, Dropbox, and Obsidian are coming soon
Happy to have you among our early adopters ✨