@RE5IDENTIAL @bcherny Why would you build with Cowork? At its core, Cowork exists to give regular users the agentic features of Claude Code (plus local file access). If you’re building, stick to Claude Code. I assume by “CLI” you meant Claude Code, which is more accurately the “terminal” version.
@Patrick9704133 @bcherny Why are you talking about tokens when he’s talking about Cowork in the desktop app on a plan? 🤦‍♂️ There is no API for Cowork, which is where tokens would matter for additional cost.
Solid video, glad you're putting it in front of more people.
Quick note: it's 10 components in the talk, not 6. They show the full list on a slide and iterate the demo prompt through five versions.
The part I'd push back on a bit is the framing that a Skill removes the need to think about prompt structure. Hannah and Christian explicitly call prompt engineering an iterative, empirical science. The scaffold is the easy part. The actual lift is task context, domain rules, few-shot examples, and output schema, all of which depend on what you're building. How does the Skill handle that?
A few things worth correcting about this post, because the RLM paper is interesting and deserves an accurate read.
The paper does not claim RLM replaces RAG. The authors frame it as inference-time scaling for long-context reasoning and explicitly position it against agents, not retrieval pipelines. The lead author's own blog describes the BrowseComp-Plus experiment as testing whether you can use a single RLM call "instead of building an agent." That's the actual framing.
The 0.04 vs 58.00 benchmark is base GPT-5 vs RLM on OOLONG-Pairs. It is not RAG vs RLM. The only retrieval baseline in the paper is ReAct + BM25, which is keyword-only retrieval. The paper never benchmarks RLM against hybrid search plus reranking, which is what production RAG actually looks like today.
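To make the "keyword-only" point concrete, here is a minimal BM25 sketch. It uses the standard Okapi scoring formula with a Lucene-style IDF; the function name, whitespace tokenization, and the k1=1.5, b=0.75 defaults are assumptions for illustration, not anything taken from the paper.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each doc against the query with Okapi BM25 (toy whitespace tokenizer)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # exact-term match only: synonyms contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores
```

For example, `bm25_scores("car", ["automobile repair shop", "car repair shop"])` gives the first doc a score of exactly 0.0 even though it is about the same thing. That gap is what the vector half of a hybrid setup is there to cover.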
"RAG permanently deletes nuance" is also wrong. Hierarchical chunking preserves parent content. Retrieval granularity is a configurable knob, not a wall. The post is attacking a strawman of RAG that hasn't been current for over a year.
The latency tradeoff is real too. The authors themselves note RLM queries can take "a few seconds to several minutes" with no async or prefix caching in the current implementation. Fine for deep research agents and report generation. Wrong for real-time chat.
Production pipelines that actually ship use three layers:
* Vector + keyword hybrid search for recall
* Cross-encoder reranking for precision
* RLM-style agentic exploration for complex multi-hop queries
Those are complementary layers solving different problems at different stages. Anyone calling RLM a RAG replacement is either misreading the paper or farming engagement.
RLM is a valuable third layer. It is not the end of retrieval.
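A minimal sketch of the first two layers, assuming each retriever already returns a ranked list of document IDs. Reciprocal rank fusion (RRF, with the commonly used constant k=60) is swapped in here as one common way to merge vector and keyword results; the post doesn't name a fusion method. `rerank_score` is a hypothetical callable standing in for a cross-encoder, and the RLM-style third layer is out of scope.

```python
from typing import Callable

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

def hybrid_then_rerank(
    query: str,
    vector_ranking: list[str],
    keyword_ranking: list[str],
    docs: dict[str, str],
    rerank_score: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    """Layer 1: fuse for recall. Layer 2: rescore only the top_n for precision."""
    candidates = reciprocal_rank_fusion([vector_ranking, keyword_ranking])[:top_n]
    return sorted(candidates, key=lambda d: rerank_score(query, docs[d]), reverse=True)
```

The design point is the split: fusion is cheap and casts a wide net, while the expensive pairwise scorer only ever sees `top_n` candidates. Hard multi-hop queries would then be routed on to the agentic layer rather than handled here.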
@bcherny I’m assuming there are still issues with Dispatch since it’s so new. I can’t get anything to connect to my phone on any computer. Can’t wait to try this. I’m ex-IT and there’s no real troubleshooting I can do, since it just searches and fails. All on the same account.
@trq212 I’d love to try this, but honestly Claude mobile still doesn’t work and I can’t even submit feedback without an error. I need that fixed before this. They could work together. These are widespread issues that need to be resolved NOW. And I’m an Anthropic fan, believe me.
@minchoi How about not “vibe coding” and instead learning tech stacks, planning, and best practices? Let’s not encourage bad habits. If I never hear the term “vibe coding” again, it’ll be too soon. Also, having a real plan for something like OpenClaw, and knowing security, is paramount.
@openclaw And everyone will continue to get hacked because so many of you don’t know what you’re doing. This shouldn’t have been released to the public until it was very clear how to properly secure it 🤣
@ClaudeCodeLog Love it. But could we please, for the love of god, allow disabling individual tools on MCP servers like we can in Claude Desktop? There’s absolutely no reason not to allow this.
@danshipper They’re all important and have a place. Some build on each other and some overlap. Prompts, slash commands, MCP, agents/subagents, and skills all have use cases. 💪🏼