My implementation of the Recursive Language Model (RLM) paper by @a1zhang, Kraska, and @lateinteraction.
Key insight: "Treat long context as an external environment, not something to stuff into a context window."
Applied to video understanding — instead of encoding 38K frames into a prompt, the agent:
→ Treats video as an environment
→ Writes code to explore segments
→ Uses recursive LLM sub-calls for analysis
Tested: 20+ min video, 7 steps, $0.002
Paper: https://t.co/sMkqVscWZD
Code: https://t.co/J3GxdlKeav
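A minimal sketch of the loop described above, not the code from the linked repo. Everything here is an assumption for illustration: `VideoEnv` stands in for a video by holding per-frame captions, `toy_llm` replaces a real model call with a keyword filter, and the fixed binary split in `explore` replaces the exploration code the agent would write itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VideoEnv:
    """Hypothetical stand-in for a video: per-frame captions instead of pixels."""
    frames: List[str]

    def segment(self, start: int, end: int) -> List[str]:
        # The agent only ever sees the slice it asks for, never the whole video.
        return self.frames[start:end]


def explore(env: VideoEnv, start: int, end: int, question: str,
            llm: Callable[[str, List[str]], str], leaf_size: int = 4) -> str:
    """Recursively split [start, end) until a segment fits in one sub-call."""
    if end - start <= leaf_size:
        return llm(question, env.segment(start, end))
    mid = (start + end) // 2
    left = explore(env, start, mid, question, llm, leaf_size)
    right = explore(env, mid, end, question, llm, leaf_size)
    # A further sub-call merges the two partial answers.
    return llm(question, [left, right])


def toy_llm(question: str, context: List[str]) -> str:
    """Placeholder for a real model call: keeps context items that mention the question."""
    hits = [c for c in context if question in c]
    return "; ".join(hits)


env = VideoEnv(frames=[f"frame {i}: empty road" for i in range(16)])
env.frames[11] = "frame 11: red car enters"
answer = explore(env, 0, len(env.frames), "red car", toy_llm)
print(answer)  # frame 11: red car enters
```

Each sub-call sees at most `leaf_size` frames or two partial answers, so no single prompt grows with the length of the video.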
I want some kind of LLM workflow tool.
• Ability to manage a set of input files (Markdown or similar), plus other general-purpose context.
• With real-time collaboration. (And maybe some concept of snapshots or VCS integration.)
• And the ability to create/manage inference workflows and a stored set of prompts.
• Access to general-purpose coding agents (and not just chat models).
• Some concept of compiled outputs/inference results (which ideally can be shared externally).
Many projects have this feeling: "there is all this stuff, which I want to process/compute over in this iterated way, with some build artifacts being important/worth saving." GNU Autotools x Notion or something. Is anyone building this?
I found Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses really interesting. It’s from researchers at UIUC, UC Berkeley, and Chroma, and the core idea is simple but important: don’t force the model to remember and manage all search state inside a messy transcript. Move the bookkeeping into the harness, then let RL focus on the actual semantic decisions.
Why I recommend reading it:
• It frames harness design as part of the learning problem, not just infrastructure around the model.
• The agent keeps explicit working memory: candidate pools, curated evidence, verification records, evidence graphs, and budget-aware context.
• Harness-1, a 20B search agent, beats strong open search agents across eight retrieval benchmarks and stays competitive with much larger frontier models.
• The most interesting part is transfer: the gains are stronger on held-out benchmarks, which suggests the model is learning general search behavior, not just memorizing domains.
Main takeaway: this paper came at the right time because everyone is trying to make agents more reliable, and it shows that better RL might require better environments, not just bigger models or stronger rewards.
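To make "move the bookkeeping into the harness" concrete, here is a toy sketch. It is not the paper's implementation: `SearchState`, `Harness`, and their methods are hypothetical names, the corpus is a three-document dict, retrieval is substring matching, and evidence graphs are left out. The point is only that the policy reads a compact state summary instead of a growing transcript.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchState:
    """Working memory kept by the harness, not by the model (hypothetical layout)."""
    budget: int
    candidates: Dict[str, str] = field(default_factory=dict)  # doc_id -> text
    evidence: List[str] = field(default_factory=list)         # curated doc_ids
    verified: Dict[str, bool] = field(default_factory=dict)   # verification records


class Harness:
    def __init__(self, corpus: Dict[str, str], budget: int) -> None:
        self.corpus = corpus
        self.state = SearchState(budget=budget)

    def search(self, term: str) -> List[str]:
        """Spend one unit of budget and add matching documents to the candidate pool."""
        if self.state.budget <= 0:
            return []
        self.state.budget -= 1
        hits = [d for d, text in self.corpus.items() if term.lower() in text.lower()]
        for d in hits:
            self.state.candidates.setdefault(d, self.corpus[d])
        return hits

    def keep(self, doc_id: str, verified: bool) -> None:
        """Promote a candidate to curated evidence and record whether it was verified."""
        if doc_id not in self.state.candidates:
            raise KeyError(f"{doc_id} was never retrieved")
        if doc_id not in self.state.evidence:
            self.state.evidence.append(doc_id)
        self.state.verified[doc_id] = verified

    def context(self) -> str:
        """Compact, budget-aware summary shown to the policy at each step."""
        s = self.state
        return f"budget={s.budget} candidates={sorted(s.candidates)} evidence={s.evidence}"


corpus = {
    "d1": "Chroma is a vector database",
    "d2": "UIUC is in Illinois",
    "d3": "Vector search uses embeddings",
}
harness = Harness(corpus, budget=2)
first = harness.search("vector")
harness.keep("d1", verified=True)
second = harness.search("illinois")
third = harness.search("berkeley")  # budget is exhausted, so nothing is retrieved
print(harness.context())
```

In this framing the model only decides what to search for and what to keep; counting budget, deduplicating candidates, and tracking verification are handled outside the policy.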
Sam Altman deserves credit for YC's turn toward hard tech. When he became CEO in 2014 he went out and recruited companies doing stuff like airliners and fusion, and hard tech startups have been some of the best in every batch since.
@karpathy and @DarioAmodei are pointing at one of the most important research loops in AI: systems that improve the process of building better systems.
Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor.
It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx
We’re going all in on World Models.
Today we’re launching the 1X World Model Lab.
The bet is simple:
You can’t fine-tune your way to AGI.
And you definitely can’t fine-tune your way to robots that can operate in the physical world.
General-purpose humanoids need models that understand space, motion, objects, causality, affordances, physics, and action before they ever see a specific task.
The frontier is not better VLA wrappers.
The frontier is embodied world models.
The 1X World Model Lab will focus on large-scale embodied world model pretraining: building the most generalizable foundation model for humanoid robots from the ground up.
The next frontier in AI requires scaling:
web-scale media + egocentric human videos + sim + dexterous remote-operated robot data + on-policy NEO data → real-world deployment for robot data collection and RL → abundance of data → physical AI
The robot collects data.
The model gets better.
The robot gets better.
Repeat.
To lead this, we brought in one of the best for the mission: @_sam_sinha_, as Head of World Models.
Sam was a founding research scientist at Luma AI and has been at the frontier of scaling multimodal generative video models his whole career.
If you’re the best in the world at large-scale pretraining, video models, robotics, RL, infra, or data — and you want your models to move atoms, not just pixels — join us.
Send background + evidence of exceptional ability to:
[email protected]
We’re building the model that makes autonomous labor real.
1/10
Really interesting paper: Self-Distilled Policy Gradient (SDPG).
Core idea: RLVR gives strong outcome rewards, but they are sparse.
Self-distillation gives dense token-level signals, but can collapse.
SDPG tries to get the best of both.
10/10
The ablations are the real lesson.
Removing OPD loses early accuracy gains. Removing KL hurts reasoning structure.
On Qwen3-1.7B, SDPG still wins, while pure self-distillation OPCD collapses after ~250 steps.
Takeaway: dense self-distillation works best when grounded by verifier rewards and anchored by KL.
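One way to picture the three ingredients in that takeaway is a single loss with a sparse reward term, a dense distillation term, and a KL anchor. This is an illustrative guess at the structure, not the paper's objective: the function name, the weights `alpha` and `beta`, the baseline, and the single-sample KL estimates are all assumptions.

```python
from typing import List


def sdpg_style_loss(logp: List[float], teacher_logp: List[float],
                    ref_logp: List[float], reward: float, baseline: float,
                    alpha: float = 0.5, beta: float = 0.1) -> float:
    """Toy per-sequence loss over the log-probabilities of the sampled tokens.

    logp         -- current policy log-probs of the sampled tokens
    teacher_logp -- self-distillation teacher log-probs of the same tokens
    ref_logp     -- frozen reference model log-probs of the same tokens
    """
    n = len(logp)
    advantage = reward - baseline
    # Sparse outcome signal: one verifier reward scales the whole sequence.
    pg = -advantage * sum(logp) / n
    # Dense token-level signal: sampled estimate of KL(policy || teacher).
    distill = sum(p - t for p, t in zip(logp, teacher_logp)) / n
    # Anchor: sampled estimate of KL(policy || reference).
    kl = sum(p - r for p, r in zip(logp, ref_logp)) / n
    return pg + alpha * distill + beta * kl


loss = sdpg_style_loss(
    logp=[-1.0, -2.0],
    teacher_logp=[-1.5, -3.0],
    ref_logp=[-1.0, -2.0],
    reward=1.0,
    baseline=0.5,
)
print(loss)  # 1.125
```

Setting `alpha=0` or `beta=0` in this sketch mirrors the two ablations mentioned above, dropping the distillation term or the KL anchor respectively.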