"I don't prompt my agent anymore — I write the loop that prompts it."
Loop engineering is the meme of the month. But here's the question under the hype: a loop that runs while you sleep isn't automatically a loop that learns. Whether experience survives each cycle depends entirely on where it lands.
We just dropped a survey on exactly that — how deployed agents in the Era of Experience turn interaction traces into durable capability, from self- to meta-evolution.
Last time we argued: you don't pick a model, you pick a benchmark — the reward that shapes where it evolves. This is the promised self-evolution follow-up.
The loop is only as good as the infrastructure underneath it. Full map 👇
💻 GitHub: https://t.co/4uYTkONjeY
🌐 Site: https://t.co/Tfg0RpM5r4
📄 Paper: https://t.co/p9KbjzSFm6
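A minimal sketch of the "runs vs. learns" distinction above, not taken from the survey. It assumes a hypothetical `run_agent` stub in place of a real agent call and a JSON file as the place experience "lands". Every cycle starts fresh, so only what was written to the store carries over.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def run_agent(task: str, lessons: List[str]) -> str:
    # Hypothetical stand-in for a real agent call (assumption, not a real API).
    return f"{task}: started with {len(lessons)} prior lessons"


def run_loop(tasks: List[str], store: Optional[Path]) -> List[str]:
    outputs = []
    for task in tasks:
        # Each cycle starts from scratch; only the store survives between cycles.
        if store is not None and store.exists():
            lessons = json.loads(store.read_text())
        else:
            lessons = []
        result = run_agent(task, lessons)
        outputs.append(result)
        if store is not None:
            # Persist the new experience so the next cycle can read it.
            store.write_text(json.dumps(lessons + [result]))
    return outputs


with tempfile.TemporaryDirectory() as tmp:
    stateless = run_loop(["t1", "t2", "t3"], store=None)
    stateful = run_loop(["t1", "t2", "t3"], store=Path(tmp) / "lessons.json")

print(stateless[-1])  # t3: started with 0 prior lessons
print(stateful[-1])   # t3: started with 2 prior lessons
```

Both loops run the same three cycles; only the one with a store accumulates anything.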
@jedwards_27 I guess the deployment harness needs integration with security, auth, and data access control, all of which need to be ready for the kit. What do you think is the most critical?
@emollick @AndrewCurran_ The hard part is how each entity, individual or corporation, revamps its operating model around existing capabilities. It’s a very slow diffusion process. Good for entrepreneurs, but bad for the public, who perceive it as inequality.
📣📣 Meet Qwen-AgentWorld — a native language world model that simulates 7 agent environments (MCP, Search, Terminal, SWE, Web, OS, Android) within a single model. Environment modeling is the training objective from day one, not a post-hoc adaptation.
🤔 LLMs are trained to be better agents — better at acting in environments. But nobody has trained them to model the environments themselves.
🗺️ Our roadmap: investigate how language world modeling can push the boundaries of general agent capabilities, along two routes:
1️⃣ Build a foundation model for environment simulation — outperforming Claude Opus 4.8 and GPT-5.4 on AgentWorldBench
2️⃣ Investigate how world modeling enhances agent training:
🔬 Controllable Sim RL (agentic RL with LWM as environments) surpasses training in real environments
🧠 Learning to predict environments (LWM warm-up) makes agents stronger — remarkably, even without any agent-specific training, this predictive knowledge transfers to agentic tasks with zero fine-tuning
📑 Paper: https://t.co/Jx2l5RKq71
📖 Blog: https://t.co/7tVcKyhsx2
💻 GitHub: https://t.co/B5Lvb1UZCn
🤗 HuggingFace: https://t.co/Kw3QBL1TM5
🧩 ModelScope: https://t.co/YBnGYgMWWI
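A rough sketch of what "LWM as environment" could look like from the agent's side, assuming a generic `reset`/`step` interface. This is not the Qwen-AgentWorld API: `SimulatedEnv`, `WorldModel`, and `toy_terminal_model` are hypothetical names, and the toy function stands in for a real language world model that would predict the next observation from the interaction history. Reward computation is left out.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: (history so far, proposed action) -> predicted observation.
WorldModel = Callable[[List[str], str], str]


class SimulatedEnv:
    """Environment whose transitions are predicted by a model, not executed for real."""

    def __init__(self, world_model: WorldModel, max_steps: int = 8) -> None:
        self.world_model = world_model
        self.max_steps = max_steps
        self.history: List[str] = []
        self.steps = 0

    def reset(self, task: str) -> str:
        self.history = [f"TASK: {task}"]
        self.steps = 0
        return self.history[0]

    def step(self, action: str) -> Tuple[str, bool]:
        # The world model sees the full history plus the new action.
        obs = self.world_model(self.history, action)
        self.history += [f"ACTION: {action}", f"OBS: {obs}"]
        self.steps += 1
        return obs, self.steps >= self.max_steps


def toy_terminal_model(history: List[str], action: str) -> str:
    # Toy stub standing in for a language world model (assumption).
    if action == "ls":
        return "README.md"
    return f"sh: command not found: {action}"


env = SimulatedEnv(toy_terminal_model, max_steps=2)
env.reset("list files")
first_obs, first_done = env.step("ls")
second_obs, second_done = env.step("foo")
```

Because the transition function is just a callable, swapping the stub for a real model would not change the agent-facing loop.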
Publishing a blog on agentic RL (probably the first part of many) on Monday morning. Here are the papers that are currently included:
- AgentGym-RL: https://t.co/s8dPXX0LlG
- Agent-R1: https://t.co/xpfQJGgzEv
- Agent-RL: https://t.co/7fbVl0RWXG
- AutoForge: https://t.co/mr3WDIL5vq
- RAGEN: https://t.co/dp8bZfMlA4
- RAGEN-2: https://t.co/pt4QIMgf5K
- ToRL: https://t.co/rrQ8lTlY5r
Also planning to cover more details on:
- Echo (https://t.co/akClpFhX4M) / Paw (https://t.co/kmJjj69BdH) and using action masking versus running SFT on environment tokens.
- Properly setting up scalable infra for RL environments and trends in this area.
- RL training infra trends, specifically using disaggregated / asynchronous architecture.
- GLM-5.2 stability (migrating from GRPO to PPO for long-horizon tasks).
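A generic illustration of the masking question above, not taken from Echo or Paw. It assumes "action masking" here means restricting the loss to tokens the agent produced, as opposed to also training on tokens returned by the environment. `masked_nll` and the per-token values are hypothetical.

```python
from typing import List


def masked_nll(token_nll: List[float], is_agent_token: List[bool], mask_env: bool) -> float:
    """Mean negative log-likelihood; with mask_env=True only agent tokens count."""
    if len(token_nll) != len(is_agent_token):
        raise ValueError("token_nll and is_agent_token must have the same length")
    weights = [1.0 if (agent or not mask_env) else 0.0 for agent in is_agent_token]
    count = sum(weights)
    if count == 0:
        return 0.0
    return sum(w * n for w, n in zip(weights, token_nll)) / count


# Made-up per-token losses for one trajectory: two agent tokens, then two
# tokens that came back from the environment (e.g. a tool response).
token_nll = [0.5, 1.5, 4.0, 2.0]
is_agent = [True, True, False, False]

masked = masked_nll(token_nll, is_agent, mask_env=True)     # 1.0
unmasked = masked_nll(token_nll, is_agent, mask_env=False)  # 2.0
```

With the mask on, hard-to-predict environment tokens no longer dominate the objective; with it off, the model is also being trained to predict the environment.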
Please send me more papers, I'll either try to include them in this blog or in future writeups!
@scion_x_ It’s hard to predict the final path to true intelligence, but not that hard to see the path to massive economic value even with current AI architectures.