Kevin @Kevinduan2014 - Twitter Profile

"I don't prompt my agent anymore — I write the loop that prompts it." Loop engineering is the meme of the month. But here's the question under the hype: a loop that runs while you sleep isn't automatically a loop that learns. Whether experience survives each cycle depends entirely on where it lands. We just dropped a survey on exactly that — how deployed agents in the Era of Experience turn interaction traces into durable capability, from self- to meta-evolution. Last time we argued: you don't pick a model, you pick a benchmark — the reward that shapes where it evolves. This is the promised self-evolution follow-up. The loop is only as good as the infrastructure underneath it. Full map 👇 💻 GitHub: https://t.co/4uYTkONjeY 🌐 Site: https://t.co/Tfg0RpM5r4 📄 Paper: https://t.co/p9KbjzSFm6

[OkhayIea's tweet photo]

16

343

58

338

20K

Kevin

@Kevinduan2014

3 days ago

@jedwards_27 I guess the deployment harness needs integration with security, auth, and data access control, all of which need to be ready for the kit. What do you think is the most critical

1

4

0

2

7K

Kevin

@Kevinduan2014

5 days ago

@emollick @AndrewCurran_ The hard part is how each entity, individual or corporation, revamps its model of operation with existing capabilities. It’s a very slow diffusion process. Good for entrepreneurs, but bad in that the public perceives it as inequality.

0

157

Kevinduan2014 retweeted

Qwen

@Alibaba_Qwen

6 days ago

📣📣 Meet Qwen-AgentWorld, a native language world model that simulates 7 agent environments (MCP, Search, Terminal, SWE, Web, OS, Android) within a single model. Environment modeling is the training objective from day one, not a post-hoc adaptation.

🤔 LLMs are trained to be better agents, meaning better at acting in environments. But nobody has trained them to model the environments themselves.

🗺️ Our roadmap: investigate how language world modeling can push the boundaries of general agent capabilities, along two routes:

1️⃣ Build a foundation model for environment simulation, outperforming Claude Opus 4.8 and GPT-5.4 on AgentWorldBench

2️⃣ Investigate how world modeling enhances agent training:
🔬 Controllable Sim RL (agentic RL with LWM as environments) surpasses training in real environments
🧠 Learning to predict environments (LWM warm-up) makes agents stronger. Remarkably, even without any agent-specific training, this predictive knowledge transfers to agentic tasks with zero fine-tuning

📑 Paper: https://t.co/Jx2l5RKq71
📖 Blog: https://t.co/7tVcKyhsx2
💻 GitHub: https://t.co/B5Lvb1UZCn
🤗 HuggingFace: https://t.co/Kw3QBL1TM5
🧩 ModelScope: https://t.co/YBnGYgMWWI

[Alibaba_Qwen's tweet photo]

201

5K

784

4K

1M

Kevin

@Kevinduan2014

8 days ago

@benjamin_warner @Yuchenj_UW Really? The eval rubric should include both the quality of the agent's outcome and its token consumption.

0

2

Kevinduan2014 retweeted

Cameron R. Wolfe, Ph.D.

@cwolferesearch

9 days ago

Publishing a blog on agentic RL (probably the first part of many) on Monday morning. Here are the papers that are currently included:

- AgentGym-RL: https://t.co/s8dPXX0LlG
- Agent-R1: https://t.co/xpfQJGgzEv
- Agent-RL: https://t.co/7fbVl0RWXG
- AutoForge: https://t.co/mr3WDIL5vq
- RAGEN: https://t.co/dp8bZfMlA4
- RAGEN-2: https://t.co/pt4QIMgf5K
- ToRL: https://t.co/rrQ8lTlY5r

Also planning to cover more details on:
- Echo (https://t.co/akClpFhX4M) / Paw (https://t.co/kmJjj69BdH) and using action masking versus running SFT on environment tokens.
- Properly setting up scalable infra for RL environments and trends in this area.
- RL training infra trends, specifically using disaggregated / asynchronous architecture.
- GLM-5.2 stability (migrating from GRPO to PPO for long horizon tasks).

Please send me more papers, I'll either try to include them in this blog or in future writeups!

[cwolferesearch's tweet photo]

8

521

60

589

24K

Kevin

@Kevinduan2014

8 days ago

@emollick A concrete example of deep knowledge work?

0

40

Kevin

@Kevinduan2014

10 days ago

@emollick My rule for my kids: use AI to learn, not to cheat.

0

287

Kevin

@Kevinduan2014

10 days ago

@emollick Agreed from personal experience. Fundamentally management is about goal setting, context sharing, and key decision guidance.

0

463

Kevin

@Kevinduan2014

10 days ago

@rauchg Agent creation is fundamentally a systems engineering problem. That’s why it shares so much with classical system/software design techniques.

0

51

Kevin

@Kevinduan2014

11 days ago

@MatthewBerman Sometimes it’s just dangerous to use an LLM as a judge. It's hard to deploy in high-stakes environments.

0

108

Kevin

@Kevinduan2014

11 days ago

@bcherny Will try

0

10

Kevin

@Kevinduan2014

11 days ago

@bradenjhancock Learn to learn instead of learning to compress and search

0

15

Kevin

@Kevinduan2014

11 days ago

@jietang @hanouticelina @elonmusk @teortaxesTex Looking forward!

0

460

Kevin

@Kevinduan2014

12 days ago

@scion_x_ It’s hard to predict the final path to true intelligence, but not that hard to see the path to massive economic value even with the current AI architecture.

0

71
