We are cooked.
China's Alibaba just revealed Wan Streamer.
AI agents can now see you, hear you, and talk back on video in real time.
This is not voice mode anymore 🤯
There’s a big misconception about how GLM 5.2 was trained. Yes, they distilled Claude and GPT 5.5 — but distillation is not how they matched Opus quality. Distillation only fixed the cold start problem in RL.
RLing an agentic coding model isn’t rocket science. In simplified terms:
1. RL needs trajectories — rollouts where the model actually completed a task in some env
2. No successful trajectory on a task = zero gradient = you can’t RL it. This is the cold start problem
3. Distillation solves it. You seed your model with knowledge from a smarter one (Claude, GPT) on tasks it can’t do yet
4. Now it produces positive trajectories on those tasks
5. Run RL on those trajectories and hill climb on agentic coding
6. At that point you no longer need to distill; you can hill climb with RL alone toward better models
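A toy sketch of step 2, assuming binary task rewards and a group-mean baseline (GRPO-style). The thread doesn't say which RL algorithm GLM actually used, so `group_advantages` and `has_signal` are hypothetical names for illustration only. If every rollout on a task fails, every advantage is zero and the policy gradient for that task vanishes:

```python
def group_advantages(rewards):
    """Advantage of each rollout relative to the group's mean reward."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def has_signal(rewards):
    """True if at least one rollout would contribute a nonzero gradient."""
    return any(a != 0.0 for a in group_advantages(rewards))


# Cold start: the model never completes the task, so all rewards are 0.
cold = group_advantages([0.0, 0.0, 0.0, 0.0])

# After seeding (e.g. via distillation): one rollout out of four succeeds.
seeded = group_advantages([0.0, 1.0, 0.0, 0.0])

print(cold, has_signal([0.0, 0.0, 0.0, 0.0]))
print(seeded, has_signal([0.0, 1.0, 0.0, 0.0]))
```

A single success in the group is enough to give every rollout a nonzero advantage, which is why seeding the model matters more than how often it succeeds at first.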
This is an interesting curve. I’d argue it’s harder to get to Opus 4.8 from scratch than to go from Opus 4.8 → Fable/Mythos tier.
GLM 5.2 is already producing positive trajectories, so they have plenty to RL on — they’ll keep climbing to Mythos quality without distilling any further. They no longer need American models.
PPO had a second wave in the LLM era for reasons unanticipated by the original paper:
- the importance-ratio objective fixes biases from numeric error, async training, and forward pass noise
- the clipping objective affects entropy through a mechanism that we didn't know about at the time of publication (DAPO, https://t.co/sBo9DeFS5Y)
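For reference, a minimal per-sample sketch of the two pieces mentioned above: the importance ratio between the policy being trained and the policy that generated the sample, and the clipped surrogate. The function name `ppo_clip_term` and the separate `eps_low` / `eps_high` arguments are choices made for this example; decoupled bounds are how DAPO's clip-higher variant is usually described, and the symmetric 0.2 default here is just a common setting, not something the post specifies.

```python
import math


def ppo_clip_term(logp_new, logp_old, advantage, eps_low=0.2, eps_high=0.2):
    """Clipped surrogate for one sample (a quantity to maximize).

    logp_old is the log-prob recorded by whatever produced the sample
    (possibly a stale or numerically different copy of the policy);
    logp_new is the log-prob under the policy being updated.
    """
    # Importance ratio: corrects for the gap between sampler and trainer.
    ratio = math.exp(logp_new - logp_old)
    # Keep the ratio inside [1 - eps_low, 1 + eps_high].
    clipped = max(1.0 - eps_low, min(1.0 + eps_high, ratio))
    # Pessimistic choice between the unclipped and clipped objective.
    return min(ratio * advantage, clipped * advantage)


# Same log-probs on both sides: ratio is 1, the term is just the advantage.
on_policy = ppo_clip_term(-1.0, -1.0, 2.0)

# Ratio of about 2 with a positive advantage: capped at 1 + eps_high.
capped = ppo_clip_term(math.log(2.0), 0.0, 1.0)

# Raising only the upper bound lets low-probability tokens grow further.
capped_higher = ppo_clip_term(math.log(2.0), 0.0, 1.0, eps_high=0.28)
```

With a negative advantage and a ratio above the upper bound, the `min` keeps the unclipped term, so the clip only removes the incentive to push the ratio further in the direction that already helps.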
“Sir… Clive Chan… our top chip designer on the Broadcom project… just quit and joined Anthropic BEFORE we even shipped anything… sir… our custom chips are still coming in 2026… right?”