🚀 Introducing UniRL, an RL infra for unified multimodal models, together with two new RL algorithms: DRPO and Flow-DPPO.
One RL loop across diffusion/flow matching models, LLMs/VLMs, and unified multimodal models 👇
Code: https://t.co/fhKEqqFpc8
(yes, U(you)-ni-(need) RL)
🚨 Uniform token-level trust regions are not enough for LLM RL!
Our new paper: Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning.
We introduce CPPO, a drop-in mask that reallocates divergence budget by position & prefix drift 👇
https://t.co/svooEqAcss
China's military exercises around Taiwan in August 2022 and March 1996 (Third Taiwan Strait Crisis). In 2022, some exercise areas overlap with Taiwan's territorial waters, an apparent escalation.