Introducing d2: a principled and efficient RL framework for improving reasoning in diffusion language models (DLMs).
RL works well for autoregressive LLMs.
But for DLMs? It's fundamentally harder.
We show how to do it right.
https://t.co/Kg5GndV3oA
https://t.co/YAGUAcspsP
💻 https://t.co/sQKqirA1Re
🧵1/12