@HongyiWang10 @RutgersCS @RutgersU Excited to share PR²! PR² predicts short-horizon router evolution and replays the predicted routes, improving stability and reasoning performance. Happy to discuss more!
1/8) Excited to share PR²: Predictive Routing Replay for MoE-Based LLM RL! 🎉
This is also the first paper from our research group at @RutgersCS @RutgersU.
While MoE LLMs scale remarkably well, RL training exposes a hidden source of instability: routing.
Paper: https://t.co/uekygKHSON
A special shout-out to my talented PhD student, @DaizeDongCS, for leading this project and driving many of its key ideas and experiments.