Too many RL ideas die at the edge of the LLM/VLM/VLA training stack. Not anymore.
With FeynRL, new algorithm ideas do not have to fight the whole stack. Focus on the alg while still training very large models.
https://t.co/30CAbnxwIn
Try it, star it, send feedback.
@TsendeeMTS @ChangHao564792d
Great paper. We also studied and proposed a form of DAgger-style training for LLMs, but for SFT, using scheduled sampling.
https://t.co/rSFTBBfLIq
Higgs Audio v3 TTS is here.
Built for voice AI that speaks, not just reads:
• 100 languages with single-digit WER/CER
• inline control over emotion, style, prosody, and sound effects
• API, Workspace, and open weights
• Blog: https://t.co/C8frDlfO5D
Watch the demo.
@cwolferesearch I'd add https://t.co/bdzM4JJbIO to your list!
I've implemented it in a very modular & clean way, so systems code stays systems and algorithm code stays alg. That makes it easier to understand how RL training works without having to understand the entire stack.
Take a look and you will see the difference!
"Agentic" should not require a fundamentally different training stack, as long as the framework easily supports environments. Check out https://t.co/30CAbnxwIn. It is a clean & modular RL framework. We'll add an env example soon, but the core training/rollout are the same.
U r welcome to contribute tho!
Off-policy data does not have to be a bug in RL. In our work, we shift the question from:
Is this data on-policy? -> How much should we trust this batch?
That change leads to an adaptive objective for RL training of LLMs.
blog: https://t.co/HIJUbi7hUg
New blog post: Effective Sample Size
Reweighting data to fix distribution shift kills bias but piles all the weight onto a few points. ESS measures how much data you actually have left, which makes it the dial for deciding when a replay buffer has gone stale.
https://t.co/YE6bzhVw64
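A minimal sketch of the idea, assuming the standard Kish estimator ESS = (sum of w)^2 / (sum of w^2) over importance weights. The blog post may define or use ESS differently. The function names, the `min_fraction` threshold, and the staleness rule below are hypothetical and only illustrate how ESS could serve as that dial.

```python
import math


def effective_sample_size(log_weights):
    """Kish effective sample size from unnormalized log importance weights."""
    if not log_weights:
        raise ValueError("need at least one weight")
    # Shift by the max log-weight before exponentiating to avoid overflow.
    # ESS does not change when all weights are rescaled by a constant.
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def looks_stale(log_weights, min_fraction=0.5):
    """Hypothetical rule: flag a batch when ESS falls below a fraction of its size."""
    return effective_sample_size(log_weights) < min_fraction * len(log_weights)


# Equal weights: every sample counts, so ESS equals the batch size.
uniform = [0.0] * 8
# One sample weighted 100x the rest: ESS collapses toward 1.
skewed = [math.log(100.0)] + [0.0] * 7

print(effective_sample_size(uniform), looks_stale(uniform))
print(effective_sample_size(skewed), looks_stale(skewed))
```

With equal weights the ESS is the full batch size. When one sample dominates, it drops close to 1 even though the batch still holds 8 points.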
Shoutout to my fantastic co-organizers for making the first-ever workshop on RL Environments & Agent Evals such a success!
@rasoolfa, @anishathalye, @aagohary, @natashajaques, @TheAndiPenguin, @migballesteros, Aziza Mirsaidova, Priyaranjan Pattnayak, Ahmed Elgohary, Alina Gavrilov, Aparna Elangovan, Graham Horwood
Packed room to hear @alexgshaw and @ryanmart3n break down how @harborframework grew into *the* framework for RL environments.
In our RLEval workshop at @CAISconf today, attendees tackled big open challenges in RLEs & Agent Evals + I shared the approach we take at @joinHandshake
Join us tmrw at RLEval if you are around, first edition ever. Together with my wonderful co-organizers (@jomulr, @anishathalye, Alina Gavrilov, Aziza Mirsaidova), we have put together an exciting program with a great lineup of speakers and papers.
See the full program here:
https://t.co/JZd9U2SzlH
P.S. I'll also give a talk on what is wrong with current RL methods and frameworks, and why these issues can slow progress in using RL for large models.
Tomorrow in San Jose: RLEval. Trillions going into LLM agents and we still cannot reliably evaluate them. 19 papers, talks from Alex Dimakis, Corby Rosset, Rasool Fakoor, and others. I'll be presenting Submodular Benchmark Selection from @boson_ai.
https://t.co/WksVTSirZp
@vivek_2332 @adithya_s_k have u tried https://t.co/bdzM4JJbIO? I've released it for the very same reasons (your 2nd and 3rd items). Try it and lemme know what u think!
@adithya_s_k Super useful resource, thank you for putting it together!
Researchers working on RLE design & Agent Evals might consider submitting papers / attending the first-ever Workshop in this area at the upcoming ACM Conference on AI and Agentic Systems:
https://t.co/FpWtMJsnv1
Once you check it out, you'll see the difference immediately. As #ICLR2026 wraps up, this might be a good starting point for your next idea, startup, project, or conference submission.
One thing I keep hearing is that RL for L(L)Ms is "mostly a systems problem now" and the RL part is basically good enough.
I really don't buy that. Current RL algs are still fragile as hell. Better systems help, but they don't magically make the RL problem go away.
Are you working on RL, principled ways to build RL envs for agent training, or effective evaluation for agents? Want to showcase your NeurIPS submission? Or just want to discuss research more broadly?
Then consider submitting to and attending our first-ever workshop on Methods and RL Environments for Evaluating AI Agents. Deadline: May 11
https://t.co/MOxooUom5i
@novasarc01 @oneill_c Well, we released one, but we want to focus back on RL rather than on systems. The goal is to provide a clean framework that people can understand and use to build new RL algs without having to deal with convoluted code. Take a look and you'll see the difference https://t.co/bdzM4JJbIO
@ClementDelangue @badlogicgames I'd suggest trying this for post-training: https://t.co/bdzM4JJbIO. Things are built to be clear and modular, and at the same time you can run large-scale experiments. Take a look and you will see the difference!