Srinivas Ramasubramanian @stablegradients - Twitter Profile

about 1 month ago

Excited to share that our lab will present two Orals at the ICLR SPOT workshop this Monday:
• Maximum Likelihood Reinforcement Learning (10:10–10:20): 🏆 Best Paper Award
• Expanding the Capabilities of Reinforcement Learning via Text Feedback (10:20–10:30): Oral + 🏆 Outstanding Paper Award at LLA Workshop
Come and say hi!

stablegradients retweeted

Fahim Tajwar @FahimTajwar10

4 months ago

Are we done with new RL algorithms? Turns out we might have been optimizing the wrong objective. Introducing MaxRL, a framework to bring maximum likelihood optimization to RL settings. Paper + code + project website: https://t.co/j9BCBF7K3R 🧵 1/n

stablegradients retweeted

Vision and AI Lab, IISc @val_iisc

6 months ago

Huge congratulations to Dr. Harsh on receiving the 2025 IKDD Doctoral Dissertation Award! 🏆 We are incredibly proud to celebrate a hat-trick of successes for VAL:
🔹 2025: Harsh (Winner)
🔹 2024: Sravanti (Winner)
🔹 2023: Jogendra (Runner-Up)
https://t.co/vilcxX0Vhr

Srinivas Ramasubramanian @stablegradients

6 months ago

@knarfroeder you can try now; it should be active.

Srinivas Ramasubramanian @stablegradients

6 months ago

Excited to share our NeurIPS paper: “Improving Model-Based Reinforcement Learning by Converging to Flatter Minima”. 🚀 TL;DR: make world-model training seek flatter minima and you get more robust model-based RL, with big gains on challenging benchmarks. 1/n

Srinivas Ramasubramanian @stablegradients

6 months ago

If you’re interested in more robust model-based RL or flat-minima training, we’d love feedback and ideas. Paper: https://t.co/d6riDTanjn 10/n

Srinivas Ramasubramanian @stablegradients

6 months ago

The method is simple to use:
• no architectural changes
• small compute overhead (one extra SAM-style step)
• works across pixel + state inputs and very different planners
If you already train a world model, this is nearly plug-and-play. 9/n
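
For readers unfamiliar with the "SAM-style step" mentioned in this tweet, here is a minimal sketch of a generic sharpness-aware update on a toy quadratic loss. It is not the paper's code: `sam_step`, `toy_loss`, `toy_grad`, and the values of `lr` and `rho` are all hypothetical, chosen only to show where the one extra gradient evaluation comes from.

```python
import math

def toy_loss(params):
    # Hypothetical toy loss: 0.5 * sum(c_i * w_i^2) with fixed curvatures.
    curvatures = [1.0, 10.0]
    return 0.5 * sum(c * w * w for c, w in zip(curvatures, params))

def toy_grad(params):
    # Analytic gradient of toy_loss.
    curvatures = [1.0, 10.0]
    return [c * w for c, w in zip(curvatures, params)]

def sam_step(params, grad_fn, lr=0.1, rho=0.05):
    """One generic sharpness-aware update (assumed form, not the paper's)."""
    # 1) Ordinary gradient at the current weights.
    grad = grad_fn(params)
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0
    # 2) Move to the approximate worst-case point within an L2 ball of radius rho.
    perturbed = [p + rho * g / norm for p, g in zip(params, grad)]
    # 3) The extra step: a second gradient evaluation at the perturbed weights.
    sharp_grad = grad_fn(perturbed)
    # 4) Apply that gradient to the original (unperturbed) weights.
    return [p - lr * g for p, g in zip(params, sharp_grad)]

weights = [1.0, 1.0]
updated = sam_step(weights, toy_grad)
```

The overhead relative to plain gradient descent is the second call to `grad_fn`, which is why such a step costs roughly one extra forward and backward pass per update.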

Srinivas Ramasubramanian @stablegradients

7 months ago

I will be at #NeurIPS2025 (Dec 1–7)! 📷 Would love to connect and chat about model-based RL, policy robustness, methods beyond policy gradients, and their implications for LLMs. I am actively seeking PhD positions in these areas.

stablegradients retweeted

Wen-Tse Chen @WenzeChen2

10 months ago

[0/3] 🚀 Introducing Verlog – an open-source RL framework built specifically for training long-horizon, multi-turn LLM agents.

📊 Max episode length comparison:
• VeRL / RAGEN → ~10 turns
• verl-agent → ~50 turns
• Verlog (ours) → 400+ turns 🔥

⚙️ Technical foundation:
• Built on top of VeRL
• Tested on the BALROG benchmark (BabyAI, BabaIsAI, Crafter)
• Follows design principles from pytorch-a2c-ppo-acktr-gail

💡 Why Verlog?
• For researchers: Skip the heavy engineering. We give you a strong, validated baseline for long-horizon, multi-turn LLM agents across diverse tasks.
• For developers: Train on your own long-horizon environments with minimal setup.
• Algorithmic edge: With a well-trained value function as an intermediate supervised signal, rollouts can be truncated at any point and still be used for learning. This reduces GPU idle time and boosts training efficiency. This is a genuine advantage of PPO over the GRPO family, widely recognized and leveraged in classic RL, yet often overlooked in LLM agent frameworks.

Key features 🧵👇
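
The "algorithmic edge" point rests on a standard idea from classic RL: when a rollout is cut off early, the value estimate at the cut-off state stands in for the rewards that were never observed. The sketch below illustrates that idea only; it is not Verlog's code, and `bootstrapped_returns`, its arguments, and the example numbers are hypothetical.

```python
def bootstrapped_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for a rollout that may have been truncated.

    rewards: per-step rewards observed before the cut-off.
    bootstrap_value: value estimate of the state at the cut-off
        (use 0.0 if the episode actually terminated there).
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each step's return reuses the next step's return.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

# Truncated after three steps; the critic estimates 10.0 for what follows.
truncated = bootstrapped_returns([1.0, 0.0, 2.0], bootstrap_value=10.0, gamma=0.5)
# Same rewards, but the episode genuinely ended, so nothing is bootstrapped.
terminal = bootstrapped_returns([1.0, 0.0, 2.0], bootstrap_value=0.0, gamma=0.5)
```

Methods that score only complete episodes have no equivalent of `bootstrap_value`, which is the contrast the tweet draws between PPO and the GRPO family.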

stablegradients retweeted

Fahim Tajwar @FahimTajwar10

about 1 year ago

RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers?

Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training!

🧵 1/n https://t.co/7reUZjeTUL

stablegradients retweeted

Vision and AI Lab, IISc @val_iisc

over 1 year ago

Vision and AI Lab (VAL), IISc has been recognized as the top AI lab in India by @CSrankings 🥇🎉, reflecting a decade of dedicated research. IISc is also ranked #1 in AI research nationwide 🥇. Thanks to our amazing team for their hard work and commitment🙏 #AI #CV #ML #IISc