David Tao @Taodav - Twitter Profile

about 1 month ago

@yacineMTB Sounds about right for JAX! Thought about going smaller than 64x64, but things get hard to identify past 32x32

0

68

David Tao

@Taodav

about 1 month ago

Reinforcement learning on visual first-person environments is costly: rendering engines are expensive! Enter JAXenstein: a lightning fast benchmark of first-person environments based on a pure JAX reimplementation of the Wolfenstein 3D rendering engine. (1/n)

5

84

16

35

6K

David Tao

@Taodav

about 1 month ago

You can check out (and contribute to!) the framework here: https://t.co/L7QnWf1Hu5 For more details, our preprint is here: https://t.co/RGjalFqwHc (n/n)

0

5

0

1

267

David Tao

@Taodav

about 1 month ago

Will it run DOOM? Well that’s a weird question. But I guess we’ll be adding full DOOM functionality for JAX in the near future. (4/n)

1

8

0

284

Taodav retweeted

Lakshita Dodeja

@lakshitadodeja

about 2 months ago

Can BC policies be quickly improved through real world experience? Our new #RSS2026 paper proposes Q2RL, a method that bridges BC and RL for on-robot learning. Q2RL improves BC policies by up to 3.75x with just 1-2 hours of online interaction! So when life gives you BC, make Q-functions! 🍋 Details in thread 🧵

7

197

45

124

36K

Taodav retweeted

Dan Haramati @DanHrmti

5 months ago

Learning accurate World Models for long horizon planning is hard. So what minimal aspect of world dynamics must a model capture to achieve complex goals? We find a simple and effective solution in our #ICLR2026 paper, which we will present as an Oral at @worldmodel_26. (1/n)

6

289

47

212

30K

Taodav retweeted

Patrick @dramaticirony

8 months ago

the scariest thing of all, a disappointing romantic and academic life

97

13K

1K

2K

1M

Taodav retweeted

Elai @elaifresh

9 months ago

We must become more Chinese

144

93K

10K

21K

4M

David Tao

@Taodav

10 months ago

That being said, the current model at top tier conferences is unsustainable too. I’m not sure what the correct answer is, but we shouldn’t ignore visibility.

0

1

0

198

David Tao

@Taodav

10 months ago

I’ve heard similar takes before and people like to bring up the incentive systems behind publishing at top tier conferences (jobs, positions, etc.). I would argue that this often ignores one of the biggest upsides of submitting to a top tier conference: visibility. This was one of the dangers of the RL community starting our own conference as well. We lose visibility from the wider ML community, which I still think is very important.

Hieu Pham

@hyhieu226

10 months ago

AI/ML publication venues are broken beyond repair. I genuinely believe the only way to fix them is to completely devalue them (best to do that immediately, but perhaps slowly over time since people have inertia). Then, start something new that encourages quality over quantity.

8

108

5

23

35K

1

2

0

527

Taodav retweeted

Prithviraj (Raj) Ammanabrolu

@rajammanabrolu

over 2 years ago

The correct answer to "what online RL algo should you use" has always been and will always be "whatever you know how to tune the hyper parameters for best"

2

57

4

27

8K

David Tao

@Taodav

11 months ago

@MAghajohari Yes! Currently working on something to help stabilize PPO with LLMs :)

0

1

0

1

54

David Tao

@Taodav

11 months ago

THIS! Working with LLM folk and there seems to be a deep misunderstanding of how reinforcement learning works. I suspect it’s because of the simplified Monte Carlo algorithms (like GRPO) that have become so prevalent, where credit assignment over time isn’t even under consideration.

Khurram Javed

@kjaved_

12 months ago

The issue in the first paragraph is real when learning without bootstrapping (e.g., with reinforce). TD learning methods can already learn along the way and figure out what went well and what didn't if the value function has a good understanding of the world. This works even if rewards are delayed by hours. Adding planning updates to the mix allows agents to reason about actions that it did not take and could try in the future.

2

133

7

103

23K

2

17

0

5

2K

David Tao

@Taodav

11 months ago

@alperahmetoglu HAHAHAHA this is hilarious

0

28

David Tao

@Taodav

11 months ago

A huge shoutout to my co-authors @KaichengGuo27, @camall3n and George Konidaris. POBAX is available on GitHub. If you’re curious to learn more, check out our paper (https://t.co/G0lD2wwwhI) or come chat with us at RLC 2025! We’ll be presenting this at the Track 4: Evaluation, Benchmarks session on August 6th. Come say hi! 🧵5/5

3

13

1

569

David Tao

@Taodav

11 months ago

What does it mean to be “better at” partial observability in RL? Existing benchmarks don't always provide a clear signal for progress. We fix that. Our new work (at RLC 2025 🤖) introduces a new property that ensures your gains are from learning better memory vs other factors. AND we provide a new JAX benchmark with environments that all have this property! 🧵1/5

5

154

23

97

12K

David Tao

@Taodav

11 months ago

We introduce POBAX: an open-source benchmark on partial observability that includes a diverse range of memory-improvable environments. POBAX is entirely written in JAX for extremely fast, GPU-scalable hyperparameter sweeping and experimentation. 🧵4/5

1

7

0

514
