atharva @_gundawar - Twitter Profile

Toward general dexterity. We are training robot policies that learn a rich and coherent physical representation of the world by conditioning on multimodal observations. Here’s a small glimpse of what we’ve been building. Task: Pick the ramen cup and place it in the box. Given the current world state, the model generates a multimodal trajectory; here we show the decoded video and the corresponding actions executed on the humanoid.

2

24

6

5

2K

atharva

@_gundawar

6 months ago

@Winterice10 This is great! Will this speed up training (and if so, by how much)?

1

0

474

atharva

@_gundawar

8 months ago

3/n first set of stairs!

0

5

0

235

atharva

@_gundawar

8 months ago

Last weekend @diegocaples and I won the grand prize of 2 Unitree dogs! We had to stress test them, so we took one to Union Square 🧵1/n

Weights & Biases

@wandb

8 months ago

🏆 Grand Prize Winners: Daydreamer @diegocaples @_gundawar They're tackling the "GPT Moment for Robotics." Their agent uses a video diffusion model to imagine a successful outcome, executes it in the real world, and then uses VLM feedback to self-improve, training only on its successes.

1

10

1

8

1K

1

6

1

576

atharva

@_gundawar

8 months ago

2/n here’s our project daydream, hallucinating a robotic policy https://t.co/mxHr4BlC1k

1

4

0

310

atharva

@_gundawar

12 months ago

It’s been a privilege 🫡 Thank you for letting a random someone who waylaid you be a part of Yochan!

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)

@rao2z

12 months ago

🎉🥳Congratulations to Yochanite Atharva Gundawar (@_gundawar) who successfully defended his MS thesis on "Scaling LLMs with LLM-Modulo" today! He is off to The AGI Company (really!)..

1

70

4

15

6K

0

12

0

764

atharva

@_gundawar

about 1 year ago

there’s no range here, not needed tbh

0

2

0

334

atharva

@_gundawar

about 1 year ago

@theyuggupta super underrated ^

0

1

0

24

atharva

@_gundawar

about 1 year ago

can’t ssh into this anymore

0

6

0

371

atharva

@_gundawar

about 1 year ago

@cixliv so what happens if you lift the stationary leg? do you get off balance?

1

0

38

atharva

@_gundawar

over 1 year ago

an agent can dream

0

1

0

44

atharva

@_gundawar

over 1 year ago

@theyuggupta Link?

0

19

atharva

@_gundawar

over 1 year ago

@TheHumanoidHub That’s the security I want if I was a burglar

0

1

0

98

_gundawar retweeted

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)

@rao2z

over 1 year ago

On the use of Verifiers with LLMs--External vs. Internal LLM-Modulo (with @kayastechly, @karthikv792 and @21st_Warlock )

We are a bit tickled that verifiers seem to be all the rage on the AI twitter, as we have been advocating use of external verifiers of various hues--hand-written, learned, synthesized, or even LLM-based as part of LLM-Modulo (https://t.co/BglEooq5rF)

While LLM-Modulo focused on the use of verifiers in the inference stage, the current spike in the interest on verifiers is of course because of their use in the RL post-training stage of Deepseek R1.

In our own group, we are calling this use of verifiers in training (as a way to provide reward signal to the trajectories generated by the base LLM) Internal LLM-Modulo (with the retronym external LLM-Modulo for our original use). The two uses can of course be combined fruitfully, as our work on LRM-Modulo shows (cf https://t.co/RqRf4fWjrU)

There are some interesting tradeoffs at play in internal vs. external LLM-Modulo as discussed below:

[Verifier use Setup:] Simply put, verifiers in the external LLM-Modulo are used to overlay a generate-test cycle on top of the LLM generations by checking whether a generated solution is correct, and if not pushing the LLM to generate other alternatives (with optional critiques provided by the verifier). https://t.co/mREKgH8mxk

The verifiers in the internal LLM-Modulo do the same job of checking if the generated solution (not the intermediate tokens dubbed "reasoning pattern") is correct, and RL is used to reward the traces of the base LLM leading to verified correct solutions. (c.f. https://t.co/0htnB9tT4S).

(While our external LLM-Modulo setup does talk about incrementally collecting synthetic data (Step 7), one important difference is that we were suggesting solution data (to be used for fine tuning) rather than the derivational trace data--which is used by the RL post training in R1.)

[Correctness/Safety:] The RL post-training in internal LLM-Modulo doesn't necessarily guarantee correctness of the solution during inference stage; thus external LLM-Modulo still needs to be used in the inference stage to ensure safety/correctness etc of the solutions output in the inference stage (c.f. https://t.co/uk5ZKimWIM).

[Correctness of reasoning patterns vs. solutions:] Note that the internal LLM-Modulo is basically checking the correctness of the final solution rather than the intermediate tokens that have been dubbed reasoning patterns. Thus there are no real guarantees about the external significance of the reasoning patterns.

[Composability of Verifiers:] One of the arguments we make in favor of verifiers as against solvers, in the external LLM-Modulo case, is that you can have a bank of verifiers with each verifier (partially) guaranteeing the correctness of some aspects of the eventual solution. This idea can also potentially be of use in internal LLM-Modulo (i.e., training phase)--with the reward being composed out of the signals from the bank of verifiers.

[Verification vs. Critique:] In the external LLM-Modulo, we considered the verifier providing critiques to bias the base LLM's generation of the next candidate, and found it to be useful. In contrast, R1's training phase seems to just let the LLM generate multiple candidates in parallel and use verifiers only to provide reward signals. It would be interesting to see whether critique may help in the sample complexity of training.
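
The generate-test cycle described in the thread above (a candidate is checked by a bank of verifiers, and any critiques are fed back into the next generation) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the `(ok, critique)` verifier signature, the toy trip-planning strings, and the `max_rounds` cutoff are all assumptions made for the example, and `toy_generate` is a hard-coded stand-in for an LLM call.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions for this sketch):
# a verifier returns (ok, critique); a generator maps (prompt, critiques) to a candidate.
Verifier = Callable[[str], Tuple[bool, str]]
Generator = Callable[[str, List[str]], str]


def generate_test_loop(
    prompt: str,
    generate: Generator,
    verifiers: List[Verifier],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return the first candidate accepted by every verifier, or None."""
    critiques: List[str] = []
    for _ in range(max_rounds):
        candidate = generate(prompt, critiques)
        # Each verifier checks one aspect of the candidate.
        results = [verify(candidate) for verify in verifiers]
        failures = [msg for ok, msg in results if not ok]
        if not failures:
            return candidate
        # Feed the critiques back so the next candidate can account for them.
        critiques.extend(failures)
    return None


def toy_generate(prompt: str, critiques: List[str]) -> str:
    # Stand-in for an LLM call: revises its answer once it has seen a critique.
    return "3-day trip, cost 900" if critiques else "3-day trip, cost 1500"


def budget_verifier(candidate: str) -> Tuple[bool, str]:
    cost = int(candidate.rsplit(" ", 1)[1])
    ok = cost <= 1000
    return ok, "" if ok else f"cost {cost} exceeds budget 1000"


def length_verifier(candidate: str) -> Tuple[bool, str]:
    ok = candidate.startswith("3-day")
    return ok, "" if ok else "trip must be 3 days"


result = generate_test_loop(
    "plan a trip", toy_generate, [budget_verifier, length_verifier]
)
print(result)
```

In this toy run the first candidate fails the budget check, the critique is passed back, and the revised candidate is accepted by both verifiers. If no candidate passes within `max_rounds`, the loop returns `None` rather than an unverified answer.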

5

46

7

51

9K

atharva

@_gundawar

almost 2 years ago

@Scobleizer @alanmelling @Jandodev Another benefit is lower cost: these prompts are generally much smaller than if you were to write them out in a human-readable format. (From a conversation I had with a founding member.)

1

0

38

atharva

@_gundawar

almost 2 years ago

Using the generate-critic framework LLM Modulo, we were able to increase the accuracy of GPT 4 Turbo on the travel planning benchmark by 5X. We are working on improving these results, stay tuned!

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)

@rao2z

almost 2 years ago

📢 Next Tuesday 7/23 afternoon, I will be presenting our #ICML2024 spotlight poster "Position: LLM's Can't Plan, But Can Help Planning in LLM-Modulo Frameworks" (Hall C 4-9 #710). @icmlconf poster page: https://t.co/F5zIdCte0Z

6

32

3

14

9K

0

1

0

288

atharva

@_gundawar

almost 2 years ago

@AndrewPierno Send! Thanks Andrew!

0

7
