Chelsea Finn @Chelseabfinn - Twitter Profile

Pinned Tweet

2 months ago

LLM post-training used to mean fine-tuning to a downstream task. Robotics has been stuck in this setting, needing task-specific fine-tuning for best performance. π0.7 changes this: It works out of the box & outperforms fine-tuned specialists. Details: https://t.co/QbO3E4D3QN

Chelsea Finn

@chelseabfinn

about 2 hours ago

LLM RL optimizes for sequential reasoning. We also optimize over the reasoning strategy, incl parallel trains of thought, aggregation of parallel traces, & sequential reasoning. This allows the model to better explore & allocate compute at test time. https://t.co/DkTSllkmvp

Jubayer Ibn Hamid

@jubayer_hamid

3 days ago

The most capable reasoning systems in AI scale inference compute along several axes: sequential compute to think longer, parallel compute to sample many independent attempts, and aggregative compute to synthesize prior traces into a new improved one. But during training, we only optimize how models use sequential compute. This creates a fundamental mismatch between how we ultimately deploy these systems and how we train them, leaving much of search and synthesis unoptimized. We introduce SPIRAL, an RL framework for making all inference-compute primitives end-to-end learnable: models learn to coordinate sequential, parallel, and aggregative reasoning using only the reward of the final output. Work with @ifdita_hasan (co-lead), @michaelyli_ , @oshaikh13 , @yoonholeee , @DorsaSadigh , @chelseabfinn , @noahdgoodman 🧵

Chelsea Finn

@chelseabfinn

about 3 hours ago

Can we translate a rough sense for what to do + VLA prior into successful behavior? Flow reversal steering: 1) runs the VLA's flow ODE backwards to back out the noise that's closest to the coarse trajectory, 2) runs the flow forwards to get the closest good behavior. Paper: https://t.co/FUDqDEpeJq

Andy Tang @tangerinecoder

14 days ago

Generalist robot policies learn many useful skills, but struggle to select good behaviors for new tasks. To solve this, we introduce Flow Reversal Steering (FRS), a method to refine coarse semantic guidance into precise, in-distribution motions. https://t.co/uCR6KmoDo8 1/N

Chelsea Finn

@chelseabfinn

14 days ago

How does test-time scaling impact robots? We find that larger models, more thinking, and more context help significantly for some prompts but not others. Like LLMs, we can also train a router for a better performance/latency tradeoff! Paper: https://t.co/HEjjCkrsen

Jadelynn @_jadelynn

15 days ago · Stanford

test-time compute [ttc] in robotics isn't free & isn't always worth it. smart allocation of ttc recovers frontier-level planning at a fraction of the cost! coauthor @milanganai w/ Yasmina @ajaysridhar0 Mozghan @katielulula Clark Barrett @jiajunwu_cs @chelseabfinn @drmapavone 🧵

Chelsea Finn

@chelseabfinn

14 days ago

Can robot foundation models collaborate with themselves? We finetune a VLA to be able to control any robot in a team.
- matches or outperforms training separate models or a single centralized model for all robots
- readily scales to large teams
Paper: https://t.co/yqdSOQ7ead

Ria Doshi

@riadoshi21

15 days ago

🤔 Can we train one VLA policy to control multi-robot teams without any explicit communication? ✨ Introducing CHORUS: a single policy for decentralized, multi-embodiment collaboration 🧵⬇️

chelseabfinn retweeted

Ji Woong Kim

@jwbkim

22 days ago

We show that robots can learn high-level task semantics, such as sorting rules, skill composition, and rule-based ordering, directly from human demos. This is useful because if your target task is a composition of the robot's existing skills, you could just collect human demos for it without collecting further robot data. Introducing Ego-Pi: VLA fine-tuning for egocentric human and robot data, a collaboration between @Stanford and @Meta. Website: https://t.co/dIF6n4QGy3 Paper: https://t.co/3GFk6KQw9P 1/6

Chelsea Finn

@chelseabfinn

20 days ago

Scaling RL to long horizons remains a major challenge. Long-horizon Q-learning (LQL) prevents compounding bootstrapping errors by bounding the difference in value over long horizons. It shows large gains over 1-step TD and n-step returns! Paper: https://t.co/OTk3M6cz8p

Armaan Abraham @armaanabraham

about 2 months ago

In RL, what if we could learn from any experience from any policy in a way that is reliable and scalable? This would be helpful in domains like robotics where new data is expensive. We introduce Long-horizon Q-learning (LQL) to tackle this https://t.co/1Ckb5ZePyo.

chelseabfinn retweeted

Yuejiang Liu @liu_yuejiang

24 days ago

Excited to share that I’ll join @NUSComputing as an Assistant Professor in 2027 🏛️ I’ll build LEMA Lab: https://t.co/qzKA7dpA00, study the principles of embodied intelligence, & empower every lab member to thrive 📢 Recruiting 3-6 PhD students in the next application cycles

Chelsea Finn

@chelseabfinn

about 1 month ago

Project led by @perryadong and @khhung906, with @TianGao_19, @DorsaSadigh @StanfordAILab Check out the paper and website for many more details and cool robot videos! 🤖 https://t.co/54nO9tFU0Z https://t.co/AdphIBvR9e

Chelsea Finn

@chelseabfinn

about 1 month ago

How can VLAs achieve 95+% reliability? Using RL post-training with EXPO-FT:
- π0.5 improves to 30/30 success on all 8 tasks tested
- uses only 19 min of RL data on average
Paper & videos: https://t.co/54nO9tFU0Z

Perry Dong @perryadong

about 1 month ago

Introducing EXPO-FT – Efficient, Reliable & Open-Source VLA Finetuning! EXPO-FT unlocks π0.5 for challenging manipulation tasks:
- Routing string lights & inserting the power connector to illuminate them
- Striking pool ball into pocket
- Inserting flower into wine bottle
(1/5)

Chelsea Finn

@chelseabfinn

about 1 month ago

EXPO-FT extends EXPO to fine-tune VLAs in the real world, using image observations, action chunking, and DAgger data.
Compared to past methods, EXPO-FT
- reaches higher reliability with less data
- handles wider set of initial states

Chelsea Finn

@chelseabfinn

2 months ago

I’m giving a talk tomorrow (Monday) at ICLR on long-term memory for long-term autonomy. 9:25 am @ MemAgents workshop https://t.co/gjBG8j48pQ

Chelsea Finn

@chelseabfinn

2 months ago

I'm giving three workshop talks tomorrow (Sunday) at ICLR!
- self-improvement w/o reward bottleneck, incl meta-harness (10:30 am, RSI workshop)
- emergent physical generalization, incl π0.7 (11:40 am, multimodal workshop)
- RL for robustness (1:45 pm, CAO workshop)

Chelsea Finn

@chelseabfinn

2 months ago

Can LLMs generate new insights that build on prior research? GiantsBench is a new scientific discovery benchmark that tests whether models can synthesize new insights given two parent papers. Paper + data + code: https://t.co/25q0F2jhpi

Joy He-Yueya @JoyHeYueya

2 months ago

Scientists often make breakthroughs by synthesizing ideas across papers. In our new paper, we ask whether a language model can anticipate this process: given two parent papers, can it generate the core insight of a future paper built on them? 🧵⬇️

Chelsea Finn

@chelseabfinn

2 months ago

RL fine-tuning often prematurely collapses LLM entropy. Poly-EPO is a scalable set-RL algorithm that optimizes for a set of accurate solutions with diverse reasoning strategies. Paper: https://t.co/0HVe8YHr56

Ifdita Hasan

@ifdita_hasan

2 months ago

Deploying language models in scientific discovery domains requires extraordinary amounts of test-time compute for search algorithms. An ideal training algorithm should be designed with this goal in mind - that we want agents to learn how to not only exploit but also optimistically explore novel strategies. The agent should learn how to synergistically explore and exploit. We propose Poly-EPO, a set RL algorithm that explores and discovers diverse reasoning paths. Work with @jubayer_hamid (co-lead), Shreya, @ShirleyYXWu, @HengyuanH, @noahdgoodman, @DorsaSadigh, and @chelseabfinn.

Chelsea Finn

@chelseabfinn

2 months ago

FASTER makes top diffusion RL algos (eg IDQL, EXPO) computationally cheaper while retaining performance. Key idea: a denoising critic that operates on noise, enabling best-of-N filtering before diffusing. Paper + code: https://t.co/ylVWZxu4L9

Perry Dong @perryadong

2 months ago

Top RL algorithms today are getting powerful, but they can be prohibitively expensive, relying on test-time scaling techniques such as best-of-N sampling. We propose FASTER, a method that maintains the performance gains while eliminating the computational costs. (1/6)

chelseabfinn retweeted

Allen Ren

@allenzren

2 months ago

I remember sitting down at my desk for the first time, @QuanVng showing me the starter project: let’s make pre-trained models work without fine-tuning? With π0.7, our pre-trained model works out of the box across so many tasks, matching or even outperforming SFT or RL specialists! https://t.co/OXxvem0FT0

chelseabfinn retweeted

Lucy Shi @lucy_x_shi

2 months ago

1/ We just released π0.7 — a steerable generalist robot model with emergent capabilities. I want to share a bit of the backstory, because π0.7 taught me something surprising about where robot learning is heading. A thread on bittersweet lessons 🧵