Andy Tang @tangerinecoder - Twitter Profile

about 1 month ago

Deploying language models in scientific discovery domains requires extraordinary amounts of test-time compute for search algorithms. An ideal training algorithm should be designed with this goal in mind: we want agents to learn not only to exploit but also to optimistically explore novel strategies. The agent should learn how to synergistically explore and exploit.

We propose Poly-EPO, a set RL algorithm that explores and discovers diverse reasoning paths. Work with @jubayer_hamid (co-lead), Shreya, @ShirleyYXWu, @HengyuanH, @noahdgoodman, @DorsaSadigh, and @chelseabfinn.

tangerinecoder retweeted

Yoonho Lee

@yoonholeee

2 months ago

How can we autonomously improve LLM harnesses on problems humans are actively working on?

Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores.

Announcing Meta-Harness: a method for optimizing harnesses end-to-end.

tangerinecoder retweeted

Will Chen @verityw_

4 months ago

How can robot policies be trained to best leverage VLMs' CoT reasoning and in-context learning for generalization? The key is Steerable Policies: vision-language-action models that can be flexibly controlled in many ways! https://t.co/GvcvmY0JD5 1/9

tangerinecoder retweeted

Jubayer Ibn Hamid

@jubayer_hamid

8 months ago

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks where strategic exploration is necessary. We introduce a framework for training a policy over sets of generations and use it to induce exploration.

Work with @ifdita_hasan (co-lead), @ellenjxu_, @chelseabfinn and @DorsaSadigh at Stanford 🧵

tangerinecoder retweeted

Marcel Torné @marceltornev

11 months ago

Very happy to share that our work on learning long-history policies received the Best Paper Award from the Workshop on Learned Robot Representations @RoboticsSciSys! 🤖🥳

Check out our paper if you haven't already! https://t.co/Wfbz58lF8D

Thank you to all the organizers and the amazing collaborators @tangerinecoder, @liu_yuejiang and @chelseabfinn!

tangerinecoder retweeted

Shirley Wu

@ShirleyYXWu

12 months ago

Even the smartest LLMs can fail at basic multiturn communication:

Ask for grocery help → without asking where you live 🤦‍♀️
Ask to write articles → assumes your preferences 🤷🏻‍♀️

⭐️CollabLLM (top 1%; oral @icmlconf) transforms LLMs from passive responders into active collaborators.

Website: https://t.co/Aq654MbyTL
Github: https://t.co/wlP8eByqSA
Blog: https://t.co/gBNJojNY5O
Paper: https://t.co/SfHH6ruqsS

🎯 Key insight: Reward responses not by their immediate helpfulness but by their long-term impact on the conversation trajectory.

@MSFTResearch @StanfordAILab @stanfordnlp

tangerinecoder retweeted

Annie Chen @_anniechen_

about 1 year ago

How can robots autonomously handle ambiguous situations that require commonsense reasoning? *VLM-PC* provides adaptive high-level planning, so robots can get unstuck by exploring multiple strategies. Paper: https://t.co/UmR6raIPiW

tangerinecoder retweeted

Chelsea Finn

@chelseabfinn

about 1 year ago

How do we make a scalable RL recipe for robots? We study batch online RL w/ demos.

Key findings:
- iterative filtered imitation is insufficient
- need diverse policy data, e.g. using diffusion policy
- policy extraction can hinder data diversity

Paper: https://t.co/LsNtv4cRkU

tangerinecoder retweeted

Yuejiang Liu @liu_yuejiang

about 1 year ago

🧠Memory is crucial for robots: to handle occlusions, track progress, stay coherent, etc. Yet most VLAs truncate context.

🤔Why is long context hard for robot policies? And how can we fix it?

📄Our new paper: Learning Long-Context Diffusion Policies via Past-Token Prediction

Andy Tang @tangerinecoder

about 1 year ago

Was super fun exploring this! Most modern policies don't use history, and Diffusion Policy in particular gets a lot worse when history is added. We identify a simple ingredient that lets history help, and use it to improve the efficiency and performance of long-context policies.

Marcel Torné @marceltornev

about 1 year ago

Giving history to our robot policies is crucial to solve a variety of daily tasks. However, diffusion policies get worse when adding history. 🤖 In our recent work we learn how adding an auxiliary loss that we name Past-Token Prediction (PTP) together with cached embeddings enables us to reliably add longer history context to our robot policies! 🧠 We also show how PTP enables some test-time scaling techniques for robotics! 🚀

Andy Tang @tangerinecoder

over 1 year ago

@sanjehorah i have an ad-hoc version in LaTeX annotated with date added/finished, stage (to read, annotate, done), priority, and stream (from Twitter, robotics, neuro, etc.) -- lmk if ppl get together to build as i have opinions

tangerinecoder retweeted

Karan Dalal

@karansdalal

almost 2 years ago

I’m excited to share a project I’ve been working on for over a year, which I believe will fundamentally change our approach to language models.

We’ve designed a new architecture, which replaces the hidden state of an RNN with a machine learning model. This model compresses context through actual gradient descent on input tokens. We call our method “Test-Time-Training layers.”

TTT layers directly replace attention, and unlock linear complexity architectures with expressive memory, allowing us to train LLMs with millions (someday billions) of tokens in context.

Our instantiations, TTT-Linear and TTT-MLP, both match or beat the strongest Transformers and Mamba. Arxiv: https://t.co/3eEenKB17s

Andy Tang @tangerinecoder

about 2 years ago

@adriana0nline CONGRATSSS!!!

Andy Tang @tangerinecoder

about 2 years ago

@TheBookie0 seems like they might be in need of a designer ;)

Andy Tang @tangerinecoder

about 2 years ago

@giansegato do this as well! don't have a voice recorder but turn off wifi/cellular, open voice memos, go pace around a basement or run

Andy Tang @tangerinecoder

about 2 years ago

@TheBookie0 lfggggggggggggg

tangerinecoder retweeted

Sergiy Nesterenko

@sergiynest

over 2 years ago

We've had over a thousand new engineers try Quilter in the last few weeks submitting some really interesting designs. We really want to see some of these come to life, so we're subsidizing board builds! If you want to build a Quilter design in real life, we'll cover the cost of the PCB! More about this in the link. Open hardware community: this one is especially for you ;)

tangerinecoder retweeted

Patrick @_patrickhult

over 2 years ago

Here is Playground v2.5, our latest model.

tangerinecoder retweeted

Annie Chen @_anniechen_

over 2 years ago

Very excited to introduce ROAM, our new work that allows a robot to *adapt on-the-go* as it faces OOD situations during deployment, drawing on pre-trained behaviors. See as ROAM enables our Go1 to roller skate zero-shot 🤖🐕🛼 (without any lessons!) 🧵(1/9)

tangerinecoder retweeted

Replit ⠕

@Replit

over 2 years ago

We’ve had a flurry of product launches over the past week. Unless you’ve been on X every day, you likely missed a couple. Here’s a recap of every launch so you can get up to speed👇
