Weiting (Steven) Tan @weiting_nlp - Twitter Profile

Pinned Tweet

9 months ago

I was curious about the voice agent, particularly its ability for tool-use and reasoning, while listening to a user. So we train voice agents by building a sandbox (based on tau-bench) that uses GPT-4.1 for user simulation and SEED-TTS for speech synthesis. #Agents #ToolUse

1

9

3

4

1K

Weiting (Steven) Tan @weiting_nlp

8 months ago

@jackjingyuzhang @AmazonScience congrats!!

0

1

0

46

weiting_nlp retweeted

Jubayer Ibn Hamid

@jubayer_hamid

9 months ago

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks where strategic exploration is necessary. We introduce a framework for training a policy over sets of generations and use it to induce exploration. Work with @ifdita_hasan (co-lead), @ellenjxu_ , @chelseabfinn and @DorsaSadigh at Stanford 🧵

18

1K

139

888

199K

weiting_nlp retweeted

Sanxing Chen @sanxing_chen

9 months ago

Most RL for LLMs today is single-step optimization on a given state (e.g., an instruction), which is essentially a bandit setup. But to learn a meta-policy that can solve various bandit problems via in-context trial and error, you need true multi-turn RL over a long horizon. So, can RL & SFT teach LLMs a meta-bandit policy to explore in-context? 🤔 The regret-based benchmarks screamed YES! But … real story is more complex. We discovered a surprising phenomenon “When Greedy Wins.” (1/5) 🧵

1

30

13

10

7K

Weiting (Steven) Tan @weiting_nlp

9 months ago

This research was made possible by my fantastic collaborators and mentors at @jhuclsp and Bytedance Seed Speech: @XinghuaQu, @tuming628, Meng Ge, Andy T. Liu, Philipp Koehn, Lu Lu. Paper: https://t.co/MhFcBSA0Ge Code and data will be released shortly.

0

2

1

0

162

Weiting (Steven) Tan @weiting_nlp

9 months ago

Training multimodal voice agents is even tougher. To get them up to speed, we designed a two-part strategy: 1️⃣ A "warm-up" curriculum on simplified tasks to build core tool-calling skills. 2️⃣ Mixed-modality training with interleaved speech-text rollouts

1

0

170

Weiting (Steven) Tan @weiting_nlp

9 months ago

We also explored several other strategies, such as entropy-related changes to the loss function, forced self-reflection during rollouts, and finer-grained reward assignment with PPO. However, they did not work as well as intended. Please check our analysis section for details.

1

0

130

Weiting (Steven) Tan @weiting_nlp

9 months ago

Vanilla RL struggles with exploration & credit assignment. We tackle this with: 1️⃣ Mixed-task training (w/ math) to keep the agent curious. 2️⃣ Turn-level Adjudicated RL (TARL), which uses an LLM-judge for precise turn-level feedback.

1

0

101

Weiting (Steven) Tan @weiting_nlp

9 months ago

I was curious about the voice agent, particularly its ability for tool-use and reasoning, while listening to a user. So we train voice agents by building a sandbox (based on tau-bench) that uses GPT-4.1 for user simulation and SEED-TTS for speech synthesis. #Agents #ToolUse

1

9

3

4

1K

weiting_nlp retweeted

Jacob Austin @jacobaustin132

over 1 year ago

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n

25

2K

392

3K

466K

weiting_nlp retweeted

Dongfu Jiang

@DongfuJiang

10 months ago

🚀 Excited to finally share our paper on VerlTool, released today after months of work since the initial release in late May!

VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of tools (including multimodal ones) such as code interpreter, FAISS retriever, Google Search, Bash terminal, SQL executor, image processing, SWE, and more. For each tool, we provide training recipes and detailed analysis, with all code designed to be reproducible and runnable on a single node.

A key design choice is the separation of the RL workflow and the tool server. Every trajectory sends tool calls via a well-designed API interface after encountering an action stop token. The tool server handles requests with either multi-threading or Ray, ensuring high concurrency and stable resource management—for example, our math experiments run stably past 1k steps.

Our goal with VerlTool is to make it easy for the community to add new tools in ARLT training. Developers only need to inherit from BaseTool and adapt minimal code. In fact, you could even give the BaseTool file to GPT/Claude and get almost plug-and-play code.

We also explored important technical issues in Agentic RL, such as how much async rollouts can actually speed things up, or how tool response tokenization may cause off-policy drift. We hope these insights, while modest, can be useful for the community.

📄 HuggingFace Daily Paper: https://t.co/OkKNanXufu
🛠️ GitHub: https://t.co/gmhic1BVPU

More details: (0/5)👇

2

154

37

88

17K

weiting_nlp retweeted

Jason Weston

@jaseweston

10 months ago

🌀Diversity Aware RL (DARLING)🌀
📝: https://t.co/MH0tui34Cb
- Jointly optimizes for quality & diversity using a learned partition function
- Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k
- Works for both non-verifiable & verifiable tasks
🧵1/5
5

425

88

339

87K

Weiting (Steven) Tan @weiting_nlp

10 months ago

Many thanks to my collaborators @LianJiachen , @HirofumiInaguma , Paden Tomasello, Philipp Koehn, @xutai_ma For details, please refer to the artifacts below: 📄 Paper: https://t.co/sF1B2TM7kU 🔗 Code: https://t.co/qaXWJSkhxd This work was done at @jhuclsp and @AIatMeta

0

6

2

0

1K

Weiting (Steven) Tan @weiting_nlp

10 months ago

Can a model "see" emotion to "speak" with emotion? Yes! 🗣️ Our new work on Audio-Visual LMs shows that adding a visual stream makes generated speech more expressive. Check out our #EMNLP2025 Findings paper to see how we did it.

2

44

11

13

4K

weiting_nlp retweeted

Benjamin Van Durme @ben_vandurme

over 1 year ago

Our latest on compressed representations: Key-Value Distillation (KVD). Query-independent transformer compression, with offline supervised distillation.

3

129

28

70

13K

weiting_nlp retweeted

DeepSeek

@deepseek_ai

over 1 year ago

🛠️ DeepSeek-R1: Technical Highlights

📈 Large-scale RL in post-training
🏆 Significant performance boost with minimal labeled data
🔢 Math, code, and reasoning tasks on par with OpenAI-o1
📄 More details: https://t.co/jWMxMVhGAQ

🐋 4/n

240

5K

772

889

2M

weiting_nlp retweeted

JHU Computer Science @JHUCompSci

over 1 year ago

Congratulations to Prof. Philipp Koehn on being named a Fellow of the @aclmeeting! https://t.co/hVqQk8ekOs

0

30

4

0

5K

Weiting (Steven) Tan @weiting_nlp

over 1 year ago

@FeitengLi Thanks for your interest! We will open-source it once the paper is accepted somewhere and passes internal legal review (as this work was done within Meta).

0

1

0

34

Weiting (Steven) Tan @weiting_nlp

over 1 year ago

Looking for a better way to fuse speech and text modality with pre-trained large language models? Check out our paper: SSR: Alignment-Aware Modality Connector for Speech Language Models 💡 🔗 https://t.co/d28Ew0yYzb #SpeechLM #ModalityFusion

3

39

9

10

6K

Weiting (Steven) Tan @weiting_nlp

over 1 year ago

I had a great time helping host MASC-SLL at Hopkins last year. MASC-SLL is a great opportunity to connect with fellow AI/NLP/Speech researchers. If your organization is in the Mid-Atlantic region and is interested in hosting the event, please reach out!

MASC-ALL Conference @MASC_Conference

over 1 year ago

📢 Want to host MASC 2025? The 12th Mid-Atlantic Student Colloquium is a one day event bringing together students, faculty and researchers from universities/industry in the Mid-Atlantic. Please submit this very short form if you are interested in hosting! Deadline January 6th

1

14

17

0

5K

0

4

1

0

1K

weiting_nlp retweeted

Tianjian Li @tli104

over 1 year ago

I have written a blog post explaining why both the chosen and the rejected log-probabilities decrease during DPO and, more interestingly, why this is to some extent a desired phenomenon. Link: https://t.co/2zWCp6G0iT

0

13

6

4

3K

weiting_nlp retweeted

Sherjil Ozair

@sherjilozair

over 1 year ago

Very happy to hear that GANs are getting the test of time award at NeurIPS 2024. The NeurIPS test of time awards are given to papers which have stood the test of time for a decade. I took some time to reminisce about how GANs came about and how AI has evolved in the last decade.

17

971

118

370

220K
