Fangyuan Xu

@brunchavecmoi

许方园👩🏻‍💻phd student @ nyu, interested in natural language processing

🌎

Joined August 2019

726 Following

594 Followers

199 Posts

Pinned Tweet

Fangyuan Xu @brunchavecmoi

4 months ago

A lot of useful training data can't be shared due to privacy. How do we create synthetic training data without data owners ever sharing their content? 🚀 Introducing 𝐃𝐏-𝐑𝐅𝐓: using RL to train LLMs to generate high-fidelity domain data without seeing a single private sample.

brunchavecmoi retweeted

Bingbin Liu @BingbinL

13 days ago

MOSS@COLM2026 is calling for submissions! 💡 Please help spread the word, and hope to see you in SF!

brunchavecmoi retweeted

Vishakh Padmakumar

@vishakh_pk

13 days ago

People are increasingly worried that AI tools make us overreliant. But how do we actually measure this? We introduce Offloading Score, a measure of reliance based on the fraction of cognitive effort offloaded to AI while completing a task. In a controlled user study, Offloading Score detects increased reliance under time pressure, while several common alternatives do not. (1/9)

brunchavecmoi retweeted

Yulin Chen ✈️ ICML2026 @YulinChen99

28 days ago

Most assume unlearnable examples never get positive reward. They do. In our ICML paper, we reveal that a hard problem can receive positive reward during RLVR but remain unlearned. We show the phenomenon is more likely a representation issue than an RL optimization artifact.

brunchavecmoi retweeted

Hongli Zhan @HongliZhan

about 2 months ago

New paper! 🏁 My final one from my PhD at UT Austin. 🦜LLMs sound empathic, but they keep saying the same thing over and over. Not just the same words, the same discourse moves, turn after turn. We found that LLMs repeat the same discourse moves at nearly 2x the rate of human supporters across a multi-turn conversation, and existing metrics don’t catch this. So we built MINT 🌿 (Multi-turn Inter-tactic Novelty Training), the first RL framework to optimize discourse move diversity in multi-turn empathic dialogue. +25% empathy, −26% repetition. w/ @jessyjli @_desmond_ong et al. 📄 https://t.co/fJ8IvkXkbM

brunchavecmoi retweeted

Yuhan Liu @YuhanLiu_nlp

about 2 months ago

Can LLMs generate diverse outputs for open-ended questions? Is it helpful if we ensemble outputs from multiple models? We study 18 LLMs on 4 datasets and find that no single model is best at generating diverse outputs 👇/ 🧵

brunchavecmoi retweeted

Chao Cao

@_chaocao_

about 2 months ago

Our first demo debuted on Jensen Huang's GTC keynote, and today we’re launching @SanchoRobotics 🚀 GTC keynote demo with @MultiplyLabs. Extended cut below.

brunchavecmoi retweeted

Vishakh Padmakumar

@vishakh_pk

about 2 months ago

Really excited to have this dataset released to the community! There's a gap in our understanding of how users interact with coding agents at scale. SWE-chat fills that need to help shape the next generation of human-centered evals and training objectives for coding agents! 🤖🚀

brunchavecmoi retweeted

Yuqing Yang @yyqcode

about 2 months ago

🧵 1/8 What should an LLM assistant remember across conversations? Existing memory work studies this one task at a time. But real-world assistants see all kinds of conversations, and that changes the problem. Introducing BEHEMOTH 🦣 + CluE 🌱: a benchmark & self-evolving method for heterogeneous memory extraction. 📄 Paper: https://t.co/szLIOdA4bm

brunchavecmoi retweeted

Zhaofeng Wu

@zhaofeng_wu

about 2 months ago

Excited to share our new work from Meta MSL 🔥 LLMs write great Python/C++ but struggle with uncommon languages. Data scarcity is the bottleneck ⌛ Can we leverage cross-PL transfer to overcome this? Yes ✅ A new method to unlock cross-PL transfer 🧵 https://t.co/8EK05gMgCl

brunchavecmoi retweeted

Joongwon Kim

@danieljwkim

about 2 months ago

New work @AIatMeta: We enable test-time scaling for long-horizon coding agents by using better representations, selection and reuse of agentic trajectories, with Claude 4.5 Opus improving by +6.7% on SWE-Bench Verified and +12.1% on Terminal-Bench 2.0. 📄: https://t.co/tvhdw0DuYd

brunchavecmoi retweeted

Yoonsang Lee @yoonsang_

2 months ago

How should we effectively aggregate long-horizon agent trajectories? 🧐 Unlike CoT reasoning, agentic tasks pose unique challenges: they are long, multi-turn, and tool-augmented. Introducing 👉🏻 AggAgent 👈🏻 — which treats parallel trajectories as an environment to interact with.

brunchavecmoi retweeted

Jenna Russell

@jennajrussell

2 months ago

Would you realize if the book you were reading was AI? What if it was humanized to remove AI-speak? We find that even without using stylistic cues (e.g., word choice or sentence structure) narrative choices alone give AI fiction away!

brunchavecmoi retweeted

Zayne Sprague

@ZayneSprague

2 months ago

https://t.co/Zyo6d1sGmL

brunchavecmoi retweeted

Hongli Zhan @HongliZhan

2 months ago

PhD defended at UT Austin today.🤘 The best thing was having an advisor who believed in me before I believed in myself. Jessy taught me how to write, how to think, and how to chase research ideas. Then the rest followed. Thank you, @jessyjli

brunchavecmoi retweeted

Chau Minh Pham @chautmpham

3 months ago

👀 Can AI produce a novel worth reading? We built a platform to find out. 📚 Introducing AutoFiction: a web platform that hosts AI-generated novels by Claude Code & Codex, rated and reviewed by real readers. We have 33 books so far, spanning dark fantasy, murder mysteries, Harry Potter fanfics, and more. All free to read. (1/n)

brunchavecmoi retweeted

Tengxiao Liu

@TengxiaoLiu

3 months ago

Auto research is on 🔥 We give algorithmic problems (like circle packing) to general coding agents and let them run overnight. 🌙 Agents reach SoTA. But more importantly: we analyze 100+ hours of trajectories to understand how they get there 🧵

brunchavecmoi retweeted

Yoonjoo Lee

@yoonjoo_le2

3 months ago

Proud to share our CHI 2026 Honorable Mention paper, Evalet! 🏅 LLM-as-a-Judge is everywhere, but a single score hides so much. Evalet fragments outputs into functional units so you can see exactly what's working and what's not—across hundreds of outputs, from reasoning traces to red-teaming conversations to computer-use agents. I had a great time working on this project led by the amazing @tae_skim and @heechanleekr, with @josephseering and @imjuhokim ! Check out the full breakdown below ⬇️

brunchavecmoi retweeted

Shankar Padmanabhan @shankarpad8

3 months ago

1/5 How do we update a model trained in 2025 with new world knowledge from 2026? ⚠️Continued training will undo skills learned by LLMs during post-training, e.g. instruction-following/math/code. 🤝Our method DiSC updates LLMs with new knowledge while preserving existing skills!

brunchavecmoi retweeted

Shuyan Zhou

@shuyanzh36

3 months ago

In 2023, WebArena took 7 grad students more than 6 months to build just 5 environments with 812 variable browser-use tasks. Now, it takes under 10 hours and less than $100 per environment, with easy support for parallel generation. Excited to introduce WebArena-Infinity: a scalable approach for automatically generating high-authenticity, high-complexity browser environments with verifiable tasks suitable for RL training and benchmarking. Even strong open-source models that already achieve 60%+ success rates on WebArena and OSWorld complete fewer than 50% of tasks here. Project page: https://t.co/tEtYkChMBt Repo: https://t.co/lBg69T12xx 🧵 (1/n)

brunchavecmoi retweeted

Vaibhav Adlakha

@vaibhav_adlakha

3 months ago

Your LLM already knows the answer. Why is your embedding model still encoding the question? 🚨Introducing LLM2Vec-Gen: your frozen LLM generates the answer's embedding in a single forward pass — without ever generating the answer. Not only that, the frozen LLM can decode the embedding back into text. 🏆 SOTA self-supervised embeddings 🛡️ Free transfer of instruction-following, safety, and reasoning
