Yixin Nie

@EasonNie

Research Scientist at @AIatMeta @meta | ex Ph.D. at @uncnlp @unccs | ex M.S. at @uchicagocs @uchicago | Past Intern at @AIatMeta @microsoft | #NLProc & #AI

Joined February 2015

519 Following

891 Followers

105 Posts

EasonNie retweeted

Jason Weston

@jaseweston

5 days ago

Claim: Autoresearch that moves the frontier will be about better data: we call that *Autodata*. 🧵1/6 -- Paper is out! https://t.co/b8gOALndzy Key idea: agentic data creation provides a way to *convert increased inference compute into higher quality model training*. We show our method gives gains on computer science, legal and math problems over classical synthetic dataset creation methods. We also show how to train (meta-optimize) such a data scientist agent, so that it can create even stronger data. Overall, we believe this direction has the potential to change how we build AI data!

836

118

857

60K

EasonNie retweeted

Jason Weston

@jaseweston

4 months ago

Continual learning in production FTW (with humans-in-the-loop) – a detailed report on methods to iteratively improve LLM social dialogue served to millions of users based on their interests.

Personally, I've been working on pushing this direction for the last 10 years(!) (see papers below)! so it's exciting to see this stuff working in real systems. It will only get better -- lots more exciting methods now to try and more powerful models to make methods work than when I started.

Some of my historical(!) research in this direction:

2025:
The Era of Real-World Human Interaction: RL from User Conversations https://t.co/wmC2fVB0zu

2022:
When life gives you lemons, make cherryade: Converting feedback from bad responses into good labels https://t.co/kJCanILmww
Learning new skills after deployment: Improving open-domain internet-driven dialogue with human feedback https://t.co/2uJ88MzNRw

2020:
Deploying lifelong open-domain dialogue learning https://t.co/0HiqfIAhpF
Open problems in continuous learning: https://t.co/zAKl9WudAe

2016-2019:
Learning from dialogue after deployment: Feed yourself, chatbot https://t.co/wrgByMfKFM
Unlikelihood training https://t.co/KHACH5aqRd
Dialogue learning with human-in-the-loop https://t.co/7PkrroT8TI

152

131

19K

Yixin Nie

@EasonNie

4 months ago

5/5 Beyond the numbers, we hope this work contributes toward the scientific rigor of optimizing subjective engagement in production LLMs. Unlike math or code, conversational AI lacks objective benchmarks and verifiable reward signals. Yet millions interact with these systems daily — and rigorous documentation of how to measure and improve them remains scarce in the literature.

307

Yixin Nie

@EasonNie

4 months ago

1/5 🤔 LLMs can solve olympiad math and write production code. But can they hold a conversation that's actually fun — one that people want to keep coming back to? 💬✨ We present CharacterFlywheel— an iterative process optimizing LLMs for real human engagement and character steerability, while maintaining rigorous safety protocols 🔒. Tested across Instagram, WhatsApp & Messenger 📱with millions of users — where they can create, share, and chat with their own AI characters 🤖. 📄 paper: https://t.co/WJ9xKz89Dt https://t.co/9lA3DdkWuM

22K

Yixin Nie

@EasonNie

4 months ago

4/5 Post-launch (V8–V15), 7 of 8 deployed models showed positive engagement lift in controlled A/B tests, with top performers achieving +8.8% breadth and +19.4% depth 🔥. One release (V12) was a useful miss 🚨: RM win rate jumped to 70.7% while engagement declined—classic reward overfitting ⚠️. Takeaway: keep RM win rates below ~65% ✅. Subsequent versions recovered. Overall, engagement has trended upward across 9 months of deployment 🚀

514

EasonNie retweeted

Lin Guan @GuanSuns

5 months ago

Our latest work sheds light on what to "scale" when building generalizable agents:
👉 Prefer training envs with 📚 high state information richness (high perception load & info volume) + 🧠 high planning complexity (long task horizon & branching factors)
👉 The *complexity structure* matters more than realism: Hard problems in "toy" domains like Sokoban / BlocksWorld can be more useful than easy problems in more realistic domains like ALFWorld.
‼️📉 Be careful about your mid-training datamix and strength: warmup and mid-training help prevent catastrophic forgetting during RL but undermine generalization to domains that are not covered
💡 Applying lightweight state randomization/augmentation helps!
📄 Paper: https://t.co/qmaIjMFYwJ

EasonNie retweeted

Zhaoran Wang

@zhaoran_wang

5 months ago

People are putting lots of effort into building diverse RL environments. But what kinds of environments are more useful for building **generalist** agents that are able to solve tasks beyond their training tasks? 🚀 Instead of optimizing one single benchmark, we look for drivers of transfer in our latest paper: https://t.co/cpgXXbiday Joint work with MSL @MetaAI (@ZhihanLiu21628 @GuanSuns @EasonNie @KaiZhang_CS @nazzhang) where @ZhihanLiu21628 interned. (1/n)

112

103

EasonNie retweeted

Zineng Tang

@ZinengTang

about 1 year ago

Excited to share our new work! DOVE 🕊️: a dynamic vision encoder that adapts token count to image complexity. Fewer tokens, same fidelity—outperforming fixed-length AEs tokenizer on classification & VLM tasks! Arxiv: https://t.co/8qTZARXsCl Web: https://t.co/HPCTctwyVx #AI #CV

110

11K

EasonNie retweeted

Jason Weston

@jaseweston

about 1 year ago

🚨Announcing RAM 2 workshop @ COLM25 - call for papers🚨
- 10 years on, we present the sequel to the classic RAM🐏 (Reasoning, Attention, Memory) workshop that took place in 2015 at the cusp of major change in the area. Now in 2025 we reflect on what's happened and discuss the future of these topics.

🌐: https://t.co/tXHeQD8Xek

=======

Ten years ago… in Montreal 2015, the RAM workshop took place to bring together the burgeoning field covering the “interplay of reasoning, attention and memory”, just before Transformers were invented – but when many of the components to get there had just been published and were in place. The workshop included many speakers who are still prominent in pushing these directions today: Yoshua Bengio, Kyunghyun Cho, Jürgen Schmidhuber, Sainbayar Sukhbaatar, Ilya Sutskever, and more.

Ten years later… we are hosting RAM 2 in the same location in Montreal, with a two-fold purpose. Firstly, as a retrospective and analysis of what has happened in the last 10 years. We are inviting presenters from the first workshop to this end, as well as to add their current perspectives. Hence secondly, and more importantly, we will bring together the field to discuss new trends and future directions for the next 10 years – which is further enabled by inviting new speakers, panelists and poster presenters discussing these fresh ideas.

Why does this make sense? The RAM topic is as important as ever, and has gone on to dominate the field.

These new directions include:

R: New reasoning methods including both token-based and that use continuous vectors, and how they combine with memory.

A: New attention methods that enable better reasoning and use of short and long-term memory.

M: Architectural changes to LLMs to improve memory and reasoning capabilities.

Overall, we highlight that the workshop is most concerned with methods that aim to explore the interplay between these three aspects.

==Workshop Event==
Location: Palais des Congrès, Montreal, Canada
Date: October 10, 2025

==Call for Papers==
We will host paper submissions on open review (link to appear here soon). We invite researchers and practitioners to submit their work to the COLM 2025 Workshop on Reasoning, Attention & Memory 2 (RAM2@COLM25).
Submission Deadline: June 23, 2025
Author Notification Deadline: July 24, 2025
Submission Details: Submissions should follow the general guide for COLM conference. Papers can be up to 9 pages (not including references) and have to be anonymized. All submissions must be in PDF format, please use the LaTeX style files provided by organizers.

==Invited speakers==
Yoshua Bengio @Yoshua_Bengio
Kyunghyun Cho @kchonyc
Yejin Choi @YejinChoinka
Azalia Mirhoseini @Azaliamirh
Juergen Schmidhuber @SchmidhuberAI
Sainbayar Sukhbaatar @tesatory
Jason Wei @_jasonwei

==Organizing Committee==
Ilia Kulikov @IliaKulikov
Jason Weston @jaseweston
Jing XU @jingxu_ml
Olga Golovneva @OlgaNLP
Swarnadeep Saha @swarnaNLP
Marjan Ghazvininejad @gh_marjan
Ping Yu @ping_iris_yu

115

40K

EasonNie retweeted

Cindy XL 甜菜欣欣

@Tiancaixinxin

about 1 year ago

We just open-sourced the first 3D model dataset for Objaverse with quality annotations! If you are a researcher or a company looking to train your text-to-3D or image-to-3D models, this dataset may help. You’ll use our dataset to filter out only the high quality objects suitable for AI training - it’s better and faster. #cvpr #machinelearning #ai3d #ArtificialIntelligence #computervision

EasonNie retweeted

Zineng Tang

@ZinengTang

over 1 year ago

We are thrilled to announce TULIP! 🌷 https://t.co/gNVVbDzze1 A state of the vision language encoders coupled with generative model for stronger representation learning.

297

170

31K

EasonNie retweeted

Zi Lin

@suzzzylin

over 1 year ago

🚀Hitting a plateau in code generation due to the lack of high-quality data? What if we let LLMs generate and verify coding data themselves?
😅Fun fact: LLMs are like devs - great at coding, but reluctant testers! Their test generation lags far behind their coding ability.
💡Our solution: 𝑺𝒐𝒍-𝑽𝒆𝒓 - A self-play framework where LLMs learn to solve and verify themselves for code/test generation tasks! No human-annotated data or teacher models are needed for training!
* LLMs act as both code solver & verifier
* Self-play between coding & testing -- both improved over iterations!
📖For more details, check out our paper here: https://t.co/ngygACJgwE
Joint work with @shengs1123 @shangjingbo @jaseweston @EasonNie

318

207

32K

EasonNie retweeted

Douwe Kiela

@douwekiela

over 1 year ago

I’m really sad that my dear friend @FelixHill84 is no longer with us. He had many friends and colleagues all over the world - to try to ensure we reach them, his family have asked to share this webpage for the celebration of his life: https://t.co/1QoyHmAD3p

109

728

104

301K

EasonNie retweeted

Dieuwke Hupkes @_dieuwke_

over 1 year ago

I loved working on this project investigating how we can use NLI datasets to investigate LLMs. Particularly excited to see how they still are providing a relevant signal, and how match alignment of model distributions with humans has much improved but is far from solved

10K

EasonNie retweeted

Jason Weston

@jaseweston

almost 2 years ago

Today we are releasing code, models & data from the Self-Taught Evaluator paper, a method to train LLM judges with synthetic preference data.

Better model trained with DPO: https://t.co/oGW56tGXNR

Synthetic preference data: https://t.co/ixz4kSlVgg

Code for training and inference: https://t.co/3NQtG6RY9y

Since we released the paper detailing the method to generate synthetic data, the community has also used this approach to build improved RMs (https://t.co/4sLVBwuKxk).

Our DPO model is a strong LLM judge on RewardBench, despite not using any human annotation in training data creation. We are releasing the training data, training and evaluation code to reproduce our results.

The model is also available as an evaluator on the AlpacaEval leaderboard, as one of the top-ranked evaluators in terms of human agreement rate: https://t.co/KJCrL5Ssid

You can also read the original paper here: https://t.co/aFpnt56TEk

528

384

80K

EasonNie retweeted

Barbara Plank @barbara_plank

almost 2 years ago

What wonderful news! Our paper VariErr (Variation vs Error) NLI got an #ACL2024NLP area chair award 😍 So thankful to the best paper award committee, the RQ is so close to my 🧡 Congrats to the amazing VariErr team @weberple @Logan_SiyaoPeng Marie-Catherine de Marneffe 👏

EasonNie retweeted

Adina Williams @adinamwilliams

almost 2 years ago

fyi: Meta's funding a few new grants for faculty to create evaluation benchmarks (deadline Sept 7). Could be of interest! #NLProc https://t.co/7JPz95o41C

17K

EasonNie retweeted

Hao Tan @HaoTan5

about 2 years ago

Welcome to join the 3D Foundation model workshop in CVPR@Seattle!
