Stephen Xie @stephenx_ - Twitter Profile

New paper! LLM memory keeps improving, but this makes them *worse* as user sims. If we want to build models that can, e.g., simulate realistic students to train chatbots to be better teachers, then these models need to be able to forget like humans do 📄: https://t.co/1GpOfwcsat

14

458

70

319

46K

stephenx_ retweeted

Delta Institute

@DeltaInstitutes

19 days ago

Today, we’re thrilled to share that our friends at @TrajectoryLabs have officially launched from stealth!!! We first met Ronak a few days before Delta was founded in February 2025, and it’s been an incredible journey seeing him grow from his time at Windsurf, to DeepMind, and now to Trajectory. The Trajectory team has seen Delta’s full trajectory, and now we’re incredibly excited for the world to see theirs!! 🚀

12

125

16

13

11K

Stephen Xie

@stephenx_

26 days ago

Source: https://t.co/lcgQPtteiX

0

3

0

1

167

Stephen Xie

@stephenx_

26 days ago

“When all is said and done, and Nature passes her final judgement, you will not be measured by the number of moments in which you worked as hard as you could. You will not be judged by someone rooting around in your mind to see whether you were good or bad. You will not be evaluated according to how unassailable your explanations are for why you couldn't possibly have prevented the things that went wrong. You will be measured only by what actually happens, as will we all.”

Will Manidis

@WillManidis

26 days ago

https://t.co/aHOHtb1rcW

177

3K

334

4K

1M

4

22

1

7

2K

Stephen Xie

@stephenx_

about 1 month ago

more work to be done🫡

Grace Li

@grx_xce

about 1 month ago

If you're making Slides, the best in class are Opus 4.7 and GLM 5.1. Grok 4.3 is not far behind in 5th overall. Congrats to @AnthropicAI, @Zai_org, @xai

6

52

2

4

9K

2

65

1

0

4K

Stephen Xie

@stephenx_

about 1 month ago

@mycharmspace it was awesome working with you🫡

0

8

0

1K

Stephen Xie

@stephenx_

about 1 month ago

@alexshander03 @JudgmentLabs 🚀

0

3

0

87

Stephen Xie

@stephenx_

about 1 month ago

@thinkymachines @LongTonyLian congrats🎉

0

127

Stephen Xie

@stephenx_

about 1 month ago

@saurishs @xai @santiagomed @aypan_17 will miss seeing u around - wishing you the best!🚀🚀🚀

0

5

0

529

Stephen Xie

@stephenx_

about 1 month ago

Huge thanks to @NickATomlin @alsuhr @parksooojae @xyntechx @kaivalss @jyotiinar @v_kethana @SyrielleMontar1 @__anyaj @jiayi_pirate @xiuyu_l @a1zhang, Geogia Zhou, Karl Vilhelmsson, Jaewon Chang, Cameron Jordan, and Erran Li for the insightful feedback!

0

23

0

2

1K

Stephen Xie

@stephenx_

about 1 month ago

Longer chain-of-thought = slower inference, more context rot, and ballooning compute. So what if the model could decide for itself when to go parallel? Our new BAIR blog breaks down Adaptive Parallel Reasoning (APR) — the next paradigm in inference-time scaling. 🧵

14

455

48

285

41K

Stephen Xie

@stephenx_

about 1 month ago

Full post — inference systems, training recipes, reward design, eval, and a survey of Multiverse, Parallel-R1, NPR, ThreadWeaver + the original APR method (Pan et al., 2025): https://t.co/iFQwVISoue Co-authored with @tonylian!

1

41

5

17

2K

Stephen Xie

@stephenx_
