Xin Wen @_xwen_ - Twitter Profile

Pinned Tweet

12 months ago

Can we decouple semantics from spectrum for image tokenizers? Our answer: the 1D Semanticist tokenizer. We push the burden of photo-realistic image generation to diffusion decoders, and let the tokens focus on the semantic structure. The PCA-like structure is induced by nested CFG (dropout), allowing plausible reconstruction & generation with very few tokens, and coarse-to-fine hierarchy. Thanks to semantic-spectrum decoupling, our tokenizer also achieves strong performance on ImageNet linear probing, indicating potential for unified understanding and generation. To check for more details, see you 1-2 pm TODAY at ExHall D, GMCV Workshop! "Principal Components" Enable A New Language of Images Project Page: https://t.co/yzISM0c5xQ Code: https://t.co/L13AH9fQKs

_xwen_'s tweet photo. Can we decouple semantics from spectrum for image tokenizers? Our answer: the 1D Semanticist tokenizer.

We push the burden of photo-realistic image generation to diffusion decoders, and let the tokens focus on the semantic structure. The PCA-like structure is induced by nested CFG (dropout), allowing plausible reconstruction & generation with very few tokens, and coarse-to-fine hierarchy.

Thanks to semantic-spectrum decoupling, our tokenizer also achieves strong performance on ImageNet linear probing, indicating potential for unified understanding and generation.

To check for more details, see you 1-2 pm TODAY at ExHall D, GMCV Workshop!

"Principal Components" Enable A New Language of Images
Project Page: https://t.co/yzISM0c5xQ
Code: https://t.co/L13AH9fQKs

2

16

5

2

18K

_xwen_ retweeted

Vincent Sitzmann @vincesitzmann

1 day ago

Introducing MilliVid, our new method for long-context video generation! MilliVid creates videos that are consistent over long time spans, without using retrieval heuristics or 3D maps! (1/n) https://t.co/evmf5dL5Sg

9

332

58

191

33K

_xwen_ retweeted

Sungjin Ahn

@SungjinAhn_

21 days ago

🧠We introduce "Generative Recursive Reasoning"! Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic — same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor. Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width — parallel trajectory sampling. And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x). With only 10M params: • Sudoku-Extreme: 97.0% (TRM 87.4%) • ARC-AGI-1: 52.0% • ARC-AGI-2: 11.1% • N-Queens coverage: 90%+ 📄 Paper: https://t.co/JC7EyXYc9Y 🌐 Project page: https://t.co/LRT1dQiWLZ w/ Junyeob Baek @JunyeobB (KAIST), Mingyu Jo @pyross0000 (KAIST), Minsu Kim @minsuuukim (KAIST & Mila), Mengye Ren @mengyer (NYU), Yoshua Bengio @Yoshua_Bengio (Mila), Sungjin Ahn @SungjinAhn_ (KAIST)

SungjinAhn_'s tweet photo. 🧠We introduce "Generative Recursive Reasoning"!

Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic — same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.

Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width — parallel trajectory sampling.

And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).

With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+

📄 Paper: https://t.co/JC7EyXYc9Y
🌐 Project page: https://t.co/LRT1dQiWLZ

w/
Junyeob Baek @JunyeobB (KAIST),
Mingyu Jo @pyross0000 (KAIST),
Minsu Kim @minsuuukim (KAIST & Mila),
Mengye Ren @mengyer (NYU),
Yoshua Bengio @Yoshua_Bengio (Mila),
Sungjin Ahn @SungjinAhn_ (KAIST)

31

1K

209

1K

182K

Xin Wen @_xwen_

24 days ago

Xiaoyang is on the job market!

Xiaoyang Lyu @XiaoyangLyu22

25 days ago

🚀 Excited to share our newest project: Articraft! We built a brand-new agentic pipeline to generate articulated 3D objects completely from scratch. 🤖 The generated objects are extremely clean and ready to drop right into physical simulators. https://t.co/6GUjFZKYZU

2

39

4

24

6K

0

2

0

1

205

Who to follow

Bingchen Zhao

@BingchenZhao

PhD student at the University of Edinburgh @ancAtEd @EdinburghVision. https://t.co/WDUG64sGBu

Yixiao Ge

@ge_yixiao

Director @XPENGRobotics @XPengMotors. We are hiring!🦾 Previously Principal Researcher @TencentGlobal. PhD from MMLab @CUHKofficial.

Ryan Yuan

@RainbowYuhui

Research Director@Canva; ex-MSR. Build a research team focused on fundamental research for world-leading graphic design generation. Email: [email protected]

_xwen_ retweeted

Jiawei Yang

@JiaweiYang118

about 1 month ago

Two months ago, I vaguely posted a number: 0.9 FID, one-step, pixel space. Now it is 0.75, and can be even lower. Many wonder how. I thought it might end as a small FID prank: simple and deliberate. It started with one question: can FID be optimized directly, and what does it reveal? Introducing FD-loss.

JiaweiYang118's tweet photo. Two months ago, I vaguely posted a number: 0.9 FID, one-step, pixel space.

Now it is 0.75, and can be even lower.

Many wonder how.

I thought it might end as a small FID prank: simple and deliberate.

It started with one question: can FID be optimized directly, and what does it reveal?

Introducing FD-loss.

56

953

157

627

229K

_xwen_ retweeted

Fatih Dinc

@fatihdin4en

about 1 month ago

Where does agency come from in a neural network? We trained RNNs to chase a moving target. Some learn to react. Others learn to predict, anticipate, and even wait (schematic gif attached) The difference comes down to the dimensionality of the neural code. 🧵 (1/7)

10

664

107

572

64K

_xwen_ retweeted

Hanwen Jiang

@hanwenjiang1

about 1 month ago

GPT Image 2 has been deeply unsettling to me in the best way. Some of its outputs make it hard for me to keep using the old criterion of vision, especially the old definition of visual representation learning. Thus, I wrote this essay as a reflection on that shift: why knowledge may be the right name for what vision once called representation, and what can be the ultimate formulation for representation learning. (An unexpected side path: it also led me to think about the relation between knowledge and representation through the old calligraphic relation between spirit and form 😃 https://t.co/AW8Nd8s9hw

5

151

26

91

17K

_xwen_ retweeted

Nick Levine

@status_effects

about 1 month ago

It's an important question how much LM capabilities arise from memorization vs generalization. Vintage LMs enable unique generalization tests. For example: we are studying whether talkie can acquire coding capabilities purely through in-context learning at sufficient scale.

status_effects's tweet photo. It's an important question how much LM capabilities arise from memorization vs generalization. Vintage LMs enable unique generalization tests. For example: we are studying whether talkie can acquire coding capabilities purely through in-context learning at sufficient scale. https://t.co/UsFyiNzbtQ

2

145

5

26

15K

_xwen_ retweeted

Shangbang Long @ShangbangLong

about 2 months ago

🚀 Excited to announce Vision Banana 🍌 and our new paper: “Image Generators are Generalist Vision Learners”. We turn Nano Banana Pro into a state-of-the-art visual generation and understanding model. 🖼️ Check out our gallery at https://t.co/CEQJXroPaE 🧵 (1/N) continue ⬇️

22

432

71

264

61K

_xwen_ retweeted

Mihir Prabhudesai

@mihirp98

about 2 months ago

What if AI learned physics the way Newton did – by experiencing it? We built Sim2Reason: train LLMs inside virtual worlds governed by real physics laws, zero human annotation. Result: +5–10% improvement on International Physics Olympiad, zero-shot. 🧵

38

2K

221

1K

244K

Xin Wen @_xwen_

about 2 months ago

Congrats Dr. Tong! 🍾🍾🍾

Peter Tong

@TongPetersb

about 2 months ago

I defended my thesis today! Sincere thanks to my advisors @sainingxie @ylecun and committee members: @mengyer @YiMaTweets @LukeZettlemoyer @liuzhuang1234. I could not have wished for a better PhD life, and I want to thank everyone who was part of this journey. Slides Link: https://t.co/UoD65snQLX

TongPetersb's tweet photo. I defended my thesis today! Sincere thanks to my advisors @sainingxie @ylecun and committee members: @mengyer @YiMaTweets @LukeZettlemoyer @liuzhuang1234. I could not have wished for a better PhD life, and I want to thank everyone who was part of this journey.

Slides Link: https://t.co/UoD65snQLX

111

2K

59

228

175K

1

0

354

_xwen_ retweeted

Fei Xia

@xf1280

about 2 months ago

Our colleague Bernie Huang did something even cooler, where he asked the model to "divide busy tiles further as needed", and got even better results. This leverages the fact the model can render and reflect. https://t.co/hAK82enW3w

xf1280's tweet photo. Our colleague Bernie Huang did something even cooler, where he asked the model to "divide busy tiles further as needed", and got even better results. This leverages the fact the model can render and reflect.

https://t.co/hAK82enW3w https://t.co/XxujvbZMC6

1

16

1

5

1K

_xwen_ retweeted

Shengjia Zhao

@shengjia_zhao

2 months ago

Excited to share what we’ve been building at Meta Superintelligence Labs! We just released Muse Spark, our first AI model. It's a natively multimodal reasoning model and the first step on our path to personal superintelligence. We've overhauled our entire stack to support scaling, and this is just the beginning. https://t.co/KNVjgMcch1

shengjia_zhao's tweet photo. Excited to share what we’ve been building at Meta Superintelligence Labs! We just released Muse Spark, our first AI model. It's a natively multimodal reasoning model and the first step on our path to personal superintelligence. We've overhauled our entire stack to support scaling, and this is just the beginning.

https://t.co/KNVjgMcch1

74

2K

173

234

236K

_xwen_ retweeted

Baifeng

@baifeng_shi

3 months ago

Humans can see in high-res, high-FPS in real-time. Why can't VLMs? Introducing AutoGaze: ViTs/VLMs "gaze" only at key video regions! Up to 4-100x token savings, 19x speedup, and enables scaling to 4K-res 1K-frame videos. 📄 https://t.co/GhbWZwMAg7 🌐 https://t.co/mEJ991MAIR 🤗 https://t.co/FOfc2QRThi (1/n)🧵

47

2K

202

1K

158K

Xin Wen @_xwen_

3 months ago

Congrats @BingchenZhao!

Zhengyao Jiang

@zhengyaojiang

3 months ago

We're excited to announce that @BingchenZhao, who built the predecessor of AutoResearch, has joined @WecoAI full-time! Bingchen is the first author of LLMSpeedrunner at Meta FAIR, which ran the automated research loop on @karpathy's NanoGPT, which later evolved into NanoChat and the speedrun community where AutoResearch operates today. Weco has been committed to ML research automation for 2.5 years, starting with AIDE. We're super pumped by how large an impact AIDE has had, topping @OpenAI's MLE-Bench and @METR_Evals' RE-Bench, and becoming a foundation for AI Scientist v2, AIRA-Dojo, and LLMSpeedrunner itself. And AutoResearch, with AIDE's simple greedy discard/keep loop reaching a mass audience, is really building consensus that the empirical research loop can and should be automated. We're excited to keep pushing this frontier, not just as a concept but seriously bringing it to the real world, and materially accelerating the knowledge generation of humanity.

1

83

8

45

11K

0

5

0

282

_xwen_ retweeted

Phillip Isola @phillip_isola

3 months ago

Sharing “Neural Thickets”. We find: In large models, the neighborhood around pretrained weights can become dense with task-improving solutions. In this regime, post-training can be easy; even random guessing works Paper: https://t.co/qlXEkJHSZa Web: https://t.co/xYoYctEqHn 1/

phillip_isola's tweet photo. Sharing “Neural Thickets”. We find:

In large models, the neighborhood around pretrained weights can become dense with task-improving solutions.

In this regime, post-training can be easy; even random guessing works

Paper: https://t.co/qlXEkJHSZa
Web: https://t.co/xYoYctEqHn

1/ https://t.co/GnJLMEhKNV

26

926

125

646

144K

Xin Wen @_xwen_

3 months ago

@sainingxie @ylecun @amilabs Congrats!

1

0

181

_xwen_ retweeted

Saining Xie

@sainingxie

3 months ago

another scientific exploration from @TongPetersb, @DavidJFan, and @__JohnNguyen__ that might teach you something new, even if you’re in a frontier lab lots of interesting observations here, but I’ll highlight just one: - it’s kind of an open industry secret that trying to scale DiTs with MoE has mostly been fruitless. - the unexpected, yet intuitive, synergy between RAE and MoE might actually change that.

2

242

31

110

27K

_xwen_ retweeted

Jana @Jana_Zeller

4 months ago

Can AI reason by “imagining” — not just by seeing or reading? We introduce Mentis Oculi, a benchmark for machine mental imagery: multi-step visual puzzles that require maintaining and updating visual states over time. 📄 https://t.co/fDs2kdNfcU 🌐 https://t.co/BUVO3PkgkT 🧵⬇️

9

107

22

60

9K

_xwen_ retweeted

Logical Intelligence @logic_int

4 months ago

Why Sudoku? It tests constraint satisfaction and LLMs can't do it. LLMs are good at language problems. Robotics and industrial control systems problems aren't like that. That's why we created a new architecture that is the foundation of Kona’s Energy-Based Reasoning Model.

logic_int's tweet photo. Why Sudoku?

It tests constraint satisfaction and LLMs can't do it.

LLMs are good at language problems. Robotics and industrial control systems problems aren't like that.

That's why we created a new architecture that is the foundation of Kona’s Energy-Based Reasoning Model. https://t.co/SaBAi3X0TI

11

140

13

77

34K

_xwen_ retweeted

Relja Arandjelović @relja_work

5 months ago

"we knew that international law applied with varying rigour depending on the identity of the accused or the victim." The emperor is naked. Only took "the west" decades to realize it, the rest of the world laughs (or 🤮) whenever a NATO country mentions "international law"

0

5

2

0

720

Xin Wen

@_xwen_

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users