Shengyi Qian @JasonQSY - Twitter Profile

Pinned Tweet

3 months ago

Vision isn't an "add-on"—and we have the data to prove it. 👁️⚡️ Thrilled to share our new work on Transfusion-style models. We explored treating visual data as a first-class citizen from day one, from architecture to scaling behavior. Check it out: 🔗 https://t.co/zONvWOFCuI

David Fan

@DavidJFan

3 months ago

[1/9] What happens when you treat vision as a first-class citizen during multimodal pretraining? To find out, we studied the design space of training Transfusion-style models that input and output all modalities, from scratch. Here is what we learned about visual representations, data, world modeling, architecture, and scaling behavior! Paper: https://t.co/ik6JGgjbTD Website: https://t.co/nklaggMEfT @TongPetersb, @DavidJFan, @__JohnNguyen__, @ellisbrown, @GaoyueZhou, @JasonQSY, @boyangzheng, @webalorn, @han_junlin, @rob_fergus, @NailaMurray, @gh_marjan, @ml_perception, Nicolas Ballas, @_amirbar, Michael Rabbat, Jakob Verbeek, @LukeZettlemoyer, @koustuvsinha, @ylecun, @sainingxie

Shengyi Qian @JasonQSY

about 1 month ago

@DJiafei Congrats!

Shengyi Qian @JasonQSY

about 2 months ago

@TongPetersb @ylecun @sainingxie @mengyer @YiMaTweets @LukeZettlemoyer @liuzhuang1234 Congrats!

JasonQSY retweeted

Shengjia Zhao

@shengjia_zhao

about 2 months ago

Excited to share what we’ve been building at Meta Superintelligence Labs! We just released Muse Spark, our first AI model. It's a natively multimodal reasoning model and the first step on our path to personal superintelligence. We've overhauled our entire stack to support scaling, and this is just the beginning. https://t.co/KNVjgMcch1

JasonQSY retweeted

Alexandr Wang

@alexandr_wang

about 2 months ago

1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

https://t.co/fThDXdsxwB

Shengyi Qian @JasonQSY

3 months ago

@iamsashasax @AnthropicAI Congrats Sasha!

JasonQSY retweeted

DailyPapers

@HuggingPapers

3 months ago

Beyond Language Modeling FAIR Meta and NYU present a deep dive into native multimodal pretraining. They show RAEs unify visual understanding/generation, vision/language data are complementary, world modeling emerges naturally, and MoE harmonizes vision's higher data hunger—paving the way for truly unified models.

JasonQSY retweeted

John Nguyen

@__JohnNguyen__

3 months ago

Humans communicate through language and interact with the world through vision, yet most multimodal models are language-first. What happens when we go beyond language? 🤔 Beyond Language Modeling: a deep dive into the design space of truly native multimodal models Paper: https://t.co/KOpmL1PItn Project: https://t.co/Oy6XuEtUAi

JasonQSY retweeted

Peter Tong

@TongPetersb

3 months ago

Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]

Shengyi Qian @JasonQSY

4 months ago

Excited to share our latest work from Meta Superintelligence Labs! 🚀 We’re moving beyond static AI to agents that actually evolve with you. Our PAHF framework solves "Alignment Drift" through a continuous feedback loop. Check out the paper!

Kaiqu Liang

@kaiqu_liang

4 months ago

New Meta Research 🚀 AI agents are powerful, but don’t stay aligned with you over time. When preferences shift, they don’t adapt. You correct them once…they repeat the mistake. 🤦 Introducing PAHF: continual personalization where agents learn from feedback to stay in sync.

https://t.co/CCED7S4PQn

Shengyi Qian @JasonQSY

4 months ago

We are excited to host the 2nd 3D-LLM / VLA Workshop at CVPR this June! If your research explores the synergy between spatial intelligence, robotics, and language grounding, we invite you to submit your work. We also have an incredible lineup of speakers. Join us!

Yining Hong

@yining_hong

4 months ago

LLMs are now learning space, geometry, and how to move. 🤖📐 The 2nd CVPR 3D-LLM VLA Workshop brings together language, 3D perception, and action for embodied intelligence. 📢 Call for Papers is OPEN: https://t.co/Zff45s3wKT 🌐 Website: https://t.co/BhgA2OnfLQ If your research lives at the intersection of words, worlds, and robots—this one’s for you. #CVPR2026 @CVPR

JasonQSY retweeted

Yuhang Zhou @YuhangZhou2

7 months ago

Have arrived in Suzhou! I will present the DISCO paper at EMNLP 2025 in Thursday’s noon poster session. Feel free to reach out and discuss! If you’re interested in Meta’s current positions, for either FTE roles or internships, also let me know! #EMNLP2025

JasonQSY retweeted

Yuhang Zhou @YuhangZhou2

10 months ago

Glad to have my PhD’s last work accepted by EMNLP 2025! Thanks to all my collaborators for their efforts. Hope to see you all in Soochow! #EMNLP25

JasonQSY retweeted

Liliang Ren

@liliang_ren

11 months ago

Reasoning can be made much, much faster—with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up-to 10× higher throughput at 32K generation length with vLLM. 🤯 Model: https://t.co/bYFanHgikH Codebase: https://t.co/M2GLiw3nUl Blog: https://t.co/ka7yjL29HQ Paper: https://t.co/lUF2xwYQWq (1/8)

JasonQSY retweeted

Wei-Chiu Ma @weichiuma

12 months ago

Interactable Digital Twins hold great promise. They allow us to train in sim and test in real. But can we go a step further? Can we deploy a robot w/o training? Key idea: simulate the outcome of each action with Digital Twins and use a VLM as critic to select the best action.

Shengyi Qian @JasonQSY

12 months ago

We’re presenting 3D-MVP at CVPR poster #140 right now! #CVPR2025 #ComputerVision #MachineLearning #DeepLearning #3DVision #Robotics #AI #Research #PaperPresentation

https://t.co/61oKaFBOQ3

JasonQSY retweeted

Voxel51

@Voxel51

about 1 year ago

One of the biggest bottlenecks in deploying visual AI and computer vision is annotation, which can be both costly and time-consuming. Today, we’re introducing Verified Auto Labeling, a new approach to AI-assisted annotation that achieves up to 95% of human-level performance while cutting labeling costs by up to 100,000x and time by 5,000x. Read the full paper: https://t.co/eKc1sALnV3

Shengyi Qian @JasonQSY

12 months ago

3️⃣ 3D-GRAND: Towards Better Grounding & Less Hallucination for 3D-LLMs. A large-scale dataset & models for improved 3D visual grounding. Project: https://t.co/PqKH0dmqM1 #3DLLM #AI DM me if you're at #CVPR or want to chat about these! Looking forward to it!

Shengyi Qian @JasonQSY

12 months ago

Thrilled to be heading to Nashville next week for #CVPR2025! Can't wait to connect with the community & dive into the latest in computer vision.

Shengyi Qian @JasonQSY

12 months ago

2️⃣ Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning. Introducing a new benchmark for #MultimodalGraphLearning. Project: https://t.co/opsw7TkaXm #MachineLearning
