Mark Endo @mark_endo1 - Twitter Profile

Pinned Tweet

7 months ago

Thinking about using small multimodal models? Want a clearer understanding of what breaks when downscaling model size, and why? ✨Introducing our new work on Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models 🧵👇

mark_endo1 retweeted

Shiye Su

@shiye_su

5 days ago

Generative models usually turn noise → data. But a lot of science needs unpaired data → data: untreated cells → post-intervention cells, low-redshift galaxies → high. Flow matching can do this in principle — but its quality degrades sharply in high dimensions. The fix? Add more noise. Stochastic Perturbations Improve Distribution-to-Distribution Generative Models 📍 #CVPR2026

mark_endo1 retweeted

Fei-Fei Li

@drfeifei

7 days ago

https://t.co/Kt50ttQRMJ

Mark Endo @mark_endo1

8 days ago

I’m at #CVPR2026 this week 🌄 where I’ll be presenting our recent work on Downscaling Intelligence (https://t.co/L4Arg3vSBd):
🗓️ Fri, June 5th, 10:45am-12:45pm
📍 ExHall A-F 73
Happy to chat at CVPR! Come say hi at the poster session or feel free to DM!

mark_endo1 retweeted

Chen Geng

@gengchen01

9 days ago

🌟Your static 3D world models are now alive and interactable! 🚀Introducing NeuROK, a neural simulation framework that turns any static 3D object into an interactive 4D asset — no per-category physics, no physical annotations for training. 📄 https://t.co/PSAILjHmZb 🧵 1/n

mark_endo1 retweeted

Keshigeyan Chandrasegaran

@keshigeyan

13 days ago

1/ Introducing GPIC: a Giant Permissive Image Corpus and benchmark for visual generation!

🚀100M VLM-captioned image-text pairs for training
📊1M image-text pairs for benchmarking
🖼️~28 trillion pixels
🤗Centrally Hosted
✅Fully permissive for research + commercial use

Dataset, benchmark and models🧵👇

Co-led with @KyleSargentAI

Mark Endo @mark_endo1

23 days ago

Happy to be on the list of #CVPR2026 Outstanding Reviewers ☺️

#CVPR2026 @CVPR

23 days ago

We are grateful to all of the 17,491 reviewers who helped make #CVPR2026 possible. We are especially pleased to recognize the following Outstanding Reviewers, whose high-quality reviews (as judged by their Area Chairs) placed them among the top 5% of reviewers.

mark_endo1 retweeted

Fei-Fei Li

@drfeifei

2 months ago

It’s my 11th year and counting! Teaching the first lecture of @cs231n every year has been a highlight of my spring seasons. As usual, I asked students which departments or schools at @Stanford they come from. Increasingly, students raise their hands to indicate that they come from all seven schools on campus, from @StanfordEng to @StanfordMed, @StanfordHumSci, @StanfordGSB, @StanfordLaw, @StanfordEd, and @stanforddoerr. AI is truly a horizontal technology that excites students across all backgrounds and disciplines!🤩

mark_endo1 retweeted

Leon Chen

@realleonlc

4 months ago

🚀 Introducing our fresh work at Stanford and Meta MSL:

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

What if a single model could generate an image, look at it, think about what's wrong, and fix it, all by itself?

That's exactly what UniT does. 🧵👇

mark_endo1 retweeted

James Burgess @jmhb0

4 months ago

Check out PaperSearchQA, which I'll present at EACL in Morocco this March! We built an RL training environment for teaching LLMs to search and reason over scientific papers. 60k question-answer pairs + 16M papers to search over + benchmarks. RL training improves the model.

Mark Endo @mark_endo1

6 months ago

Thanks @TheTuringPost for showcasing our work!! Excited for the future of small models 🚀

Turing Post

@TheTuringPost

6 months ago

.@Stanford researchers showed what happens when you shrink a multimodal model.

They look specifically at how reducing the size of the LLM inside a multimodal model affects the model’s overall abilities.

➡️ The part that suffers most is vision. And perception really collapses.

▪️ The solution is EXTRACT+THINK, a two-stage pipeline with:

- A tiny VLM performing visual extraction tuning
- An LLM reasoning over that

Here are the details:

Mark Endo @mark_endo1

7 months ago

Work done together with the fantastic @yeung_levy at @StanfordAILab!
Read our paper here: https://t.co/Xb2MBLtdSl
Project website: https://t.co/L4Arg3vSBd
Code: https://t.co/ruxhA7HwaN

Mark Endo @mark_endo1

7 months ago

Our final two-stage approach, Extract+Think, demonstrates extreme parameter and data efficiency, improving over LLaVA-OneVision while using 95% fewer visual training samples.

mark_endo1 retweeted

Josiah Aklilu @AkliluJosiah2

about 1 year ago

There’s growing excitement around VLMs and their potential to transform surgery 🏥, but where exactly are we on the path to AI-assisted surgical procedures? In our latest work, we systematically evaluated leading VLMs across major surgical tasks where AI is gaining traction. 🧵

mark_endo1 retweeted

James Burgess @jmhb0

about 1 year ago

🚨Large video-language models such as LLaVA-Video can do single-video tasks. But can they compare videos? Imagine you’re learning a sports skill like kicking: can an AI tell how your kick differs from an expert video? 🚀 Introducing "Video Action Differencing" (VidDiff), ICLR 2025 🧵
