Mu Cai @MuCai7 - Twitter Profile

Mu Cai

@MuCai7

5 days ago

@baaadas @openpcma @LumaLabsAI All the best, Jiaming!

0

2

0

183

MuCai7 retweeted

Harris Zhang @HyperStorm9682

10 days ago

🚨 Your Embedding Model is SMARTer Than You Think! Single-vector models actually hide powerful multi-vector capabilities in their frozen hidden states. We introduce SMART, a framework that unlocks this ability for SoTA multimodal retrieval. 🧵👇 🔗 https://t.co/UBpQ2y4sXU

1

79

18

57

17K

MuCai7 retweeted

Mira Murati

@miramurati

17 days ago

Collaborative AI runs on interactivity: machines and people, working in real time, across every modality. Solving it takes a community, join us.

73

2K

114

418

248K

Mu Cai

@MuCai7

16 days ago

Wow, always high-quality papers from Xueyan and Yuheng; this could be a good measure for video generation!

Xueyan Zou

@xyz2maureen

17 days ago

🔥Excited to share the first released work from our IEI lab! Congrats to @AnteaWu 🎉 This work is motivated by the lack of quantitative evaluation for physics alignment in video world models. With tools like MegaSam and CoTracker, we can directly reconstruct dynamic 3D scenes, enabling quantitative evaluation of physical alignment. Both code and data are released — feel free to try it out! It should work, but if it doesn’t, contact @AnteaWu directly : )

1

32

6

19

9K

0

16

1

2

4K

Mu Cai

@MuCai7

16 days ago

Call for high quality realtime video/audio full duplex evals! The whole field needs them! Come submit here!

Thinking Machines

@thinkymachines

17 days ago

We are offering grants of $100,000 + Tinker credits to researchers advancing the field of human-AI interactivity. Submit your proposals by June 19th! https://t.co/907HfBy7g3

50

2K

197

2K

586K

0

15

0

1

2K

Mu Cai

@MuCai7

17 days ago

@atasteoff Big congratulations!

1

2

0

192

MuCai7 retweeted

Rowan Zellers

@rown

24 days ago

We are so back!

37

546

18

60

53K

Mu Cai

@MuCai7

24 days ago

@yong_jae_lee @yu_zhuoran32720 Congrats, Zhuoran!

0

1

0

259

Mu Cai

@MuCai7

24 days ago

My first share since joining @thinkymachines. Fun working with this team on real-time multimodal interaction. Vision in turn-based models felt like flipping through photos; continuous video is a different problem. Visual proactivity is essential. Grateful to have worked on this alongside @liliyu_lili, @rown, and the rest of the team!

Thinking Machines

@thinkymachines

24 days ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. https://t.co/AFJZ5kH7Ku

461

16K

2K

12K

8M

6

158

6

15

11K

Mu Cai

@MuCai7

29 days ago

@Mononofu @elonmusk @SpaceX Congrats Julian!

0

2

0

121

Mu Cai

@MuCai7

about 1 month ago

@Yihe__Deng All the best!

0

2

0

280

MuCai7 retweeted

Logan Kilpatrick

@OfficialLoganK

2 months ago

Introducing Gemma 4, our series of open-weight (Apache 2.0 licensed) models, which are byte for byte the most capable open models in the world! Gemma 4 is built to run on your hardware: phones, laptops, and desktops. Frontier intelligence with a 26B MoE and a 31B Dense model!

287

6K

592

1K

525K

Mu Cai

@MuCai7

2 months ago

@CatGodSandHive Exactly! And this is why we think the computer vision community has ignored this important direction: multiscale in pixel space!

0

1

0

46

Mu Cai

@MuCai7

2 months ago

🤯 Upgrade your pretrained visual encoder with <10 lines of code. This is what vision researchers have ignored: can you imagine that multiscale in pixel space works so well?! Remember, we are not doing multiscale in feature space! 🏠Project Page: https://t.co/LLXO2Z39lt 📷 Paper: https://t.co/HP058lSQn6 Get uniform improvements on MLLMs, Seg, and Depth at a similar computation cost.

Bocheng Zou @bochengzou

2 months ago

🔥 Upgrade your frozen vision encoders with <10 lines of code! Single-scale inference throws away vital details. Enter MuRF 🚀: a simple, training-free plug-in for instant, massive gains in MLLMs, Seg & Depth. 🤯 1/6

7

148

26

145

28K

4

157

30

133

19K

Mu Cai

@MuCai7

2 months ago

Good question, we have an efficiency analysis in the paper! And it is straightforward. For MLLMs: MuRF keeps the same number of tokens as single-scale inference by design, so the computation cost in the LLM part is the same. Empirically, we observed that MuRF achieves similar VRAM usage, training time, and inference time compared to single-resolution inference for MLLMs. This works out because the visual encoder is much smaller than the LLM!

0

2

0

96

Mu Cai

@MuCai7

2 months ago

Hi Thomas, thanks for the comment! Huge fan of S² and learned upsamplers like AnyUp! 🤝 While we share the goal of multi-scale representation, MuRF takes a fundamentally different path.
TL;DR: We show that simply resizing the whole image (no tiling!) and fusing features creates a universally stronger representation without any learned upsampling heuristics. Here is the deeper dive into why we are different:
1️⃣ Motivation & Token Budget: We asked: Does higher resolution always mean better features? Surprisingly, no! Low-res provides crucial global context that actually improves high-res performance. For MLLMs, we lift the performance ceiling by a large margin while keeping the exact same number of visual tokens!
2️⃣ Approach (No Tiling, No Bells & Whistles): Unlike S², which cuts images into independent patches (breaking spatial layout and object continuity), we process the entire image at different scales. No complex layout engineering. As for AnyUp, learned upsamplers are great, but our parameter-free bilinear upsampling requires zero training. This guarantees extreme simplicity, maximum flexibility, and prevents generalizability issues.
3️⃣ Universal Application: We aren't just optimizing MLLM token budgets. MuRF is a fundamental, training-free enhancement for visual representations, generalizing flawlessly out-of-the-box across high-level reasoning (MLLMs), dense geometry (Seg/Depth), and even unsupervised anomaly detection.
We believe this simple, holistic multi-scale synergy is a highly promising direction. Let's push toward better visual representations together! 🚀

0

88
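
The reply above describes resizing the whole image to several scales, running a frozen encoder on each, and bilinearly upsampling the feature maps before fusing them. A minimal NumPy sketch of that idea follows. It is not the authors' implementation: `toy_encoder` (plain patch averaging), the scale set `(0.5, 1.0)`, the base-resolution target grid, and fusion by channel concatenation are all assumptions made for illustration.

```python
import numpy as np


def bilinear_resize(x, out_h, out_w):
    """Bilinearly resize an (H, W, C) array to (out_h, out_w, C)."""
    in_h, in_w = x.shape[:2]
    # Output pixel centres mapped into input coordinates (half-pixel convention).
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :, None]  # horizontal interpolation weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def toy_encoder(image, patch=4):
    """Hypothetical stand-in for a frozen vision encoder: average each patch."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    cropped = image[: gh * patch, : gw * patch]
    return cropped.reshape(gh, patch, gw, patch, c).mean(axis=(1, 3))


def multiscale_features(image, encoder, scales=(0.5, 1.0)):
    """Encode the whole image at each scale and fuse on one shared token grid."""
    h, w = image.shape[:2]
    # Assumption: the target grid is the one produced at the base resolution,
    # so the token count matches single-scale inference.
    target_h, target_w = encoder(image).shape[:2]
    fused = []
    for s in scales:
        new_h, new_w = max(1, round(h * s)), max(1, round(w * s))
        resized = bilinear_resize(image, new_h, new_w)  # whole image, no tiling
        feats = encoder(resized)
        fused.append(bilinear_resize(feats, target_h, target_w))  # parameter-free
    # Assumption: fuse by concatenating along the channel axis.
    return np.concatenate(fused, axis=-1)


image = np.random.rand(32, 32, 3)
features = multiscale_features(image, toy_encoder)
print(features.shape)
```

In this sketch the fused map has the same spatial grid as the single-scale output and only the channel width grows, which is one way to read the claim that the number of visual tokens stays the same.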

Mu Cai

@MuCai7

2 months ago

Huge congrats to @bochengzou, who began working on this two years ago and made this magical technique happen!

0

4

0

618
