Jiawei Gu @kuvvius - Twitter Profile

Pinned Tweet

7 months ago

🚨Sensational title alert: we may have cracked the code to true multimodal reasoning. Meet ThinkMorph — thinking in modalities, not just with them. And what we found was... unexpected. 👀 Emergent intelligence, strong gains, and …🫣 🧵 https://t.co/2GPHnsPq7R (1/16)

27

316

67

253

69K

Kuvvius retweeted

Manling Li

@ManlingLi_

14 days ago

Budget-aware Agents (BAGEN) study the failure modes in budget estimation: 1. Strong agents are not strong budget estimators. 2. Frontier models are often overoptimistic. 3. Budget awareness is actionable and trainable. SFT plus RL strengthens early stop and alert behavior, saving 28-64 percent of tokens on failed trajectories. 4. Upper and lower bound calibration remains hard. https://t.co/RIDpR6g8oP

2

87

14

34

15K

Kuvvius retweeted

Yining Hong

@yining_hong

23 days ago

Excited to share ESI-BENCH, a benchmark for Embodied Spatial Intelligence! Most spatial reasoning benchmarks assume an oracle observer: the agent is given the right image, view, or 3D scene. But in the real world, the observer is also an actor. To understand space, agents must decide where to look, how to move, and when to interact, to reveal what is hidden: occlusions, containment, contact, dynamics, and functionality. In many cases, the hard part is not perception itself, but choosing the right action to make informative perception possible. ESI-BENCH tests this perception-action loop. Agents receive an egocentric observation and a spatial question, then must actively gather evidence through perception, locomotion, and manipulation before answering. The benchmark spans 10 task categories, 29 subcategories, and 3,081 instances, built in BEHAVIOR-1K across realistic interactive scenes. 🌍Webpage: https://t.co/Ou3zJ48eFx 💻Code & data: https://t.co/Mw0kU5hoyA Thanks to collaborators: Jiageng, Han, @ManlingLi_ , Leonidas Guibas, @drfeifei , @jiajunwu_cs , @YejinChoinka

8

221

46

92

47K

Kuvvius retweeted

Ziwei Liu

@liuziwei7

16 days ago

✨Is Your Spatial Foundation Model an All-Round Player✨ @ropedia_ai presents #SpatialBench, a diverse spatial benchmark over 19 source datasets, 540+ scenes, 40+ model variants, and 6 reconstruction paradigms. - Project: https://t.co/tpfVJiiNmV - Code: https://t.co/FWfoGKU33i

2

96

16

48

10K

Jiawei Gu

@Kuvvius

28 days ago

@YikunWang001 Hahaha thanks!! 😎

0

31

Jiawei Gu

@Kuvvius

29 days ago

A very nice surprise: the #ICML2026 Gold Reviewer Award. 😊 Just happy to do my small part for the community.

1

18

1

963

Kuvvius retweeted

Fanqing Meng

@FanqingMengAI

about 2 months ago

https://t.co/mAk0mbDNBa https://t.co/x1SEMzxzGp tech report release

2

40

8

16

12K

Jiawei Gu

@Kuvvius

about 2 months ago

I couldn’t make it to @iclr_conf in person, but ThinkMorph is there now :) If you’re around, come say hi to our poster for me! 👋 We’re at Pavilion 3, P3-#1724, and it’ll be up through 1:00 PM BRT. #ICLR2026

0

9

2

1

300

Kuvvius retweeted

Zijian Wu

@Jaku_metsu

about 2 months ago

At Kimi, we do care about Notion use. Training K2.6 on remote apps such as Notion was one of the most important projects during my internship. A bit of Kimi flavor: we like to RL things that aren't supposed to be RL-able. A lot of it came from RL. And it scales.

2

44

3

8

5K

Kuvvius retweeted

Mahtab Bigverdi @MahtabBg

about 2 months ago

Don’t miss our 8/8/8/6 ICLR 2026 🇧🇷🌴🥁 paper, STARE😳! We introduce a benchmark and analysis revealing key gaps in how multimodal models handle multistep visual simulations. Check it out: https://t.co/EuDwSgfrhy

1

23

4

13

4K

Kuvvius retweeted

Zixian Ma@CVPR

@zixianma02

about 2 months ago

Check out STARE: our new ICLR paper with a (very challenging) visual spatial reasoning benchmark which even sora2 has no clue how to solve👇 video cr. @LINJIEFUN

1

43

10

23

5K

Jiawei Gu

@Kuvvius

about 2 months ago

(12/12🧵) Website: https://t.co/qX0kzlYpEm Code: https://t.co/rq2yMZpPzA Data: https://t.co/QXtGUjSLob 😳STARE is one lens on a frontier that is still wide open. Very lucky to work on this with @LINJIEFUN, @MahtabBg, @zixianma02 , Yinuo Yang, Ziang Li, @YejinChoinka , and @RanjayKrishna.

0

3

0

2

182

Jiawei Gu

@Kuvvius

about 2 months ago

One thing that keeps pulling us back is how effortless spatial reasoning is for people. We look at a cube net once and just know if it folds into a box. We can almost "run" the folds in our head, panel by panel, without really trying. About a year ago, we started noticing something strange: when we gave a model step-by-step visual cues, it didn't get clearer — it got even worse. That's what led us to 😳 STARE at what happens when AI tries to think in space. The paper was recently accepted at #ICLR2026 (8/8/8/6). (1/12) 🧵 https://t.co/SYw0Xszlt7

1

157

23

120

20K

Jiawei Gu

@Kuvvius

about 2 months ago

(11/12🧵) What this tells us: Current models are optimized to think in text. But spatial cognition demands thinking in multimodality. Our related work tackles this: • ThinkMorph (ICLR '26): imagine and mentally simulate transformations https://t.co/B9gzvGPcHK • AdaReasoner (ICLR '26): draw, annotate, compute with tools https://t.co/tp7XfZOlzF So, how do we teach models to see with their mind’s eye?

1

6

2

4

426

