Anurag Bagchi @miccooper9 - Twitter Profile

Pinned Tweet

5 months ago

[1/6] Ego-centric World Models: We introduce EgoWM, a video world model that simulates EVE-1X humanoid interactions from a single ego-view image plus full-body joint-angle trajectories. Moreover, it effortlessly generalizes to extreme out-of-distribution (OOD) domains, including paintings!

12

417

45

261

43K

Miccooper9 retweeted

Aryan Satpathy

@satpathyaryan45

about 2 months ago

Excited to share our project, Sim2Reason! Key insight: simulators are an untapped source of cheap supervision for scientific reasoning. LLMs can learn physical reasoning from simulation and improve on real-world benchmarks such as the International Physics Olympiad!

0

19

6

8

3K

Miccooper9 retweeted

Vincent Sitzmann

@vincesitzmann

4 months ago

In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision, mapping imagery to intermediate representations (3D, flow, segmentation...), is about to go away. https://t.co/aFmE9CHHau

44

1K

165

791

392K

Anurag Bagchi @Miccooper9

5 months ago

@zhihelu1 Thanks! For world models, or more precisely forward dynamics models (current state + action -> future state), this is the standard formulation. Many model-based control approaches can plan or predict actions using such world models.

0

2

0

1

384
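
The reply above describes a forward dynamics model as a map from (current state, action) to the next state, which a model-based controller can query to choose actions. The sketch below illustrates that idea with a toy example. Everything in it is an assumption for illustration: the 1-D point-mass dynamics, the function names, and the random-shooting planner are hypothetical stand-ins, not EgoWM or any method from the thread.

```python
import random


def forward_dynamics(state, action, dt=0.1):
    """Toy world model: (current state, action) -> future state.

    Hypothetical 1-D point mass: state = (position, velocity),
    action = acceleration. A learned model would replace this function.
    """
    position, velocity = state
    velocity = velocity + action * dt
    position = position + velocity * dt
    return (position, velocity)


def rollout(state, actions):
    """Apply the model step by step to predict the final state."""
    for action in actions:
        state = forward_dynamics(state, action)
    return state


def plan_random_shooting(state, goal, horizon=10, candidates=200, seed=0):
    """Sample action sequences, simulate each with the model, keep the best."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        position, _ = rollout(state, actions)
        cost = abs(position - goal)  # distance of predicted position to goal
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


start = (0.0, 0.0)
plan, cost = plan_random_shooting(start, goal=0.3)
print(len(plan), cost)
```

Random shooting is only one simple way to plan with such a model; it is used here because it needs nothing beyond repeated calls to the forward model.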

Anurag Bagchi @Miccooper9

5 months ago

[6/6] Fine-grained humanoid manipulation: EgoWM enables precise 25-DoF joint-angle manipulation with the EVE-1X humanoid, even at 4× temporal compression (Cosmos-2B). 📷 Learn more. Project page: https://t.co/GgyMU0sF9v Paper: https://t.co/2LNOIi6MCs

0

18

1

7

2K

Anurag Bagchi @Miccooper9

5 months ago

[5/6] Temporal compression: Unlike prior work, we preserve full-sequence diffusion and compress actions to the latent temporal resolution. EgoWM achieves 42% better action alignment at a +4s horizon than frame-wise autoregressive NWMs, even with 4× temporal compression (Cosmos-2B).

1

13

0

1

1K

Miccooper9 retweeted

Shraman Pramanick

@Shramanpramani2

8 months ago

My role on Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after my PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on the job market starting immediately. #metalayoffs #FAIR #MSL #SAM

26

336

26

72

110K

Anurag Bagchi @Miccooper9

8 months ago

Happening now! ICCV 25, poster #323. Drop by to chat and see some cool results!

0

1

0

258

Anurag Bagchi @Miccooper9

8 months ago

@andrew_n_carr Thanks Andrew! We were also really surprised to see how well this worked. Exciting times ahead.

0

1

0

35

Anurag Bagchi @Miccooper9

8 months ago

[ICCV 25] Refer Everything Model (REM) (1/6): We leverage Text-to-Video generation models to zero-shot segment any concept in a video using text. REM generalises to dynamic concepts like smoke, light beams, and more without ever having seen segmentation masks for these entities.

1

92

12

69

10K

Anurag Bagchi @Miccooper9

8 months ago

(6/6) We’re at the start of the internet-scale "video" era, and the possibilities are exciting. Learn more at https://t.co/ERyk8Cfpst — our code & model weights are available. Visiting ICCV? Come see our poster on Oct 23 to chat and see results in action!

1

4

1

366

Anurag Bagchi @Miccooper9

8 months ago

(5/6) REM demonstrates how Text-to-Video generation can serve as a powerful pre-training paradigm for downstream video understanding. The days of large-scale, labor-intensive video annotation may soon be behind us — pre-train to generate, fine-tune lightly to understand.

1

3

0

385
