Arijit Ray @ CVPR2026 @array693 - Twitter Profile

Pinned Tweet

Arijit Ray @ CVPR2026

4 months ago

"It is by logic that we prove, but by [abstract] intuition that we discover." - Henri Poincaré. When faced with a complex problem, we pause, we think. Not exactly in words, not exactly in images — in something more abstract, something harder to name. So, for truly intelligent agents, should we not ask that they do the same? Introducing Mull-Tokens — a modality-agnostic latent thinking paradigm. Now, the model can think in space, in time, in words, in affordances — in all the things that language alone cannot easily convey. https://t.co/4A4l7vED9d

1

8

2

731

Arijit Ray @ CVPR2026

@ARRay693

23 days ago

@DJiafei We will have come full circle as a society if we go back to riding robotic horses instead of cars. 😅

0

32

ARRay693 retweeted

Jiafei Duan

@DJiafei

about 1 month ago

Launching my research group, MAGIC (Manipulation and General Intelligence Control) Lab @NUSComputing, Singapore! We focus on building the next generation of human-centric models for robotic manipulation — deployable safely, reliably, and easily in the real world. Our research spans MLLM reasoning, 3D vision, robot learning, simulation, dexterous manipulation, and cross-embodiment learning. Interested in joining? Sign up here and I'll send a reminder email: https://t.co/9lEQnFERuh

9

340

29

130

26K

ARRay693 retweeted

Ai2 @allen_ai

3 months ago

Today we're releasing MolmoWeb, an open source agent that can navigate + complete tasks in a browser on your behalf. Built on Molmo 2 in 4B & 8B sizes, it sets a new open-weight SOTA across four major web-agent benchmarks & even surpasses agents built on proprietary models. 🧵

https://t.co/ivUIcQDXtm

21

806

114

536

131K

Arijit Ray @ CVPR2026

@ARRay693

3 months ago

@ehsanik @Vercept_ai @AnthropicAI Amazing, congratulations!!

0

1

0

64

Arijit Ray @ CVPR2026

@ARRay693

4 months ago

This work would not be possible without all my amazing collaborators: Ahmed Abdelkader, Chengzhi Mao, Bryan Plummer, @kate_saenko_ , @RanjayKrishna , Leonidas Guibas, and Vincent Chu!

0

2

0

120

Arijit Ray @ CVPR2026

@ARRay693

4 months ago

As conversations continue around grounding visual & textual reasoning, we believe latent, modality-agnostic thinking could be a promising direction. The latents can be extended to anything - trajectories, 3D point-cloud features, audio! Paper, code, and models posted. Dive in and let us know what you build! 🚀

1

0

127

ARRay693 retweeted

Kate Saenko @kate_saenko_

7 months ago

🚀 Excited to share that my team at Meta just launched Segment Anything 3! SAM 3 doubles the performance of existing models on open-vocabulary instance segmentation on our new SA-Co benchmark, with 207K unique object labels. Huge congrats to the team, so proud of this work!

4

92

10

13

9K

Arijit Ray @ CVPR2026

@ARRay693

7 months ago

@DJiafei Amazing to work with and has impeccable Twitter game. Hire him!

0

1

0

212

Arijit Ray @ CVPR2026

@ARRay693

7 months ago

Game your benchmark first (and de-bias it) before others do!

Ellis Brown

@_ellisbrown

7 months ago

🌶️ hot take 🌶️ > we should normalize training on the test set yes, you read that right. no, I'm not joking. and, yes... I have taken ML 101 👉 here's why this is crucial for future multimodal LLM research [1/n] 🧵

8

216

21

182

81K

0

1

0

204

Arijit Ray @ CVPR2026

@ARRay693

7 months ago

SIMS-V offers free (simulated) rich accurate video annotations for object relationships, distances, and temporal tracking—capabilities often lacking in existing video training datasets. 🎞️💫 Mix it into your data and boost your model's performance on video reasoning tasks! Code and data are open! https://t.co/gU2ODaoJhU

Ellis Brown

@_ellisbrown

7 months ago

MLLMs are great at understanding videos, but struggle with spatial reasoning—like estimating distances or tracking objects across time. the bottleneck? getting precise 3D spatial annotations on real videos is expensive and error-prone. introducing SIMS-V 🤖 [1/n]

3

232

47

158

16K

0

4

2

1

392

ARRay693 retweeted

Ellis Brown

@_ellisbrown

7 months ago

MLLMs are great at understanding videos, but struggle with spatial reasoning—like estimating distances or tracking objects across time. the bottleneck? getting precise 3D spatial annotations on real videos is expensive and error-prone. introducing SIMS-V 🤖 [1/n]

3

232

47

158

16K
