Pavankumar Vasu @PavankumarVasu - Twitter Profile

Pinned Tweet

about 1 year ago

Excited to share code & models for FastVLM — our blazing-fast Vision-Language Model appearing at #CVPR2025 Run it on-device with inference code optimized for Apple Silicon using #mlx. Code: https://t.co/zrYytwr9N1 Updated paper & results coming soon. Stay tuned! 👀

11

208

49

140

50K

PavankumarVasu retweeted

Jiatao Gu

@thoma_gu

6 months ago

(1/n) There’s a long-running debate on bringing representation learning into generative modeling—their latent spaces play different roles. 🚀🚀 We present FAE, a simple-yet-effective framework that bridges them with a single attention layer! Paper: https://t.co/p8eLoGwDBk

6

509

88

363

87K

PavankumarVasu retweeted

Yizhe Zhang @YizheZhangNLP

7 months ago

We use latent continuous thoughts for retrieval optimized via downstream NTP loss, unified under one LLM backbone. Since representations are shared, documents can be precomputed—eliminating 2-stage RAG. We match raw text performance but with a much shorter context budget. 📉🚀

https://t.co/SynhzoBxP0

1

35

8

7

7K

PavankumarVasu retweeted

Jiatao Gu

@thoma_gu

7 months ago

STARFlow gets an upgrade—it now works on videos🎥 We present STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows, an invertible, causal video generator built on autoregressive flows! 📄 Paper https://t.co/NIMAUlpuNw 💻 Code https://t.co/3sgI5dfD3W (1/10)

5

207

41

106

72K

PavankumarVasu retweeted

Eran Malach @EranMalach

8 months ago

SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. Arxiv: https://t.co/bCzxawF452 🧵

https://t.co/1Vcz4t01NL

6

416

82

280

115K

PavankumarVasu retweeted

Fartash Faghri

@FartashFg

8 months ago

🚨While booking your travel for #NeurIPS2025, make sure to stay on Sunday, December 7 8am-5pm for CCFM Workshop (Continual and Compatible Foundation Model Updates). We have received exciting paper contributions and have an amazing lineup of speakers.

0

21

3

4

4K

PavankumarVasu retweeted

Xianhang Li

@XianhangLi

8 months ago

🤔 Ever thought a small teacher could train a student 6× larger that sets new SOTA in training efficiency and frozen evaluation performance for video representation learning? 🤔 Do we really need complex EMA-based self-distillation to prevent collapse, bringing unstable loss dynamics while offering little insight into representation quality? 🚨 In our new paper, we investigate these questions and propose SALT (Static-teacher Asymmetric Latent Training): a simple, scalable, and compute-efficient alternative for video representation learning. 📄 Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers 🔗 https://t.co/C9amVFddSH

7

454

69

277

39K

Pavankumar Vasu @PavankumarVasu

9 months ago

📢 FastVLM models are now on 🤗

Xenova

@xenovacom

9 months ago

NEW: Apple releases FastVLM and MobileCLIP2 on Hugging Face! 🤗 The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time VLM applications! 🤯 It can even do live video captioning 100% locally in your browser (zero install). Huge for accessibility!

35

2K

213

1K

222K

0

6

0

256

Pavankumar Vasu @PavankumarVasu

9 months ago

📢 Releasing MobileCLIP2 (TMLR Featured). Small embedding models that can power your multimodal RAG applications on resource constrained devices. Models are available on 🤗

Fartash Faghri

@FartashFg

9 months ago

🚀Releasing MobileCLIP2 (TMLR Featured). MobileCLIP2-S4 matches acc of SigLIP-SO400M/14 while 2x smaller and surpasses DFN ViT-L/14 at 2.5x faster. Paper: https://t.co/pPnuakmR9Q Code: https://t.co/uaxPYe0xhf RayGen: https://t.co/K8MM1364gD 🤗https://t.co/Mw60qIYYGx #Apple MLR

https://t.co/IdmRtEEBA8

5

73

27

25

7K

0

2

0

244

PavankumarVasu retweeted

Fartash Faghri

@FartashFg

10 months ago

🚨📅The submission deadline for #NeurIPS 2025 CCFM Workshop is just 8 days away on August 22. Get your papers in! Submit your work on Continual and Compatible Foundation Model Updates to the #NeurIPS 2025 CCFM Workshop. Learn more: https://t.co/oIrrtiRKD6

0

5

1

0

2K

PavankumarVasu retweeted

Max Seitzer @maxseitzer

10 months ago

Introducing DINOv3 🦕🦕🦕 A SotA-enabling vision foundation model, trained with pure self-supervised learning (SSL) at scale. High quality dense features, combining unprecedented semantic and geometric scene understanding. Three reasons why this matters…

https://t.co/kOajLhcBi9

12

1K

138

417

135K

PavankumarVasu retweeted

Andi Marafioti

@andimarafioti

10 months ago

🚀 We're thrilled to launch four new OCR datasets with 20M images: DoclingMatix, SynthFormulaNet, SynthCodeNet, and SynthChartNet. We used them to train SmolDocling, our ultra‑compact (256M) full-page document conversion VLM with performance rivaling models up to 27× larger.

https://t.co/ZtxVnQ2jSM

5

547

77

371

30K

PavankumarVasu retweeted

Andrea Santilli @teelinsan

11 months ago

Uncertainty quantification (UQ) is key for safe, reliable LLMs... but are we evaluating it correctly? 🚨 Our ACL2025 paper finds a hidden flaw: if both UQ methods and correctness metrics are biased by the same factor (e.g., response length), evaluations get systematically skewed

https://t.co/wi59eGDLP7

1

48

17

12

4K

PavankumarVasu retweeted

Hadi Pouransari @HPouransari

11 months ago

🌟Explore key insights from the FastVLM project (real-time vision-language model) in this blog post: https://t.co/bC6Rs9ev3f

4

211

38

122

40K

PavankumarVasu retweeted

Fartash Faghri

@FartashFg

11 months ago

📢Submissions are now open for #NeurIPS2025 CCFM workshop. Submission deadline: August 22, 2025, AoE. Website: https://t.co/oIrrtiRKD6 Call for papers: https://t.co/9sUoMl7AJg Submission Link: https://t.co/2aXHQaqFDf

0

10

6

2

11K

PavankumarVasu retweeted

Mustafa Shukor @MustafaShukor1

11 months ago

We propose new scaling laws that predict the optimal data mixture for pretraining LLMs, native multimodal models, and large vision encoders! Only small-scale experiments are needed, and we can then extrapolate to large-scale ones. These laws allow 1/n 🧵

https://t.co/ISSAo9Ymp2

6

264

45

214

31K

PavankumarVasu retweeted

Rin Metcalf Susa @RinMetcalfSusa

11 months ago

📣 We are excited to present our work on inferring user preferences from writing samples at @icmlconf Poster Session 3 (Wed. 11:00AM - 1:30PM)! Come by to ✋ chat with us, 📄 learn about our method, and 💻 hear about our new interactive benchmark (🔗s below)!

1

7

3

0

490

PavankumarVasu retweeted

Fartash Faghri

@FartashFg

11 months ago

🚀Super excited to share TiC-LM (Oral at #ACL2025)! How to keep FMs up-to-date over months/years? We have a benchmark and lots of insights (https://t.co/Dm4n4xT0Ul). Also organizing a related @NeurIPSConf 2025 workshop continual and compatible FMs (CCFM: https://t.co/Dly5OXTfOc) Code/Models/Dataset: https://t.co/B2nJ8LSIrX Our prior work on TiC-CLIP: https://t.co/QkOGeqHtWS Thanks to @jeffwpli for his amazing work on DCLM and TiC-LM and other upcoming works during his internship at @Apple MLR. Thanks to everyone at @Apple MLR to help us do great research.

0

11

2

0

865

PavankumarVasu retweeted

Jiatao Gu

@thoma_gu

12 months ago

I will be attending #CVPR2025 and presenting our latest research at Apple MLR! Specifically, I will present our highlight poster--world consistent video diffusion (https://t.co/ms3o8L1R9B), and three workshop invited talks which includes our recent preprint ★STARFlow★! (0/n)

2

85

23

19

29K

PavankumarVasu retweeted

Ryan Hoque @ryan_hoque

about 1 year ago

Imitation learning has a data scarcity problem. Introducing EgoDex from Apple, the largest and most diverse dataset of dexterous human manipulation to date — 829 hours of egocentric video + paired 3D hand poses across 194 tasks. Now on arxiv: https://t.co/bJBPER8GTC (1/4)

15

606

91

378

114K
