Vimal Thilak🦉🐒 @AggieInCA - Twitter Profile

Pinned Tweet

2 months ago

Are EMA teachers commonly used in video SSL (👀V-JEPA family) actually necessary? Hot take: your EMA teacher is a waste of compute. Introducing SALT. ICLR 2026 🇧🇷 | Led by @XianhangLi at Apple MLR.🧵

8

88

9

82

13K

Vimal Thilak🦉🐒

@AggieInCA

about 6 hours ago

Good grief 😂. Hunter is America’s Team. Not that self-rated one star team from Dallas 😂

Hunter Biden

@HunterBiden

about 7 hours ago

Things most Americans agree on: Groceries cost too much. Tariffs suck and make no sense. Congress and Presidents shouldn’t trade stocks. The debt is a mess. The border should be secure, but legal immigration is good. Endless wars are stupid, especially ones that nobody wants and have never been explained. Americans are exhausted. AI is like my new best friend that also might be trying to take my job, my ability to think for myself, and my humanity in the process. Yo like I love you, but WTF, but I still love you. Diversity is actually awesome! The opposite is boring AF. Canadians are super fucking cool. Mexicans are chill. Putin isn’t a good guy looking out for America’s best interest. Rocky IV and Miracle are great movies. Good neighbors are a blessing. Freedom of religion and coexistence without having to blow each other up is probably a good idea. We all question, are we alone in the universe? We all fuck up along the way. Epstein didn’t hang himself. The Trumps and Epstein were best friends for decades. It’s like Bert trying to tell us Ernie was just an acquaintance in the same social scene on Sesame Street back in the day. The Cowboys suck. Go Birds! Things we’re told to fight about: Me. Laptop. Vaccines. Transgenders in sports. Pronouns. That’s the joke.

4K

72K

9K

4K

4M

0

56

Vimal Thilak🦉🐒

@AggieInCA

2 days ago

I told you people this was real!! PMAX 😍

Ravid Shwartz Ziv

@ziv_ravid

2 days ago

Jürgen Schmidhuber (@SchmidhuberAI ) on The Information Bottleneck podcast 😱 We took a question from the audience about JEPA… and he traced it straight back to 1992 Full episode tomorrow 🧐

7

102

9

74

21K

0

145

Vimal Thilak🦉🐒

@AggieInCA

3 days ago

@gabriberton Funny to see this poll. I grew up in India and used to hear people pronounce the name with "nee", whereas I am now used to hearing locals say the name ending in "eye". The latter feels natural to me now.

0

72

AggieInCA retweeted

Lucas Maes

@lucasmaes_

8 days ago

Would you like to join the research effort on JEPA and World Models easily? After a full year of hard work, we’re excited to finally release stable-worldmodel: an open-source, scalable platform built to accelerate JEPA & World Model research! 📄: https://t.co/gnxGvens5A

38

2K

270

2K

111K

Vimal Thilak🦉🐒

@AggieInCA

10 days ago

@giffmana The second L was definitely one taken by the bot there ;)

0

3

1

323

Vimal Thilak🦉🐒

@AggieInCA

10 days ago

@liuzhuang1234 @TaiMingLu Nice work. You might find our empirical work on video interesting especially with the conclusions/observations you make :) https://t.co/3UWuPuhVO0

0

5

1

318

AggieInCA retweeted

Paul Jeha

@jeha_paul

16 days ago

Pre-training is increasingly data-constrained: compute outruns text, models repeat tokens many times, and how much repetition you can afford is an open question. In "Mix, Don't Tune" 🎶 (my @Apple MLR internship), we run ~1000 pre-training runs from 150M to 1.43B params with full HP grids at every scale, to figure out what actually drives performance when target-language data is scarce, and land on a concrete recipe for the data-constrained regime. (1/3) 📃: https://t.co/n8IB4sVeGB

3

110

11

77

7K

AggieInCA retweeted

Anagh Malik

@anagh_malik

16 days ago

📢📢📢 Velox 🚀: Learning Representations of 4D Geometry and Appearance. In our #CVPR2026 paper, we introduce a method for learning a native 4D representation, useful for many downstream tasks, such as video-to-4D, 3D tracking, cloth simulation, and others! 🌐: https://t.co/MCkCMEftoJ 📝: https://t.co/iLKgrprXlO

7

170

51

95

20K

Vimal Thilak🦉🐒

@AggieInCA

15 days ago

intriguing. does this finding extend to other domains as well?

Tatsunori Hashimoto

@tatsu_hashimoto

15 days ago

Some new results I found surprising that I’m tweeting for Chris (who isnt on here). With enough compute, the best data filter for LMs (on DCLM) might be no filter. Why? Large models can tolerate a surprising amount of nominally 'low quality' data, and can sometimes even benefit.

https://t.co/VhshLOWBIx

32

1K

152

906

217K

0

290

Vimal Thilak🦉🐒

@AggieInCA

17 days ago

@_onionesque Same, Shubendu, Same. Padding estimates is my job, not my agent's ;)

1

0

62

Vimal Thilak🦉🐒

@AggieInCA

17 days ago

>No, I was padding. Honest re-estimate: Why?

1

0

210

Vimal Thilak🦉🐒

@AggieInCA

18 days ago

504. RIP @arxiv , we need to make you great again.

0

114

Vimal Thilak🦉🐒

@AggieInCA

24 days ago

@kalomaze Look up guillotine regularization ;)

0

1

0

68

Vimal Thilak🦉🐒

@AggieInCA

27 days ago

@LongLeRobot @GoogleDeepMind Congrats!

0

1

0

213

Vimal Thilak🦉🐒

@AggieInCA

27 days ago

We mourned this loss big time in Aggie (The real ones in NM) Land.

Martin Bauer

@martinmbauer

28 days ago

When Pluto was demoted from planet status

773

49K

7K

772

1M

0

225

Vimal Thilak🦉🐒

@AggieInCA

27 days ago

Wat is this

emily sihan zhang

@emilyzsh

28 days ago

the world (california) is not ready for the number of chindian children there will be in 2035

213

9K

268

1K

3M

0

364

Vimal Thilak🦉🐒

@AggieInCA

27 days ago

Yep, they have been working with JEPA for a little while and now have a practical recipe to do even better than existing I/V-JEPAs :). https://t.co/8UZ2Gk0Ryd https://t.co/3UWuPuhVO0

机器之心 JIQIZHIXIN

@jiqizhixin

28 days ago

Looks like Apple is very interested in JEPA! What if your AI could “read” an image’s caption to solve visual puzzles? Apple researchers present TC-JEPA: a new self-supervised method that uses image captions to guide masked patch predictions. By conditioning on text, the model reduces visual uncertainty and learns more semantically meaningful features. Result: TC-JEPA outperforms contrastive approaches across diverse tasks—especially fine-grained visual understanding and reasoning—while improving training stability and scaling.

2

223

29

148

12K

0

7

0

2

411

Vimal Thilak🦉🐒

@AggieInCA

28 days ago

It's fun to revisit my old thread on this topic. Conducting research was actually interesting back then :):( https://t.co/foFrvS8rig

Vimal Thilak🦉🐒

@AggieInCA

almost 4 years ago

As reported in Grokking (arXiv:2201.02177) neural networks can exhibit sudden jumps in test accuracy late in training. We investigate this behavior and uncover an adaptive optimizer anomaly — The Slingshot Mechanism — that causes training instability but promotes generalization.

5

469

67

173

0

183

Vimal Thilak🦉🐒

@AggieInCA

29 days ago

Slingshot might have been put to bed finally? Watch out for Adam kiddos. Adam, our evergreen unstable genius :) https://t.co/B4ZHYDkPnW CC @deepcohen. I wonder if there are any works that have looked at EoS + finite precision numerics?

2

12

2

10

1K
