Yipeng Zhang

about 1 month ago

@hafezghm and I are in Rio presenting this work. Come by our poster and chat about SSL, world models, and continual learning! 🗓️ Saturday, April 25th, 10:30am 📍Pavilion 3-#112

4 months ago

How can we predict multiple plausible targets from a single context in joint-embedding self-supervised learning (SSL)? Check out our paper titled “Self-Supervised Learning from Structural Invariance” accepted at #ICLR2026! Previously Best Paper Award at @unireps 2025. https://t.co/mN5e1huPO9 We introduce AdaSSL, which models the target uncertainty and relaxes the standard assumption that the positive pair share the same semantic features. Derived from first principles, we realize @ylecun’s JEPA with a learned latent variable for jointly learning better representations and world models, extending SSL’s utility to a broader range of data types. 1/🧵

yipengzz retweeted

Divyat Mahajan

@divyat09

about 1 month ago

Presenting Future Summary Prediction at #ICLR2026 🇧🇷

📌 Friday, 24th April, 10.30 am
📍Pavilion 3 (521)

Come over to chat about novel pretraining objectives for LLMs! https://t.co/FFOQtZNaBI

yipengzz retweeted

about 1 month ago

I’m in Rio presenting this work at the SPOT workshop 🇧🇷☀️! Feel free to reach out to chat about self distillation/privileged information and/or anything to do with post-training/long horizon agents.

yipengzz retweeted

Lucas Maes

@lucasmaes_

2 months ago

JEPA are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning <1 second. 📑: https://t.co/cpTzgvbTS0

yipengzz retweeted

Benjamin Thérien @ MLSys 2026

3 months ago

Come check out the talk for a deep breakdown of my recent work/blog :)

yipengzz retweeted

@benjamintherien

3 months ago

Are frontier LLMs trained across datacenters? One thing is certain: if the pre-training optimizer’s critical batch size is too small, they are NOT! Excited to announce MuLoCo, a pre-training optimizer that can efficiently pre-train across datacenters while having large enough batch sizes to warrant doing so. 🧵1/N

yipengzz retweeted

Randall Balestriero

@randall_balestr

3 months ago

World Modeling research needs fast iteration, reproducibility, optimized baselines, open-source, and precise zero-shot stress testing. Here comes stable-worldmodel! Paper: https://t.co/aGmoYOId8U Code: https://t.co/YN3PS5xxV9 Come stress-test your model/idea! DINO-WM results ⬇️

yipengzz retweeted

Sébastien Lachapelle @seblachap

4 months ago

https://t.co/SAZYxtVHcC

yipengzz retweeted

4 months ago

I had a lot of fun meeting all the smart people at this workshop and presenting my work "On the Identifiability of Latent Action Policies" as an oral! A huge thanks to the organizers! Paper: https://t.co/FqTwVTxI4Z

yipengzz retweeted

World Modeling Workshop @worldmodel_conf

4 months ago

Remember all the self-distillation papers that came out last week. Well, we also propose it 😅, but… But alongside something better 😎 π-Distill We show that with this method, you can distill closed-source frontier models even tho their traces are hidden 🔒. Both our methods can reach and even surpass the performance of the industry-standard SFT + RL with access to reasoning traces 🤯. 🔬And we spent ~100,000 GPU hours on a comprehensive analysis, not because the method is finicky, but because we wanted to understand why it works so well. 🧵 1/10

yipengzz retweeted

4 months ago

What an awesome first day! Thank you all for joining and listening to our amazing speakers: @SchmidhuberAI, @sherryyangML, @cosmo_shirley, @Yoshua_Bengio, @ylecun, @mido_assran

World Models have beautiful days ahead. This is just the beginning 🫡 https://t.co/ucbbqeYRwf

4 months ago

I'm at @worldmodel_26 now through Friday. Lmk if you want to chat!

4 months ago

We hope AdaSSL inspires new ideas for using joint-embedding SSL to learn better representations and world models on naturally structured data. Kudos to my amazing collaborators: @hafezghm @yololulu_ @ShahabBakht @NeuralEnsemble @lcharlin Paper: https://t.co/mN5e1huPO9. Code coming soon. See you in Rio 🇧🇷 🧵/🧵
