Timothée Lesort @TLesort - Twitter Profile

TLesort retweeted

Benjamin Thérien @ MLSys 2026

20 days ago

Excited to announce our @COLM_conf workshop "Context Beyond the Window"! We have a stacked lineup of speakers focusing on one of the most important topics today: context management. The venue is non-archival; we welcome all relevant 4- or 8-page submissions! https://t.co/LWrTtE3quM

0

13

3

1

1K

TLesort retweeted

Arthur Douillard

@Ar_Douillard

about 2 months ago

The DiLoCo team at Google DeepMind and Google Research is proud to release Decoupled DiLoCo, the next frontier for resilient AI pre-training. Decoupled DiLoCo enables training with datacenters across the world, using heterogeneous hardware, and never halting the system despite hardware failures.

34

609

85

299

3M

Timothée Lesort @TLesort

2 months ago

@itsNVA7 Replay🤞

0

1

0

70

TLesort retweeted

elie

@eliebakouch

3 months ago

(continual) pre-training is not dead!

some thoughts about cost per task (on cursorbench) being 2x lower, imo it can be due to:

- new base model: seems straightforward but it's not imo, you need to optimize the inference/training stack (both for rl and consumer inference), GLM-5 > V3.2 doesn't mean glm-5-base > V3, they may not be equally malleable to post/mid training
- more optimized kernel and inference stack/serving (most likely)
- rl/mid-training with objective/data to make smaller CoT (most likely)
- mid-training with more efficient arch: i would love for this to be true, and i can see how it's necessary if they use the previous base model generation and need efficient memory for long context, but since they also released some tricks with self-summary, i'd say unlikely? (they can and imo should be combined together for very long tasks)

3

135

5

48

28K

TLesort retweeted

Benjamin Thérien @ MLSys 2026

@benjamintherien

4 months ago

This week, we released a paper from Meta @AIatmeta “MuLoCo: Muon is a practical inner optimizer for DiLoCo”, showing that K=1 MuLoCo has a Pareto-optimal performance-training-time tradeoff. Let’s drill deeper into single-worker MuLoCo’s efficiency 🧵1/5 https://t.co/HXyxYkexus

2

49

11

19

6K

TLesort retweeted

Eugene Belilovsky

@ebelilov

4 months ago

I have open positions including Postdocs, PhD, master's students, and PhD interns. For more information https://t.co/8KUEDEfvz0

0

4

5

3

2K

Timothée Lesort @TLesort

4 months ago

@deliprao Imho any work is worth publishing if you find it interesting and you have some kind of a contribution to share with the community. It should not be about flag planting but feeling confident to share your work. I hope you will find your way through academia :)

0

92

Timothée Lesort @TLesort

4 months ago

@natolambert Continual learning is needed if you deal with data distribution shifts. If you train your model with iid data all along and reach the level of AI you need, CL is useless. But tbh nowadays the multiple steps to pretrain/tune/post-train already have some taste of CL 🙃

0

153

Timothée Lesort @TLesort

5 months ago

After 5 years away from Paris, in Montreal and Berlin, I am excited to announce that I am back! 🎉 I am starting as a researcher in the freshly created IMEC AILABS at @imec_int 🚀 We will work on pushing the frontier of AI research and building creative innovations 🎈

0

3

0

139

Timothée Lesort @TLesort

6 months ago

@xeophon If you assume that the model can already perform all the operations you need, then why not (ignoring the fact that the resulting algorithm might be slow). But if you need to change/increase the functional space of your model, you have to update the parameters.

0

50

TLesort retweeted

Arthur Douillard

@Ar_Douillard

7 months ago

Come see the Scaling Laws for DiLoCo poster at NeurIPS with @GabrielTeston and @NovaFallen8 !

3

54

8

9

4K

TLesort retweeted

merve

@mervenoyann

7 months ago

I mean it's a multimodal model (not inherently text-only), it's Instruct (not thinking), and it's compared with thinking models like K2 in non-thinking mode. This is an uneducated take imo

17

388

17

20

42K

TLesort retweeted

Yoshua Bengio

@Yoshua_Bengio

7 months ago

Intensifying geopolitical competition leaves AI bridge powers in a difficult situation where they’ll soon likely face insurmountable barriers to independent frontier AI development. To stay relevant and thrive economically, they need to work together and strategically choose their AI development approaches.

9

104

26

73

16K

Timothée Lesort @TLesort

7 months ago

@soumithchintala Good luck with your next step. Thanks for creating gold with pytorch. Truly an amazing tool for building algorithms!

0

199

TLesort retweeted

Massimo Caccia

@MassCaccia

10 months ago

🔥 We stress-tested today’s best AI code generators in 𝑑𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑐𝑦 ℎ𝑒𝑙𝑙.

Introducing 𝐆𝐢𝐭𝐂𝐡𝐚𝐦𝐞𝐥𝐞𝐨𝐧 𝟐.𝟎: 328 challenges for version-controlled code generation.

The verdict? Even top models only hit ~50% success. https://t.co/wEtcWBbWtJ

3

44

25

12

5K

Timothée Lesort @TLesort

11 months ago

@prlz77 And it seems they believe that just one thesis book is a bit too thin as a piece of research... 😅

1

2

0

202

Timothée Lesort @TLesort

12 months ago

@abursuc @TimDarcet @julienmairal @p_bojanowski @dlarlus @ylecun @CordeliaSchmid @chriswolfvision @xavirema Congrats @TimDarcet 👏

0

2

0

138

Timothée Lesort @TLesort

12 months ago

@giffmana Clean.

0

130

TLesort retweeted

Lucas Caccia @LucasPCaccia

12 months ago

RAG and in-context learning are the go-to approaches for integrating new knowledge into LLMs, making inference very inefficient. We propose instead 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗠𝗼𝗱𝘂𝗹𝗲𝘀: lightweight LoRA modules trained offline that can match RAG performance without the drawbacks

1

45

13

19

9K

TLesort retweeted

Benjamin Thérien @ MLSys 2026

@benjamintherien

about 1 year ago

Tired of tuning hyperparameters? Introducing PyLO! We’re bringing hyperparameter-free learned optimizers to PyTorch with drop-in torch.optim support and faster step times thanks to our custom CUDA kernels. Check out our code here: https://t.co/5CnKdTHef0

2

30

7

2K
