Daniel Hesslow

Machine Learning Optimization @HuggingFace 🤗

8 months ago

AI agents aren't magic🪄🙅‍♂️. They're language models trained to predict very specific text. When ChatGPT says 🔍"searching..." it's generating structured text that triggers actual tools. Here's what's really happening 👇
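This is a minimal Python sketch of the idea that "structured text triggers actual tools." Everything in it is an assumption made for illustration: the `<tool_call>` tag format, the `search` tool and the `run_tool_calls` helper are all hypothetical. Real systems, ChatGPT included, use their own formats. The model only emits text, and ordinary code outside the model recognizes the structured part and runs it.

```python
import json
import re

# Hypothetical tool: a real agent would call a search API here.
def search(query: str) -> str:
    return f"results for {query!r}"

TOOLS = {"search": search}

# Assumed format: the model wraps a JSON tool call in <tool_call>...</tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def run_tool_calls(model_output: str) -> list:
    """Find every tool call in the generated text and execute it."""
    results = []
    for payload in TOOL_CALL_RE.findall(model_output):
        call = json.loads(payload)        # the "structured text" part
        fn = TOOLS[call["name"]]          # look up the real tool by name
        results.append(fn(**call["arguments"]))
    return results

generated = ('Let me look that up. <tool_call>{"name": "search", '
             '"arguments": {"query": "Gemma 3"}}</tool_call>')
print(run_tool_calls(generated))  # ["results for 'Gemma 3'"]
```

In a full agent loop, the tool results would be appended to the context and the model would continue generating from there.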

DanielHesslow retweeted

Google AI Developers

@googleaidevs

10 months ago

Some interesting use cases from @AdaptiveML powered by Gemma 🧵⬇️

DanielHesslow retweeted

Omar Sanseviero

@osanseviero

11 months ago

SK Telecom + @AdaptiveML trained Gemma 3 4B with PPO, obtaining impressive results, especially for a model of such size. Learn more about how they did this https://t.co/A1Iqriut09

DanielHesslow retweeted

@AdaptiveML

11 months ago

Kimi K2 is a vision into the future of 𝘀𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝘁𝗼𝗼𝗹 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 and 𝗮𝗴𝗲𝗻𝘁𝗶𝗰 𝗺𝗼𝗱𝗲𝗹 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴. Leveraging 3,000+ MCP tools, the team generated 20,000+ synthetic tools and used them to train their 1T MoE model. 📜Paper and pipeline ⬇️

11 months ago

Shoutout from Google is pretty great, nice work @alexchapeaux and everyone over at @SKtelecom

Google AI Developers

@googleaidevs

11 months ago

Build enterprise AI without the latency and cost of massive models. Learn how @AdaptiveML used Gemma 3 to create a multilingual customer service moderation LLM for @SKtelecom to support their 23M+ subscribers who speak a mix of English and Korean. https://t.co/scwPSdSCXB

DanielHesslow retweeted

@AdaptiveML

11 months ago

Using Adaptive Engine, @SKtelecom tuned open models as small as Gemma 3 4B to exceed frontier performance (GPT-4.1, Claude 3.7 Sonnet, and o4-mini) at multilingual content moderation. Our research 📃 and full results 👇

DanielHesslow retweeted

ES-FoMo@ICML2025

@ESFoMo

11 months ago

Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/

about 1 year ago

We're back for the third year running!

ES-FoMo@ICML2025

@ESFoMo

about 1 year ago

ES-FoMo is back for round three at #ICML2025! Join us in Vancouver on Saturday July 19 for a day dedicated to Efficient Systems for Foundation Models: from 💬reasoning models to 🖼️scalable multimodality, 🧱efficient architectures, and more! Submissions due May 26! More below 👇

over 1 year ago

Turns out we have a Twitter account at Adaptive ML, who knew? And it seems to be posting interesting stuff as well (completely unbiased observation)

@AdaptiveML

over 1 year ago

Pretrained LLMs are aliens of extraordinary intelligence, yet little understanding. 👽 How do post-training techniques like 𝐒𝐅𝐓, 𝐑𝐄𝐈𝐍𝐅𝐎𝐑𝐂𝐄, and 𝐏𝐏𝐎 work in tandem to turn these aliens into helpful AI assistants? 🧵 👇

over 1 year ago

@giffmana tbh it's not amazing, but it is less terrible than most other options. cargo is great & static typing lets you refactor stuff much faster. otoh the borrow checker really can be a pain for quick prototyping, and proc macros are a terrible hack that you end up having to use a lot

almost 2 years ago

@tri_dao Specdec improving throughput is a suuuper nice finding!

almost 2 years ago

@Muennighoff @zach_nussbaum https://t.co/jPjfWSm4cx

almost 2 years ago

@Muennighoff @zach_nussbaum I was a bit unclear: information can leak in the forward pass from future tokens into previous ones. Here's an illustration: you can completely predict the next token through leakage in the routing. Obviously a very toy example, with a single expert that chooses a single token

almost 2 years ago

@Muennighoff @zach_nussbaum (And assigning it to the last "hello" is just a matter of counting the number of previous hellos and having the routing be a function of that.) I guess you maybe can't actually leak that much information through this. It might just be too costly to be worth exploiting

almost 2 years ago

@Muennighoff @zach_nussbaum Like, you should be able to have one expert that says there are no more "hello"s and assign it to the last "hello" in the sequence or smth. But y'know, sometimes the optimization process is not strong enough to exploit every loophole. Cool finding tho!
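This is a toy Python reproduction of the leak described in this thread. It mirrors the single-expert, single-token example. The router here is hypothetical. It scores each "hello" by how many "hello"s have appeared so far, so each score on its own looks only at the current and earlier tokens. The expert then takes its top-1 token across the whole sequence (expert-choice routing). That global selection is what lets future tokens influence earlier positions.

```python
def causal_router_scores(tokens):
    """Hypothetical router: the score at position t uses only tokens[0..t].
    Each "hello" scores the number of "hello"s seen so far; anything else -1."""
    scores, seen = [], 0
    for tok in tokens:
        if tok == "hello":
            seen += 1
            scores.append(float(seen))
        else:
            scores.append(-1.0)
    return scores

def expert_choice(scores, capacity):
    """Expert-choice routing: the expert takes its top-`capacity` tokens over
    the whole sequence, so picking position t depends on later positions too."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:capacity])
    return [i in chosen for i in range(len(scores))]

# Same prefix ["hello", "hello"], different third token.
a = ["hello", "hello", "world"]
b = ["hello", "hello", "hello"]
mask_a = expert_choice(causal_router_scores(a), capacity=1)
mask_b = expert_choice(causal_router_scores(b), capacity=1)
print(mask_a)  # [False, True, False]
print(mask_b)  # [False, False, True]
```

Position 1 sees an identical prefix in both sequences. It is routed to the expert only when the next token is not "hello", so the routing decision itself tells a causal model something about the future. Token-choice routing, where each token independently picks its experts, does not have this problem.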

almost 2 years ago

@Muennighoff Interesting about EC (expert choice) vs TC (token choice) routing. How do you do expert choice with a causal model?

almost 2 years ago

This is a very nice direction from @PyTorch! Even when we need the highest possible performance, we can still use Torch as a first step and export the IR to external codebases with production guarantees around memory etc!

almost 2 years ago

Want to know what actually goes on inside a PyTorch function? Found a new undocumented feature that shows it 👀
