Hugo Larcher @hugoch - Twitter Profile

Hugo Larcher @hugoch

4 months ago

@XciD_ @huggingface I now report to Claude … 🤦‍♂️

0

22

hugoch retweeted

Jeff Boudier 🤗

@jeffboudier

5 months ago

Thrilled to see Reachy Mini take center stage at @nvidia CES Keynote! 🤩 Paired with DGX Spark, it's the ultimate toolkit to build personal, private agents that are useful in the real world. 🤖 @NaderLikeLadder and @alecqfong created this incredible demo showing how Brev can tie together model APIs with private computing to make agents that listen to you and get your dog off the couch! 🐶 Link to the demo in thread to build it yourself - even if you don't have your Reachy Mini or DGX Spark yet

1

15

5

2

1K

Hugo Larcher @hugoch

6 months ago

@jedisct1 Yep, CPU and network heavy workloads like Xet file reconstruction at @huggingface proved to perform poorly as WebAssembly workloads compared to containers.

0

2

0

42

Hugo Larcher @hugoch

6 months ago

@HKydlicek NVIDIA buying SchedMD while their SuperPods currently run on k8s says a lot 😄

0

4

0

96

Hugo Larcher @hugoch

8 months ago

@jedisct1 Yep got it too during my last flight! Thought it was something wrong with my AirPods..

0

13

hugoch retweeted

dylan @dylan_ebert_

10 months ago

OpenAI just released GPT-OSS: an open-source language model on Hugging Face. Open source meaning: 💸 Free 🔒 Private 🔧 Customizable

15

213

39

76

22K

Hugo Larcher @hugoch

about 1 year ago

OMG, the U.S. just downloaded more than 5PB of DeepSeek-R1 on @huggingface in the last few days! Feeling late FOMO in Silicon Valley? 🤔🚀 https://t.co/ouZbNZWwZf

2

21

3

2

2K

Hugo Larcher @hugoch

about 1 year ago

@AdrienGallouet Nice! Is that with stock llama.cpp?

0

8

Hugo Larcher @hugoch

about 1 year ago

🧵(2/2) With inference-benchmarker you can: 🧪 Simulate real workloads (chat, code-gen...)  📊 Measure throughput, time-to-first-token, inter-token latency  ⚙️ Compare performance across backends & infra  👉 Check it out: https://t.co/TJSY4BPmYu

0

8

1

4

331
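
The metrics named in this thread can be illustrated with a minimal sketch. It is not taken from inference-benchmarker itself: the function name `summarize_request` and the sample timestamps are hypothetical, and it assumes you have recorded the arrival time of each generated token for a single request.

```python
from statistics import mean


def summarize_request(start: float, token_times: list[float]) -> dict[str, float]:
    """Compute latency metrics for one request from token arrival times (seconds)."""
    if not token_times:
        raise ValueError("need at least one generated token")
    total = token_times[-1] - start
    if total <= 0:
        raise ValueError("token timestamps must come after the request start")
    # Time-to-first-token: delay between sending the request and the first token.
    ttft = token_times[0] - start
    # Inter-token latency: mean gap between consecutive tokens.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0
    # Throughput for this request: generated tokens per second of wall time.
    return {"ttft_s": ttft, "itl_s": itl, "tokens_per_s": len(token_times) / total}


# Hypothetical timestamps: first token after 0.5 s, then one token every 0.1 s.
metrics = summarize_request(0.0, [0.5, 0.6, 0.7, 0.8, 0.9])
print(metrics)
```

Aggregating these per-request values across many concurrent requests (for example as percentiles) is what reveals consistency under load rather than a single best-case latency.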

Hugo Larcher @hugoch

about 1 year ago

🧠 LLM inference isn't just about latency: it's about consistency under load. Different workloads, configs, and hardware lead to very different real-world performance. At Hugging Face 🤗 we built inference-benchmarker, a simple tool to stress-test LLM inference servers. 🧵 (1/2)

2

38

13

15

3K

Hugo Larcher @hugoch

over 1 year ago

@eliebakouch @huggingface @eliebakouch the guy knows how to cook!

0

5

0

488

Hugo Larcher @hugoch

over 1 year ago

At @huggingface we rely on GPU-fryer 🍳 to load-test our 768 H100 GPU cluster. It runs matrix multiplications and monitors TFLOPs outliers to catch any software or hardware throttling, often a sign of cooling issues that need a hardware fix ❄️🔧. 🧵 1/2 https://t.co/N9WmW4UcLz

5

253

29

91

24K
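
The outlier check described in this tweet can be sketched as follows. This is an illustration only, not GPU-fryer's actual logic: the function `find_throttled`, the 5% tolerance, and the TFLOPS readings are all hypothetical, and the sketch assumes per-GPU throughput has already been measured by a matrix-multiplication benchmark.

```python
from statistics import median


def find_throttled(tflops_by_gpu: dict[str, float], tolerance: float = 0.05) -> list[str]:
    """Return GPUs whose measured TFLOPS fall more than `tolerance` below the median."""
    baseline = median(tflops_by_gpu.values())
    floor = baseline * (1.0 - tolerance)
    return sorted(gpu for gpu, value in tflops_by_gpu.items() if value < floor)


# Hypothetical readings; one GPU runs noticeably slower than its peers.
readings = {
    "node1:gpu0": 712.0,
    "node1:gpu1": 709.5,
    "node2:gpu0": 641.0,
    "node2:gpu1": 715.2,
}
slow = find_throttled(readings)
print(slow)
```

Comparing against the median rather than a fixed target keeps the check independent of the exact GPU model and clock settings, at the cost of missing a problem that affects most of the cluster at once.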

Hugo Larcher @hugoch

over 1 year ago

@huggingface GPU-fryer helps us detect silent throttling failures: one GPU slows down and every other unit ends up waiting, creating a bottleneck 🚦. Check it out: https://t.co/hRDyZ9LaxB

0

44

4

11

2K

Hugo Larcher @hugoch

over 1 year ago

This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron and TPU). We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned 🤗! https://t.co/eGpEvqVM8L

0

8

1

2

215

Hugo Larcher @hugoch

over 1 year ago

We are introducing multi-backend support in @huggingface Text Generation Inference! With the new TGI architecture, we can now plug in new modeling backends to get the best performance for the selected model and available hardware. https://t.co/lKs8dezOBy

2

58

8

18

3K

hugoch retweeted

clem 🤗

@ClementDelangue

over 1 year ago

Just 10 days after o1's public debut, we’re thrilled to unveil the open-source version of the groundbreaking technique behind its success: scaling test-time compute 🧠💡 By giving models more "time to think," LLaMA 1B outperforms LLaMA 8B in math—beating a model 8x its size. The full recipe is open-source🤯 This is the power of open science and open-source AI! 🌍✨

114

4K

607

2K

497K

hugoch retweeted

Ann Huang @AnnInTweetD

over 1 year ago

We're turning @huggingface Hub's files into content-defined chunks to speed up your workflows!⚡️ This means: - 🧠We store your file as deduplicated chunks - ⏩ You only upload changed chunks when iterating! - 🚀 Pulling changes? Only download changed chunks!

3

53

16

6

17K
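
To make the idea of content-defined chunks concrete, here is a minimal sketch. It is not the Hub's implementation: it uses a simple gear-style rolling hash, and the table `GEAR`, the size limits, and the helper names are assumptions chosen for illustration. Chunk boundaries depend on the bytes themselves, so a small edit leaves most chunks, and therefore most chunk hashes, unchanged.

```python
import hashlib
import random

MASK = (1 << 12) - 1               # cut when the low 12 hash bits are zero
MIN_SIZE, MAX_SIZE = 1024, 16384   # hypothetical chunk size limits in bytes
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]  # fixed per-byte table


def chunk(data: bytes) -> list[bytes]:
    """Split data at boundaries chosen by a rolling hash of the content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])   # trailing remainder
    return chunks


def digests(data: bytes) -> set[str]:
    """Hash each chunk; identical chunks would be stored or transferred once."""
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}


data_rng = random.Random(1)
original = bytes(data_rng.getrandbits(8) for _ in range(200_000))
edited = original[:100_000] + b"a small edit" + original[100_000:]
shared = digests(original) & digests(edited)
print(len(shared), "chunks shared out of", len(digests(edited)))
```

Only the chunks whose digests are missing from `shared` would need to be uploaded or downloaded after the edit.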

Hugo Larcher @hugoch

over 1 year ago

@andi_marafioti @TechExplorerCH @gui_penedo « GPU has fallen off the bus » 🚌 …!

0

47

Hugo Larcher @hugoch

over 1 year ago

@anushkmittal You can check out the implementation from @FerdinandMom!

Ferdinand Mom

@FerdinandMom

over 1 year ago

https://t.co/wLUm68ZY1k

0

14

2

12

2K

0

2

0

89

Hugo Larcher @hugoch

over 1 year ago

An easy way to understand Pipeline Parallelism with a self-contained implementation. Check it out!

Ferdinand Mom

@FerdinandMom

over 1 year ago

Interested in 4D parallelism but feeling overwhelmed by the Megatron-LM codebase? We are currently cooking something with @Haojun_Zhao14 and @xariusrke 😉 In the meantime, here is a self-contained script that implements Pipeline Parallelism (AFAB + 1F1B) in 200 LOC 🧵👇 https://t.co/SCbKRknIOF

12

230

44

147

27K

1

11

1

582
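
The two schedules mentioned in the quoted tweet can be compared with a small sketch. It is not the linked script: it only generates the order of forward (F) and backward (B) passes for one pipeline stage, assuming the usual non-interleaved 1F1B layout, and the function names are hypothetical.

```python
def afab(num_microbatches: int) -> list[str]:
    """All-forward-all-backward: every forward pass, then every backward pass."""
    m = num_microbatches
    return [f"F{i}" for i in range(m)] + [f"B{i}" for i in range(m)]


def one_f_one_b(stage: int, num_stages: int, num_microbatches: int) -> list[str]:
    """Non-interleaved 1F1B order for one stage (stage 0 is the first stage)."""
    m = num_microbatches
    warmup = min(num_stages - stage - 1, m)
    ops = [f"F{i}" for i in range(warmup)]           # warmup: forwards only
    for i in range(m - warmup):                      # steady state: 1 forward, 1 backward
        ops += [f"F{warmup + i}", f"B{i}"]
    ops += [f"B{i}" for i in range(m - warmup, m)]   # cooldown: remaining backwards
    return ops


def peak_in_flight(ops: list[str]) -> int:
    """Largest number of microbatches whose activations are held at once."""
    live = peak = 0
    for op in ops:
        live += 1 if op.startswith("F") else -1
        peak = max(peak, live)
    return peak


schedule = one_f_one_b(stage=0, num_stages=4, num_microbatches=8)
print(" ".join(schedule))
print(peak_in_flight(afab(8)), peak_in_flight(schedule))
```

Both schedules run the same passes; 1F1B starts backward passes earlier, so each stage holds activations for at most as many microbatches as there are stages after it plus one, instead of all of them.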
