François Le Lay @lelayf - Twitter Profile

François Le Lay @lelayf

7 months ago

@TheFuturesDesk Happy anniversary, you are doing a good thing here!

0

6

François Le Lay @lelayf

over 1 year ago

Can't wait to jailbreak Tesla Optimus and make it walk like it does *not* have plantar fasciitis.

0

1

0

50

lelayf retweeted

Sasha Rush

@srush_nlp

over 2 years ago

Vector databases have raised billions offering hosted management of text embeddings. What are you revealing in these vectors? 🔒 We demonstrate a practical method for full recovery of 90% of sentence length embeddings (https://t.co/JNBN7mm3Wu).

13

613

91

336

114K

lelayf retweeted

Jure Leskovec

@jure

over 2 years ago

🌟 Excited to announce the Stanford Graph Learning Workshop 2023 on Oct 24 2023!🤝 Bringing together academia & industry leaders to delve into advances in #MachineLearning & #AI in Relational domains, Foundation models, and Multimodal AI. 📢 Free registration. 📢 Calling for talks & poster/demo submissions! Showcase your innovative work in methodological advancements, diverse domain applications, & ML frameworks. https://t.co/JtU9DDW0Fd

7

690

141

240

108K

lelayf retweeted

Stanislav Kozlovski

@kozlovski

almost 3 years ago

The way Datadog calculates percentiles at scale is very innovative 🔥

Usually, calculating the percentiles of large datasets is very expensive.

To know the 99th percentile of a stream of values, you need to:
- keep all the values
- sort them
- return the value whose rank matches the percentile (e.g. 99th item)

Datadog cannot afford to do this with the many millions of data points that come in every second - the space and CPU requirements are not practical for a company with thousands of customers. 🐾

Naturally, they opted for sketch algorithms - those should provide them with a good-enough probabilistic result while being vastly more efficient to compute.

Unfortunately - they couldn’t get satisfactory results.
The algorithms would produce results that were too inaccurate. ❌

Why?

Many percentile sketches had guarantees in terms of *rank error*.

A rank-error guarantee of 2% means that the p95 value returned by the sketch is somewhere between the p93-p97 value.

But system latencies exhibit very fat tails - the difference between the p97 and p99 values can be 2-10x!

So what did the dogs do? 🐶

They invented a new sketch algorithm - DDSketch.

Instead of rank error guarantees, they designed it for *relative error* guarantees.

If the p99 is 60s, a 2% error means the sketch would return 58.8-61.2s.

The algorithm is surprisingly pretty simple:

• They create buckets covering ranges of the desired error rate. (+- 2% in this case) 🪣

• Each bucket keeps a counter of the amount of data points within that range. 💯

• When processing an item (latency metric data point), increment the counter of the appropriate bucket. ➕

• To count the desired percentile, you sum up the bucket’s values until you get to the desired percentile. Whatever bucket that percentile is in - that’s your value. 🏆

In this example, the 50th percentile is 1033ms. (4th value out of our total of 8)

Going by count, the 4th value is in the second bucket (b-1) and the algorithm would produce a result of 1021-1061ms.
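The bucket steps described above can be sketched in a few lines of Python. This is a minimal illustration, not Datadog's implementation: the class name `RelativeErrorSketch`, the parameter `alpha`, and the choice to report a bucket's midpoint-style estimate are assumptions made for the example, and it only handles positive values.

```python
import math
from collections import Counter


class RelativeErrorSketch:
    """Toy relative-error quantile sketch (hypothetical name, positive values only)."""

    def __init__(self, alpha=0.02):
        self.alpha = alpha
        # Each bucket spans a factor of gamma, so any value in a bucket
        # is within a relative distance alpha of the bucket's estimate.
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> number of points
        self.count = 0

    def add(self, value):
        if value <= 0:
            raise ValueError("this sketch only accepts positive values")
        # Bucket i covers the range (gamma**(i-1), gamma**i].
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def quantile(self, q):
        # Walk the buckets in order, summing counters until the rank is reached.
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # Representative value for bucket `index`.
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise ValueError("cannot query an empty sketch")


# Usage: latencies of 1..1000 ms, then ask for the p99.
sketch = RelativeErrorSketch(alpha=0.02)
for latency_ms in range(1, 1001):
    sketch.add(float(latency_ms))
p99 = sketch.quantile(0.99)
print(f"estimated p99: {p99:.1f} ms using {len(sketch.buckets)} buckets")
```

The estimate should land within 2% of the exact answer (990 ms for this input) while storing far fewer counters than input values.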
To cover the range from 1 millisecond to 1 minute, you only need 275 buckets.
With 64-bit counters, that's just ~2kB of memory, regardless of the amount of input data.

This is why we call sketch algorithms sublinear in space growth - memory requirements do NOT grow linearly with input.

The exponential nature of the bucket distribution makes it cheap to cover an even wider range: 1 nanosecond to 1 day takes just 3x more buckets:
• 802 buckets at ~6kB.

As you can probably tell, this is pretty easy to parallelize.

You can divide this bucket-building exercise into many parallel lightweight substreams, and then merge the results freely. 🕊

The merge operation is a simple sum of the buckets & their counters, which ensures that the accuracy is kept in the same range.

It is a very scalable and performant sketch algorithm.

Kudos to Datadog for inventing it.
Good boy! 🫳🐕‍🦺
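The bucket counts and the merge step can be checked with a short Python sketch. It assumes each bucket spans a factor of gamma = (1 + 0.02) / (1 - 0.02), which is one common way to get a 2% relative error; the helper name `bucket_count` and the sample bucket indexes are hypothetical.

```python
import math
from collections import Counter


def bucket_count(min_value, max_value, alpha=0.02):
    """Approximate number of buckets needed to span [min_value, max_value]."""
    gamma = (1 + alpha) / (1 - alpha)
    return math.log(max_value / min_value) / math.log(gamma)


# Ranges expressed in seconds.
ms_to_minute = bucket_count(1e-3, 60.0)     # 1 millisecond to 1 minute
ns_to_day = bucket_count(1e-9, 86_400.0)    # 1 nanosecond to 1 day

for label, n in [("1 ms to 1 min", ms_to_minute), ("1 ns to 1 day", ns_to_day)]:
    # 8 bytes per 64-bit counter.
    print(f"{label}: about {round(n)} buckets, about {round(n) * 8} bytes")

# Merging two sketches built on separate substreams: add the counters
# bucket by bucket. The bucket indexes here are made-up sample data.
substream_a = Counter({10: 2, 11: 1})
substream_b = Counter({11: 3, 12: 1})
merged = substream_a + substream_b
print(dict(merged))
```

Because both substreams use the same bucket boundaries, the merged counters are the same as if a single sketch had seen all the data, so the error bound is unchanged.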

25

2K

230

1K

266K

François Le Lay @lelayf

about 3 years ago

@swyx Interesting! I don't think a group of over-achievers gives you an over-achieving team. As with all things in life, balance is key.

0

413

François Le Lay @lelayf

about 3 years ago

@I_Am_The_ICT 🙋‍♂️ & thank you for the teaching!

0

9

lelayf retweeted

Alf

@MacroAlf

about 3 years ago

Given the Nvidia valuations, I think it's worth sharing this anecdote from the 2000s. At its peak, the Sun Microsystems stock hit a valuation of 10x sales. When stocks took a massive beating later on, this is what its CEO Scott McNealy had to say to investors: https://t.co/qOETfgtNa8

258

6K

940

1K

2M

François Le Lay @lelayf

about 3 years ago

Debugging the Internet with AI agents – with Itamar Friedman of Codium AI and AutoGPT, by @swyx https://t.co/MiznHc1j9Z

0

3

0

112

lelayf retweeted

Jure Leskovec

@jure

about 3 years ago

📣Ready to dive deep into the world of #MachineLearning with Graphs? 🌐 Our new online class explores how diseases & information spread, traffic & weather predictions, & so much more! Join us, & make sense of the world's complex data🚀. https://t.co/04L9TjSbCc We kick off on June 5th! #DataScience #OnlineCourse @StanfordEng

5

321

63

189

52K

lelayf retweeted

BigCode @BigCodeProject

about 3 years ago

Introducing the BigCode Evaluation Harness for Code LLMs: https://t.co/InCoQoc28O Inspired by the lm-evaluation-harness from @AiEleuther, it ensures ease-of-use, reproducibility and efficiency. Let’s explore its key features 🧵: https://t.co/ltAoiDiMOg

2

166

40

66

32K

François Le Lay @lelayf

about 3 years ago

@julien_c yup, just like $SPX, consolidation before the next leg up!

0

274

François Le Lay @lelayf

about 3 years ago

@mounialalmas At first glance I thought it was the cover of a New Order album. Not!

0

1

0

26

lelayf retweeted

Gustav Söderström

@GustavS

over 3 years ago

Pretty proud of the @Spotify team for this one – a generative, expressive and realistic AI DJ that delivers a personalized lineup of music and commentary to each user, for all those times you don’t know exactly what you want to hear. Hope you like it! 🎧 https://t.co/4PwXKUW0Cl

38

861

94

101

246K

François Le Lay @lelayf

over 3 years ago

@katieelink @lehmer16 @MIT_CSAIL Super interesting, thank you Katie!

0

44

François Le Lay @lelayf

over 3 years ago

@GustavS @Spotify Show and Tell! ;) Congrats on the launch!

0

168

François Le Lay @lelayf

over 3 years ago

@free It has been 6 months since I requested the cancellation of my line and uploaded the documents 3 times, and you still have not acknowledged it? Efficiency, organized-crime style.

1

0

23

François Le Lay @lelayf

over 3 years ago

@bernhardsson I have found the Modal error messages really helpful with the occasional suggestion to do something differently!

0

2

0

110

François Le Lay @lelayf

over 3 years ago

@bernhardsson @modal_labs Turns out I am now the head of solution engineering at @KensuIO and I think there is an awesome synergy to add data observability capabilities to Modal functions by integrating with our agents and our Community Edition.

0

3

0

François Le Lay @lelayf

over 3 years ago

@PrefectIO It's easy: Flowy McFlowface

0
