Sylvain Lesage @severo_dev - Twitter Profile

Pinned Tweet

about 1 year ago

As 📷 xet-team infrastructure begins backing hundreds of repositories on the Hugging Face Hub, we’re getting to put on our researcher hats and peer into the bytes. 👀 🤓 IMO, one of the most interesting ideas Xet storage introduces is a globally shared store of data.

severo_dev's tweet photo. As 📷 xet-team infrastructure begins backing hundreds of repositories on the Hugging Face Hub, we’re getting to put on our researcher hats and peer into the bytes. 👀 🤓

IMO, one of the most interesting ideas Xet storage introduces is a globally shared store of data. https://t.co/uex1EQSApc

3

2

0

1K

Sylvain Lesage @severo_dev

about 1 year ago

Give me some time 🤗🤗

2

1

505

Sylvain Lesage @severo_dev

about 1 year ago

Received a lot of information, decided to continue building practical AI programs Dev 80% supply has been locked https://t.co/y7jveEeNSS

Sylvain Lesage @severo_dev

about 1 year ago

Yo, I just dropped an experimental memecoin on Pump. Should I keep building an AI app for it?🤗🤗 https://t.co/J9hYGj5jV2

6

12

2

1

5K

3

4

2

0

1K

Sylvain Lesage @severo_dev

about 1 year ago

Yo, I just dropped an experimental memecoin on Pump. Should I keep building an AI app for it?🤗🤗 https://t.co/J9hYGj5jV2

6

12

2

1

5K

Sylvain Lesage @severo_dev

about 1 year ago

I 🙏 at the altar of stamina. "(Stamina is) the ability to chip away at goals despite a lack of visible progress. To hold focus and presence in a world incentivized to distract you. To stay patient. To be on time. To push through difficult material. To follow instructions or proceed without them."

1

0

935

Sylvain Lesage @severo_dev

about 1 year ago

See that purple banner on the Llama 4 models? It's Xet storage, and this is actually huge for anyone building with AI models. Real numbers: ~25% deduplication on Llama 4 models, hitting ~40% for finetunes

severo_dev's tweet photo. See that purple banner on the Llama 4 models? It's Xet storage, and this is actually huge for anyone building with AI models. Real numbers: ~25% deduplication on Llama 4 models, hitting ~40% for finetunes https://t.co/ZSPIztVp9n

0

535

Sylvain Lesage @severo_dev

about 1 year ago

The Lovelace 2.0 Test of Artificial Creativity and Intelligence “tell a story in which a boy falls in love with a girl, aliens abduct the boy, and the girl saves the world with the help of a talking cat.” Change that 🐱 to a 🐶 and I'd read that story any day. https://t.co/dDnBE7cPec

0

300

Sylvain Lesage @severo_dev

about 1 year ago

Online demo of hyparquet: a parser for apache parquet files. Uses hightable for high performance windowed table viewing.https://t.co/CDyIgVCIiT

0

232

severo_dev retweeted

Ihor Stepanov

@ihor_step

about 1 year ago

I just finished my small experiments comparing different encoder models on retrieval tasks. The goal was to check whether MLM is better than RTD for these tasks. I compared Electra's small models, both generator and discriminator, that have the same size. Additionally, it was tested DeBERTa v1, which was pre-trained with MLM and DeBERTa v3, which was pre-trained with RTD on a two times larger dataset. As a baseline ModernBERT was evaluated as well. Models were fine-tuned on 500k examples from the MS-MARCO dataset (https://t.co/WTCXdEZFP1). For benchmarking, the NanoBEIR evaluator was used. You can see the average ndcg@10 plotted below. It's clearer that MLM-trained models produce better discriminative features, however, more detailed experiments are needed for more accurate conclusions. @antoine_chaffin @tomaarsen

ihor_step's tweet photo. I just finished my small experiments comparing different encoder models on retrieval tasks.

The goal was to check whether MLM is better than RTD for these tasks.

I compared Electra's small models, both generator and discriminator, that have the same size. Additionally, it was tested DeBERTa v1, which was pre-trained with MLM and DeBERTa v3, which was pre-trained with RTD on a two times larger dataset. As a baseline ModernBERT was evaluated as well.

Models were fine-tuned on 500k examples from the MS-MARCO dataset (https://t.co/WTCXdEZFP1).

For benchmarking, the NanoBEIR evaluator was used. You can see the average ndcg@10 plotted below.

It's clearer that MLM-trained models produce better discriminative features, however, more detailed experiments are needed for more accurate conclusions.

@antoine_chaffin @tomaarsen

5

27

4

16

15K

Sylvain Lesage @severo_dev

about 1 year ago

@ClementDelangue Absolutely, the transformer architecture has become a cornerstone for model definition

0

1

0

103

severo_dev retweeted

clem 🤗

@ClementDelangue

about 1 year ago

So cool to see transformers becoming the source of truth for model definition & collaborating with wonderful partners like vLLM to have these models run everywhere the fastest! As a model builder, it means that you integrate with Hugging Face & instantly get hundreds of integrations out of the box. Time to accelerate AI, one integration at a time!

ClementDelangue's tweet photo. So cool to see transformers becoming the source of truth for model definition & collaborating with wonderful partners like vLLM to have these models run everywhere the fastest!

As a model builder, it means that you integrate with Hugging Face & instantly get hundreds of integrations out of the box.

Time to accelerate AI, one integration at a time!

13

154

17

26

21K

severo_dev retweeted

jade

@jadechoghari

about 1 year ago

Robotics simulation is how we teach robots to act smart—before they ever touch the real world. It’s physical AI: simulating Newtonian physics so robots can grip, move, and learn. A new simulator is coming to @huggingface @LeRobotHF 🤗. Stay tuned 👀😉

5

178

21

47

35K

Sylvain Lesage @severo_dev

about 1 year ago

Come find the many BERT islands. Or see how datasets relate in practice, not just in theory. See how libraries or tasks can tie repositories together. You can play around with node size using storage/likes/downloads too. The result is a super fun visualization from @saba9 and @znation that I’ve already lost way too much time to. I'm excited to see how the networks grow as we add more repositories! xet-team/repo-graph

0

138

Sylvain Lesage @severo_dev

about 1 year ago

As 📷 xet-team infrastructure begins backing hundreds of repositories on the Hugging Face Hub, we’re getting to put on our researcher hats and peer into the bytes. 👀 🤓 IMO, one of the most interesting ideas Xet storage introduces is a globally shared store of data.

3

2

0

1K

Sylvain Lesage @severo_dev

about 1 year ago

Because of this, different repositories can share bytes we store. That opens up something cool - we can draw a graph of which repos actually share data at the chunk level, where: - Nodes = repositories - Edges = shared chunks - Edge thickness = how much they overlap

severo_dev's tweet photo. Because of this, different repositories can share bytes we store. That opens up something cool - we can draw a graph of which repos actually share data at the chunk level, where:

- Nodes = repositories
- Edges = shared chunks
- Edge thickness = how much they overlap https://t.co/kbbZNzJ7nN

1

0

175

severo_dev retweeted

AshutoshShrivastava

@ai_for_success

about 1 year ago

🚨 SkyReels just launched! The world’s first open-source video generation platform supporting unlimited duration 🔥 All-in-one creator toolkit: - Consistent high-quality video (LoRA ready) - Fast gen, amazing output. - Amazing facial expressions . Plus text-to-film agent handles everything: script, character, storyboards, full AV gen, auto-edit . it's Wild! Step by step tutorial 👇

35

686

117

1K

81K

Sylvain Lesage @severo_dev

about 1 year ago

Need to convert CSV to Parquet? Use https://t.co/8JZIbJ0hTh. It does the job instantly. @cfahlgren1 provides many other tools on his website. Approved and bookmarked!

severo_dev's tweet photo. Need to convert CSV to Parquet?

Use https://t.co/8JZIbJ0hTh. It does the job instantly.

@cfahlgren1 provides many other tools on his website. Approved and bookmarked! https://t.co/BbmrQPeYBQ

0

116

Sylvain Lesage

@severo_dev

Last Seen Users on Sotwe

Trends for you

Most Popular Users