Raphaël Sourty @raphaelsrty - Twitter Profile

Pinned Tweet

4 months ago

Releasing ColGREP and LateOn-Code models 🚀 ColGREP is a multi-vector search tool built in Rust, made for coding agents. It's a hybrid grep that supports both grep features and semantic retrieval, and it runs 100% locally. You get two SOTA code retrieval models within ColGREP

raphaelsrty retweeted

Amélie Chatelain

@AmelieTabatta

about 20 hours ago

Do you like the open-source models we keep shipping at @LightOnIO? 👀 Now you can actually *build* with them!! We're launching LightOn Console 🎮: three endpoints (Parse, Extract, Search) so you can run our models on your own documents without building the plumbing yourself! 🧵

raphaelsrty retweeted

LightOn

@LightOnIO

2 days ago

Today, we're introducing LightOn Console.

⚙️ Three endpoints:
/Parse any documents
/Extract structured data
/Search enterprise knowledge with citations

🔌 Built-in connectors. MCP-ready. Governance enforced at the chunk level.

No infrastructure. No pipeline maintenance. No dedicated retrieval team required.
Make your enterprise knowledge agent-readable now!

Read the launch announcement: https://t.co/LcxXqyOgo5

Test it now: https://t.co/RNJQKEHzQ2

raphaelsrty retweeted

Silvio Martinico @SilvioMartinico

2 days ago

The late-interaction multivector retrieval ecosystem is exploding right now. To help separate the signal from the noise, we put together an "Awesome Multivector Retrieval" list organizing the top models, engines, libraries, and datasets all in one place 📚 🧵👇
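As a rough illustration of what "late interaction" means in multi-vector retrieval: every query token keeps only its best-matching document token, and those maxima are summed (the MaxSim operator popularized by ColBERT). The sketch below assumes row-normalized NumPy arrays of token embeddings and uses random toy data; it is not taken from any of the models, engines, or libraries mentioned in this thread, and the function names are hypothetical.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row so that dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style late-interaction score.

    query_tokens: (n_q, dim) and doc_tokens: (n_d, dim), both row-normalized.
    """
    sim = query_tokens @ doc_tokens.T  # (n_q, n_d) token-to-token similarities
    return float(sim.max(axis=1).sum())  # best document token per query token, summed


def rank(query_tokens: np.ndarray, docs: list) -> list:
    """Return document indices sorted by descending MaxSim score."""
    scores = [maxsim(query_tokens, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy data: random 4-dim "token embeddings" (illustrative only, not a real model).
rng = np.random.default_rng(0)
query = normalize(rng.normal(size=(3, 4)))
docs = [normalize(rng.normal(size=(n, 4))) for n in (5, 7, 2)]
print(rank(query, docs))
```

Documents can hold different numbers of token vectors; only the per-query-token maximum enters the score, which is why each token is kept as its own vector instead of being pooled into one.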

Raphaël Sourty

@raphaelsrty

4 days ago

@SilvioMartinico Congrats @SilvioMartinico !

raphaelsrty retweeted

Silvio Martinico @SilvioMartinico

4 days ago

Quick update: TACHIOM 0.3.0 is out with mean-centering to help alleviate the anisotropy problem. Also noticed that newer models usually need lower micro/small token thresholds than the defaults calibrated on ColBERTv2.0. More to come soon! ⚔️
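A minimal sketch of mean-centering, assuming token embeddings are stored as rows of a NumPy array: subtract the corpus-wide mean token vector and re-normalize, which removes the shared direction that makes all vectors look alike. This is a generic illustration, not TACHIOM's implementation; the helper names and the toy data are hypothetical, and the average pairwise cosine is used only as a simple anisotropy proxy.

```python
import numpy as np


def fit_mean(corpus_tokens: np.ndarray) -> np.ndarray:
    """Mean token embedding over the corpus, shape (dim,)."""
    return corpus_tokens.mean(axis=0)


def center(tokens: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Subtract the corpus mean, then L2-normalize each row again.

    The same mu should be applied to query tokens at search time.
    """
    centered = tokens - mu
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)


def mean_pairwise_cosine(x: np.ndarray) -> float:
    """Average cosine similarity between distinct rows (simple anisotropy proxy)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = xn @ xn.T
    n = len(x)
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))


# Toy anisotropic embeddings: isotropic noise plus a large shared offset,
# so every vector points roughly the same way (illustrative data only).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(200, 16)) + 5.0
mu = fit_mean(tokens)
print(mean_pairwise_cosine(tokens))              # high: a dominant shared direction
print(mean_pairwise_cosine(center(tokens, mu)))  # close to zero after centering
```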

raphaelsrty retweeted

Antoine Chaffin

@antoine_chaffin

5 days ago

It’s only BEIR, but there is an almost 10-point gap between v2 and LateOn.
We also have good evidence that the model generalizes very well outside of BEIR.

GTE-ModernColBERT was an upgrade.
LateOn is a whole new generation.
And all of them have the exact same usage in PyLate.

Raphaël Sourty

@raphaelsrty

5 days ago

At 140 million parameters, our LateOn model yields strong results 😉

Unrelated to LateOn, I'm really excited by what's happening with multi-vector models right now:
- New kinds of indexes running on CPU
- New multilingual models
- Anisotropy being solved
- Sparse multi-vector

Omar Khattab

@lateinteraction

5 days ago

20M downloads / month is a new record for colbertv2 but people should probably migrate from this ancient October 2021 model to the LateOn colbert model from @raphaelsrty @antoine_chaffin et al (@LightOnIO)

Raphaël Sourty

@raphaelsrty

5 days ago

@bclavie Awesome work! I did catch up on SAEs and multi-vector; this is really cool

raphaelsrty retweeted

Ben Clavié

@bclavie

5 days ago

Very excited to finally share this one after sitting on it for far too long! It's very topical now. Blog post coming very soon :)

raphaelsrty retweeted

Omar Khattab

@lateinteraction

5 days ago

Late-interaction sparse retrieval? 😁 With neuron-level inverted indexing, on top of unsupervised sparse autoencoders. Works much better than directly training sparse retrievers. Lots of cool ideas developed & composed in here. Thanks for the insights @Veritas2026 @yifeiwang77!
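A toy sketch of the general idea, assuming each token has already been encoded into a sparse code (neuron id → activation), for example by a sparse autoencoder: an inverted index keyed by neuron id lets late-interaction (MaxSim) scoring touch only the document tokens that share an active neuron with a query token. This is an illustration under those assumptions, not the method from the work announced above; the data structures and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sparse code for one token: {neuron_id: activation}.
SparseToken = dict[int, float]


def build_index(docs: dict[str, list[SparseToken]]) -> dict:
    """Map each neuron id to postings of (doc_id, token_position, activation)."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for pos, code in enumerate(tokens):
            for neuron, act in code.items():
                index[neuron].append((doc_id, pos, act))
    return index


def search(index: dict, query: list[SparseToken]) -> list[tuple[str, float]]:
    """Late-interaction scoring through the inverted index.

    For each query token, accumulate sparse dot products with every document
    token sharing at least one active neuron, keep the best token per document
    (MaxSim), then sum those maxima over query tokens. Assumes non-negative
    activations, so skipped tokens (dot product 0) never beat a visited one.
    """
    totals = defaultdict(float)
    for code in query:
        token_scores = defaultdict(float)  # (doc_id, pos) -> sparse dot product
        for neuron, q_act in code.items():
            for doc_id, pos, d_act in index.get(neuron, []):
                token_scores[(doc_id, pos)] += q_act * d_act
        best = {}
        for (doc_id, _), score in token_scores.items():
            best[doc_id] = max(best.get(doc_id, float("-inf")), score)
        for doc_id, score in best.items():
            totals[doc_id] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Toy usage with made-up sparse codes.
docs = {
    "a": [{1: 0.9, 7: 0.2}, {3: 0.5}],
    "b": [{7: 1.0}, {2: 0.4, 3: 0.1}],
}
index = build_index(docs)
query = [{1: 1.0}, {3: 1.0}]
print(search(index, query))
```

With non-negative activations, as produced by ReLU-style SAEs, skipping tokens that share no active neuron does not change the per-document maximum, because their dot product with the query token is 0.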

Raphaël Sourty

@raphaelsrty

6 days ago

@antoine_chaffin Oh cool, I didn't see it on our HF. Nice, I could have sent a Slack message instead of a tweet 😁

Raphaël Sourty

@raphaelsrty

6 days ago

I want an Iso-LateOn as well 😁 Very interesting work to scale multi-vector retrieval and fight anisotropy in models so they can produce sparse vectors for SMVE

topk.io

@topk_io

6 days ago

Even strong multi-vector models may break down when optimized for low-latency and high-QPS inference in production. But this can be fixed. We're open-sourcing Iso-ModernColBERT, a late interaction model built for efficient inference and scalable retrieval. 🧵 (1/6)

Raphaël Sourty

@raphaelsrty

6 days ago

@aussetg Feel free to push an MR, even a draft one; it could be interesting. At some point we could make colgrep cache-friendly for kernels given a specific backend. Of course, we do want any backend we integrate to bring something to the table, but it costs nothing to push a draft MR :)

Raphaël Sourty

@raphaelsrty

6 days ago

@Robro612 Thank you @Robro612 for spotting this ☺️! Also, @paulomouraj spotted a memory issue with very long queries (BrowseComp+-like queries), which we will fix ASAP

raphaelsrty retweeted

Rohan Jha @Robro612

6 days ago

ICYMI: @raphaelsrty just added index.freeze() to FastPlaid v1.4.7, which halves the index size on disk if you know you won’t modify the index 🥶 Reversible with index.unfreeze() 🔥

Raphaël Sourty

@raphaelsrty

7 days ago

@capemox Very cool, keen to learn from your experiments, even at reduced scale! ✌️

Raphaël Sourty

@raphaelsrty

7 days ago

@bclavie lgtm

raphaelsrty retweeted

Clément Chadebec

@CChadebec

7 days ago

📢 New @heyjasper release! 📢

MONET 🌸: An Apache 2.0 deduped and recaptioned dataset of 105M samples, unlocking reproducible text-to-image research.
Nano T2I 🖌️: A codebase to train your own T2I model.

🤗 @huggingface: https://t.co/x6gEhQIaFV
💻: https://t.co/K6VIU2wjtW

Very excited about this new release, pushing the boundaries of open and reproducible T2I research. Congrats to the team! Benjamin Aubin Gonzalo Quintana @onurxtasar @UlaLaParis @_jeev2 @dh7net @clipdropapp @heyjasperai
