Said Taghadouini

@staghado

pixels & tokens @ lighton / vit.cpp, modernbert, lightonocr

Joined January 2024

368 Following

481 Followers

434 Posts

Pinned Tweet

Said Taghadouini @staghado

5 months ago

🚀 LightOnOCR-2-1B 🦉 is out, a major update to LightOnOCR. 1B parameters, end-to-end multilingual OCR, and it beats models 9× larger on OlmOCR-Bench while being much faster. PDF/page in, clean ordered Markdown out, with optional image localization (bbox variants). https://t.co/MuHOJtQUsF

Said Taghadouini @staghado

3 days ago

@AmelieTabatta @LightOnIO 🚀

staghado retweeted

Amélie Chatelain

3 days ago

Do you like the open-source models we keep shipping at @LightOnIO? 👀
Now you can actually *build* with them!!

We're launching LightOn Console 🎮: three endpoints (Parse, Extract, Search) so you can run our models on your own documents without building the plumbing yourself!
🧵 https://t.co/UoCWk4ghqL

staghado retweeted

@LightOnIO

4 days ago

Today, we're introducing LightOn Console.

⚙️ Three endpoints:
/Parse any documents
/Extract structured data
/Search enterprise knowledge with citations

🔌 Built-in connectors. MCP-ready. Governance enforced at the chunk level.

No infrastructure. No pipeline maintenance. No dedicated retrieval team required.
Make your enterprise knowledge agent-readable now!

Read the launch announcement: https://t.co/LcxXqyOgo5

Test it now: https://t.co/RNJQKEHzQ2

Said Taghadouini @staghado

6 days ago

@gabriberton the same thing was in Fuyu iirc: with enough compute there's no need for a ViT, but VLMs starting from one get a head start.

Said Taghadouini @staghado

8 days ago

@vanstriendaniel the Qwen3.5 series is strong and fast!

0

3

0

0

115

staghado retweeted

Amélie Chatelain

9 days ago

I still love seeing my old lecture notes as demo images :D

0

11

1

0

370

Said Taghadouini @staghado

9 days ago

@antoine_chaffin 👨‍🍳

0

2

0

0

118

Said Taghadouini @staghado

11 days ago

@adithya_s_k fyi the Arabic samples are just garbled characters; something went wrong during rendering 😀

Said Taghadouini @staghado

11 days ago

character error rate (CER) is not a valid metric for full-page OCR, thanks!

0

6

0

0

339
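A minimal sketch of one commonly cited reason, assuming the usual definition CER = Levenshtein distance / reference length (the tweet itself does not spell out its reasoning): on a full page, a transcription with every character right but two blocks emitted in a different, equally plausible reading order scores worse than one with a single typo. The page text and the helper names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the chars match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


# Hypothetical two-column page: the same two blocks, two reading orders.
left = "Results are reported on the held-out split. "
right = "Training used eight GPUs for two days. "
reference = left + right

swapped = right + left                            # every character correct, order differs
typo = reference.replace("held-out", "held-oot")  # exactly one wrong character

print(f"swapped blocks: {cer(reference, swapped):.3f}")
print(f"one typo:       {cer(reference, typo):.3f}")
```

Because CER is computed on a single flattened string, it mixes recognition quality with reading-order and layout choices, which is the concern when applying it to whole pages.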

Said Taghadouini @staghado

11 days ago

@skalskip92 if Qwen3.5 122B-A10B is only 5 tests away from Gemini 3.5, then the 397B-A17B should be even closer. plus it completely destroys everyone on speed?

Said Taghadouini @staghado

16 days ago

@RfK_001 FlexiViT did train with different patch sizes with weight sharing, but those were fixed-resolution ViTs. I am not sure it's the same if you vary the input resolution, since the two become very similar (either make the image bigger or make the patch smaller). https://t.co/Xccq8qxP3u

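For a non-overlapping patch grid, the token count is (H/p)·(W/p), so doubling the image side at a fixed patch size and halving the patch size at a fixed image size give the same grid, with each token covering the same fraction of the image. A minimal sketch; the 224 px image and 16 px patch are illustrative assumptions, not values from the tweet, and `patch_grid`/`num_tokens` are hypothetical helpers.

```python
def patch_grid(height: int, width: int, patch: int) -> tuple[int, int]:
    """Rows and columns of non-overlapping square patches (ViT-style patchify)."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch, width // patch


def num_tokens(height: int, width: int, patch: int) -> int:
    """Sequence length of patch tokens, ignoring any [CLS] or register tokens."""
    rows, cols = patch_grid(height, width, patch)
    return rows * cols


base = num_tokens(224, 224, 16)           # 14 x 14 grid
bigger_image = num_tokens(448, 448, 16)   # 2x the resolution, same patch size
smaller_patch = num_tokens(224, 224, 8)   # same resolution, half the patch size

print(base, bigger_image, smaller_patch)  # 196 784 784
```

What differs between the last two is how many pixels each token sees (16×16 from the 448 px image vs 8×8 from the 224 px image), i.e. the detail per patch, not the layout of the grid.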

Said Taghadouini @staghado

18 days ago

@hu_yifei I guess with thinking (the default) it would figure it out easily

Said Taghadouini @staghado

18 days ago

ppl using this for parsing are cooked!

Quoting @lafaiel (18 days ago):

Gemini 3.5 is 30x more expensive than 1.5 https://t.co/APHFmqXYWK

Said Taghadouini @staghado

18 days ago

@hu_yifei is this one image or two separate images?

Said Taghadouini @staghado

18 days ago

@hu_yifei is it significantly better than 3/3.1 Flash for parsing? I tried 3 today on some adversarial synthetic data and it was very decent.

Said Taghadouini @staghado

18 days ago

it still "fixes" some typos; a VLM is a VLM after all! https://t.co/58MPg4lfrI

Said Taghadouini @staghado

18 days ago

Gemini Flash 3, I was not familiar with your game!
ps: sample size = 1 for now, but a good start https://t.co/Y9Zur4ZMY4

Said Taghadouini @staghado

18 days ago

@antoine_chaffin hahah probably not, nah I think it's fine for prompting purposes!

Said Taghadouini @staghado

19 days ago

created a simple tool to help me in my daily workflow: hit a hotkey, talk, and the transcript pastes itself into Claude Code/Codex/wherever I'm typing. fully local, runs on my Mac.

Said Taghadouini @staghado

19 days ago

try it https://t.co/7yFoPfYHIL

Said Taghadouini @staghado

19 days ago

running @IBM Granite Speech 4 via llama.cpp @ggml_org. credits to @RfK_001 for adding upstream support for this model.
