Today, we're introducing LightOn Console.
⚙️ Three endpoints:
/Parse any document
/Extract structured data
/Search enterprise knowledge with citations
🔌 Built-in connectors. MCP-ready. Governance enforced at the chunk level.
No infrastructure. No pipeline maintenance. No dedicated retrieval team required.
Make your enterprise knowledge agent-readable now!
Read the launch announcement: https://t.co/LcxXqyOgo5
Test it now: https://t.co/RNJQKEHzQ2
Happy to release a new version of Knowledge, my AI knowledge base.
I always wanted to search through @karpathy's bookmarks, or even @ylecun's, so that's what I did: I built a library covering 450+ AI personalities 😊
I analyzed their public data and built a search engine and recommender system
Releasing ColGREP and LateOn-Code models 🚀
ColGREP is a multi-vector search tool built in Rust and made for coding agents. It's a hybrid grep that supports both grep features and semantic retrieval. Runs 100% locally.
You get two SOTA code retrieval models within ColGREP
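To make "hybrid grep + multi-vector retrieval" concrete, here is a minimal Python sketch of the general idea: filter lines with a regex, then rank the survivors with a late-interaction (MaxSim) score over per-token vectors. This is not ColGREP's code (the tool is written in Rust). The `hybrid_search` and `max_sim` names and the tiny hand-written 2-D vectors are illustrative assumptions standing in for real model embeddings.

```python
import re
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector, keep its best
    match among the document vectors, then sum those maxima."""
    if not doc_vecs:
        return 0.0
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def hybrid_search(
    pattern: str,
    query_vecs: Sequence[Vector],
    corpus: Sequence[Tuple[str, Sequence[Vector]]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical hybrid search: grep-style regex filter first,
    then semantic ranking of the matching lines by MaxSim."""
    regex = re.compile(pattern)
    scored = [
        (text, max_sim(query_vecs, vecs))
        for text, vecs in corpus
        if regex.search(text)
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy corpus: each line carries made-up per-token vectors (assumption:
# a real system would get these from a multi-vector embedding model).
corpus = [
    ("def parse_pdf(path):", [[1.0, 0.0], [0.0, 1.0]]),
    ("def parse_args(argv):", [[1.0, 0.0], [0.5, 0.5]]),
    ("class PdfWriter:", [[0.0, 1.0]]),
]
query_vecs = [[1.0, 0.0], [0.0, 1.0]]  # toy vectors for a query like "parse pdf"

results = hybrid_search(r"^def ", query_vecs, corpus)
for text, score in results:
    print(f"{score:.2f}  {text}")
```

The regex drops `class PdfWriter:` before any scoring happens, and MaxSim then ranks `parse_pdf` above `parse_args` because both query vectors find a strong match in it.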
@CavaillesAdrien @baptaubertin @LightOnIO The LightOnOCR preprint is now on arXiv. If you want to learn more about how we built the model, from data curation and pretraining to RL, give it a read!
https://t.co/WxZru2KbXv
LightOn released the new SOTA 1B OCR model under the Apache 2.0 license 🔥
> super fast & cheap ⚡️ 5× faster than dots.ocr, 2× faster than PaddleOCR-VL and costs <$0.01 per 1k pages
> detects images, handles various media and layouts
> comes with transformers support day-0! 🤗
🚀 LightOnOCR-2-1B 🦉 is out, a major update to LightOnOCR.
1B parameters, end-to-end multilingual OCR, and it beats models 9× larger on OlmOCR-Bench while being much faster.
PDF/page in, clean ordered Markdown out, with optional image localization (bbox variants).
A Halloween gift 🎃
New finetuning notebook for LightOnOCR-1B:
• Supports both Full and LoRA training
• Supports FineVision 🤗 subsets incl. OlmOCR-mix & handwritten IAM
• ~12 min/epoch on one H100 (can also run on Colab!)
Looks like @LightOnIO's LightOnOCR is in good company.
Finetunable, state of the art and cheap!
Awesome job by @staghado, @CavaillesAdrien, @baptaubertin and the team.
In-Depth 👇 https://t.co/b9bRznbvfP
You might have seen a lot of OCR releases recently...
Here is another one, introducing 🦉 LightOnOCR-1B
A fully end-to-end differentiable VLM that competes with all the latest releases while being much faster 🚀