PDFs suck.
If you want to build standout RAG apps, you need tooling that is tailored to your needs.
You should own your doc parsing infra.
Today we’re launching https://t.co/rm96AFicMJ on @ycombinator
https://t.co/MrEuBWBBvo
chunkr-layout-1 and our other models are live now on our API!
🌐 Website: https://t.co/rm96AFicMJ
📷 Hugging Face Dataset: https://t.co/k1RQmt4yZf
📷 Blog: https://t.co/V2BosKgY4n
Today we're thrilled to announce the release of our newest layout-analysis model: chunkr-layout-1
Trained on millions of the hardest documents, across all verticals, chunkr-layout-1 is built to understand real documents, not just the clean ones.
Introducing chunkr-layout-1. Layout is where document intelligence starts, and we just leveled it up.
- Benchmarked on 1,013 hand-tagged samples: 80.9 mAP@50 | 88.4 precision | 85.6 recall | 86.9 F1
- Beats AWS, Azure, Gemini 2.5 Pro, Docling and other OSS models
- Identifies 16 class labels
Live now!
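For readers less used to detection metrics: mAP@50 counts a predicted layout region as correct when it overlaps a ground-truth region of the same class with an intersection-over-union (IoU) of at least 0.5, and F1 is the harmonic mean of precision and recall. Below is a minimal Python sketch of that matching rule plus the F1 formula. It is not Chunkr's evaluation code: the `(x1, y1, x2, y2, label)` box format and the function names are our own, we assume the P/R/F1 figures use the same 0.5 threshold, and the sketch skips the confidence ranking and precision averaging that full mAP needs.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2, label), pixel coordinates.
Box = Tuple[float, float, float, float, str]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_at_threshold(preds: List[Box], gts: List[Box], thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedy one-to-one matching; returns (true positives, false positives, false negatives).

    A prediction is a true positive if it overlaps a not-yet-matched ground-truth
    box with the same label at IoU >= thr. Real mAP would sort predictions by
    confidence first; this sketch just takes them in order.
    """
    matched = set()
    tp = 0
    for p in preds:
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            if i in matched or g[4] != p[4]:
                continue
            score = iou(p, g)
            if score >= best_iou:  # keep the highest-IoU candidate above thr
                best, best_iou = i, score
        if best is not None:
            matched.add(best)
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


gts = [(0, 0, 100, 50, "table"), (0, 60, 100, 80, "text")]
preds = [(5, 0, 100, 50, "table"), (0, 60, 40, 80, "text")]
print(match_at_threshold(preds, gts))  # (1, 1, 1): the "text" box only reaches IoU 0.4
print(round(f1(0.884, 0.856), 3))  # 0.87
```

Plugging in the reported precision and recall gives about 0.870, consistent with the reported 86.9 F1 up to rounding of P and R.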
I’m thrilled to announce our new VLMs, chunkr-parse-1 and chunkr-parse-1-thinking.
- Parses complex forms, tables & more
- Inline OCR (redlining and formulas)
- Multilingual (>100 languages)
- Beats AWS Textract, Gemini 2.5 Pro and Mistral at OCR
Live now on the Chunkr API
hello!
we are hiring @chunkrai
we are building best-in-class tools to help make the most out of documents. we're growing fast and are looking for a founding engineer to join!
if you
- like rust
- can handle features e2e
- are funny
let's talk!
https://t.co/kgVfcm6u00
on weekends we bug bash.
the reception of our new excel parser has been amazing. after getting some feedback, we're releasing a new, faster, more stable version.
enjoy!
Recently, we launched a parser that breaks up spreadsheets into clean Python objects.
We used it to analyze the DoE’s 2020 Congressional Action Report. Here’s the data loaded via Chunkr vs. openpyxl: a huge difference.
If you’ve ever tried to read an Excel file into Python, you know how painful it is.
It becomes really easy to analyze data when you're able to read spreadsheets cleanly.
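Part of what makes reading an Excel file into Python painful is that a cell-by-cell read hands you one big grid, even when the sheet holds several separate tables. Here is a toy sketch of one way to split such a grid into tables and turn each into a list of dicts. It is not how Chunkr's parser works (the posts don't say); it assumes the values are already loaded with `None` for empty cells, that tables are separated by blank rows or columns, and that each table's first row is its header. `find_blocks` and `block_to_records` are hypothetical names.

```python
from typing import Dict, List, Optional, Set, Tuple

Grid = List[List[Optional[str]]]  # None marks an empty cell
Block = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive, 0-based


def find_blocks(grid: Grid) -> List[Block]:
    """Bounding boxes of groups of non-empty cells that touch horizontally or vertically."""
    seen: Set[Tuple[int, int]] = set()
    blocks: List[Block] = []
    for r, row in enumerate(grid):
        for c, val in enumerate(row):
            if val is None or (r, c) in seen:
                continue
            # Flood-fill one connected region of non-empty cells, tracking its bounding box.
            stack = [(r, c)]
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < len(grid) and 0 <= nx < len(grid[ny])
                            and grid[ny][nx] is not None and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            blocks.append((top, left, bottom, right))
    return blocks


def block_to_records(grid: Grid, block: Block) -> List[Dict[str, Optional[str]]]:
    """Use the block's first row as the header and return one dict per data row."""
    top, left, bottom, right = block

    def cell(r: int, c: int) -> Optional[str]:
        # Rows may be ragged, so treat missing trailing cells as empty.
        return grid[r][c] if c < len(grid[r]) else None

    cols = range(left, right + 1)
    header = [cell(top, c) or f"col{c}" for c in cols]
    return [{h: cell(r, c) for h, c in zip(header, cols)} for r in range(top + 1, bottom + 1)]


sheet: Grid = [
    ["Region", "Sales", None, None],
    ["North", "10", None, None],
    ["South", "7", None, None],
    [None, None, None, None],
    [None, None, "Item", "Qty"],
    [None, None, "Pens", "3"],
]
for b in find_blocks(sheet):
    print(b, block_to_records(sheet, b))
# (0, 0, 2, 1) [{'Region': 'North', 'Sales': '10'}, {'Region': 'South', 'Sales': '7'}]
# (4, 2, 5, 3) [{'Item': 'Pens', 'Qty': '3'}]
```

Real sheets break these assumptions quickly (tables that touch, multi-row headers, notes next to a table), which is exactly why a plain cell dump is hard to analyze.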
Excel files are notoriously hard to parse.
Even though they are machine-readable, they end up as a flat string that loses the semantic meaning of tables, styling and formulas. Agents can’t use that.
We’ve solved that.
Raw spreadsheets to perfect HTML, live now on the @chunkrai API.
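Keeping table structure when going from a spreadsheet to HTML mostly comes down to mapping merged cells to `rowspan`/`colspan` and not discarding formulas. A minimal stdlib-only sketch follows, assuming the cell values, merged ranges and formulas have already been read out of the workbook (openpyxl, for instance, exposes merged ranges on the worksheet and returns formula strings unless `data_only=True` is passed). `grid_to_html` and the `data-formula` attribute are our own illustrative choices, not Chunkr's output format; styling is left out.

```python
from html import escape
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int, int, int]  # merged range: (top, left, bottom, right), inclusive, 0-based


def grid_to_html(values: List[List[Optional[str]]],
                 merges: List[Range],
                 formulas: Optional[Dict[Tuple[int, int], str]] = None) -> str:
    """Render a cell grid as an HTML table, preserving merges and formulas."""
    formulas = formulas or {}
    # The top-left cell of each merged range carries the span attributes.
    anchors = {(top, left): (top, left, bottom, right) for top, left, bottom, right in merges}
    # Every other cell inside a merged range is hidden, as in Excel.
    covered = {
        (y, x)
        for top, left, bottom, right in merges
        for y in range(top, bottom + 1)
        for x in range(left, right + 1)
        if (y, x) != (top, left)
    }
    rows = []
    for y, row in enumerate(values):
        cells = []
        for x, val in enumerate(row):
            if (y, x) in covered:
                continue
            attrs = ""
            if (y, x) in anchors:
                top, left, bottom, right = anchors[(y, x)]
                if bottom > top:
                    attrs += f' rowspan="{bottom - top + 1}"'
                if right > left:
                    attrs += f' colspan="{right - left + 1}"'
            if (y, x) in formulas:
                # Keep the formula next to its computed value instead of dropping it.
                attrs += f' data-formula="{escape(formulas[(y, x)])}"'
            text = "" if val is None else escape(val)
            cells.append(f"<td{attrs}>{text}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


values = [["Q1 sales", None], ["North", "10"], ["South", "7"], ["Total", "17"]]
merges = [(0, 0, 0, 1)]  # "Q1 sales" spans both columns
formulas = {(3, 1): "=SUM(B2:B3)"}
print(grid_to_html(values, merges, formulas))
```

The title row comes out as a single `<td colspan="2">`, and the total keeps `=SUM(B2:B3)` alongside its value, so a downstream model can see both what the number is and how it was computed.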