Today we're publishing LongExtractBench, a benchmark commissioned by @reductoai and independently validated by micro1.
We evaluated seven production document extraction systems on the same 225 complex enterprise documents. The benchmark was intentionally difficult: the documents averaged 358 pages and roughly 88,700 ground-truth fields each. Every system was run using the configuration documented in the benchmark methodology.
Key findings:
• Reducto Deep Extract was the only system to successfully complete all 225 documents.
• Direct frontier LLM baselines achieved substantially lower completion rates than dedicated extraction platforms on these long, complex documents.
• Recall was the clearest differentiator. Precision remained high across systems, but recall ranged from 33.8% to 99.6%, highlighting which systems consistently captured the information contained in long, complex documents.
The full report includes the benchmark methodology, limitations, and reproducibility resources. Check out the report and results in the comments below.