Nikhil Kumar

@nkumar1103

Joined June 2026

23 Following

2 Followers

2 Posts

nkumar1103 retweeted

micro1

@micro1_ai

about 7 hours ago

Today we're publishing LongExtractBench, a benchmark commissioned by @reductoai and independently validated by micro1.

We evaluated seven production document extraction systems across the same 225 complex enterprise documents. The benchmark was intentionally difficult: documents averaged 358 pages and contained roughly 88,700 ground-truth fields each. Every system was evaluated using the configuration documented in the benchmark methodology.

Key findings:
• Reducto Deep Extract was the only system to successfully complete all 225 documents.
• Direct frontier LLM baselines achieved substantially lower completion rates on long, complex documents.
• In this benchmark, dedicated extraction platforms achieved higher completion rates than the direct frontier LLM baselines.
• Recall was the clearest differentiator. Precision remained high across systems, but recall ranged from 33.8% to 99.6%, highlighting which systems consistently captured the information contained in long, complex documents.

The full report includes the benchmark methodology, limitations, and reproducibility resources. Check out the report and results in the comments below.

nkumar1103 retweeted

adel 🌟

@adelwu_

19 days ago

so who's down for a STARTUP WORLD CUP?? ⚽

7 teams of 7, custom jerseys, and pride to your name

comment if you can beat team @reductoai @ArthBohra @joshnkeezy @abhiarya @rishi_srihari @vedantvyas
