Arth Bohra @ArthBohra - Twitter Profile

@sidpagariya

about 9 hours ago

every company wants to be #1 in their own benchmark

worked with @micro1_ai to have an independently validated benchmark

huge s/o @ArthBohra @donaldwu_ and the rest of the team in making this happen https://t.co/lEvEcWxaoj

1

9

3

0

194

ArthBohra retweeted

Adit

@aditabrm

about 9 hours ago

Many companies are #1 in a benchmark they crafted.

We worked with @micro1 to create an independently audited benchmark to measure document extraction performance with long documents.

The results of LongExtractBench show the nuances companies are likely to find in the real world. micro1 tested frontier models with max reasoning and document processing platforms with their strongest configurations, and found notable precision/recall and completion tradeoffs across most.

Reducto’s Deep Extract leads the industry by a wide margin. 🧵

11

101

20

22

27K

ArthBohra retweeted

Raunak

@raunakdoesdev

about 9 hours ago

everyone is #1 on their own benchmark.

really grateful to @micro1_ai for independently sourcing the real world data and manually correcting the ground truth to produce this evaluation.

one surprising takeaway: you cannot take reliability for granted. none of the other benchmarked systems achieved > 95% completion rate. that's a difference you feel in production.

1

33

4

2

3K

Arth Bohra @ArthBohra

about 9 hours ago

@hu_yifei thanks for the amazing mentorship!

0

4

0

40

ArthBohra retweeted

Yifei Hu

@hu_yifei

about 10 hours ago

I understand that "99.6%" feels benchmaxx'ed, but we really tried to optimize the pipeline very hard and put accuracy as our top priority. It's still not perfect because in a production system "0.4%" still means you need human in the loop to QA the results. We will keep improving it. Huge shout out to @ArthBohra and the team. They iterated this for months and turned a cool demo into a reliable production tool!

2

18

2

4

2K

ArthBohra retweeted

micro1

@micro1_ai

about 10 hours ago

Today we're publishing LongExtractBench, a benchmark commissioned by @reductoai and independently validated by micro1.

We evaluated seven production document extraction systems across the same 225 complex enterprise documents. The benchmark was intentionally difficult: documents averaged 358 pages and contained roughly 88,700 ground-truth fields each. Every system was evaluated using the configuration documented in the benchmark methodology.

Key findings:
• Reducto Deep Extract was the only system to successfully complete all 225 documents.
• Direct frontier LLM baselines achieved substantially lower completion rates on long, complex documents.
• In this benchmark, dedicated extraction platforms achieved higher completion rates than the direct frontier LLM baselines.
• Recall was the clearest differentiator. Precision remained high across systems, but recall ranged from 33.8% to 99.6%, highlighting which systems consistently captured the information contained in long, complex documents.

The full report includes the benchmark methodology, limitations, and reproducibility resources. Check out the report and results in the comments below.

20

99

29

18

12K

Arth Bohra @ArthBohra

7 days ago

@dhrvji Congrats Dhruv 🐐🐐

0

1

0

78

ArthBohra retweeted

Dhruv Gautam @dhrvji

7 days ago

life update :)

14

63

4

8K

ArthBohra retweeted

Adit

@aditabrm

14 days ago

https://t.co/Q6C9v1Ichr

1

63

7

55

46K

ArthBohra retweeted

Donald

@donaldwu_

25 days ago

CTO started looking over my shoulder when I was coding

0

42

4

2

2K

Arth Bohra @ArthBohra

26 days ago

@donaldwu_ @joshnkeezy 😭😭

0

23

Arth Bohra @ArthBohra

27 days ago

@deepmatmul Excellent write up 🐐🐐

0

68

ArthBohra retweeted

Karan Brar

@deepmatmul

27 days ago

https://t.co/Hv1ua6nKGX

3

31

9

15

11K

ArthBohra retweeted

Ashwin @ashchirum

about 1 month ago

If these estimates from McKinsey hold true, we will be spending approx $7T for ~216 GW of incremental compute by 2030. For us to keep pace with this unprecedented buildout, walking down the full supply chain picture gets pretty insane https://t.co/gyz4GlIzAd

1

0

100

Arth Bohra @ArthBohra

about 1 month ago

@joshnkeezy @reductoai @kolofsen @vedantvyas @jeanghislainbil what a lovely weekend

0

2

0

45

ArthBohra retweeted

Raunak

@raunakdoesdev

about 1 month ago

https://t.co/htilIes0SD

2

130

17

273

32K

Arth Bohra @ArthBohra

about 2 months ago

@AgamGup @DhruvAhuja2003 levels of thought leadership here

0

47

Arth Bohra @ArthBohra

about 2 months ago

@ashchirum @vibhayellamraju You already have the better tweets

1

0

97

Arth Bohra @ArthBohra

2 months ago

@ycombinator @trykinect Congrats this is amazing!

0

1

0

77

ArthBohra retweeted

Y Combinator

@ycombinator

2 months ago

Kinect (@trykinect) turns every e-commerce store into an AI-powered storefront that actually sells. As customers shop, online shopping assistants leverage what each customer is looking for in the moment, adapts to every visitor in real time, captures buying intent data they’ve never had before. Congrats on the launch, @Kratik_ag & @VarunKand! https://t.co/6jOPPs9sUx

74

308

50

196

94K
