Prathamesh Juvatkar

Verified account

@PJuvatkar

Building Docstrange, open-source document intelligence for developers. Turning messy PDFs into clean Markdown for RAG pipelines. Sharing what I learn about doc

Mountain View, California

Joined October 2011

217 Following

303 Followers

191 Posts

Pinned Tweet

Prathamesh Juvatkar

about 1 year ago

https://t.co/GrLXfd8wKK Table extraction is still a hard problem for these models. Gemini 2.5 Flash is indeed very good at IDP tasks. More such insights!

Souvik Mandal @mandalsouvik4

about 1 year ago

It's officially live! The Intelligent Document Processing Leaderboard. Featuring 16 datasets, over 9,000 documents, and 6 distinct tasks. More models will be added soon, or you can evaluate on your own— the datasets and code are open-source too! https://t.co/bnT4H7DpM9

Prathamesh Juvatkar

3 days ago

@vanstriendaniel @allen_ai Faced the same issue with this bench. I think it should be fair to eval model + some postprocessing. If the model over-extracts but knows what it's extracting, it can selectively remove things from the output to make it compliant with the bench?

Prathamesh Juvatkar

9 days ago

@GptMaestro Any good benchmarks to test graph performance of this plugin vs native graph databases? Maybe scope of queries possible, latency, etc.? Which metrics do you think matter for a graph DB built for agents?

Prathamesh Juvatkar

14 days ago

@ProfAdebay @Lunexalith @teortaxesTex Self host, yes. Locally not in near future probably. Going to be very large models, no economies of scale on local etc.

Prathamesh Juvatkar

14 days ago

@LandoTakingOver @Lunexalith @teortaxesTex The only argument could be that they have only unlocked large-scale distillation capability so far, so their best models can't be better than US models. Then might as well release them. A big assumption though!

Prathamesh Juvatkar

14 days ago

@Lunexalith @teortaxesTex Do you really think China is open sourcing their best models?

Prathamesh Juvatkar

20 days ago

@paulg Should curate these and write a new parenting book..

Prathamesh Juvatkar

23 days ago

@QuantaMind_2025 @nanonets Post-training models to give out calibrated confidence scores, layering with business rules that spot mistakes in outputs

Prathamesh Juvatkar

27 days ago

Looking at mem benchmarks, most try to evaluate mem systems by giving a model access to memory and seeing how well it does the task. Wouldn't a good bench be to directly evaluate the system? E.g. give it a lot of tax filings and figure out what % of the tax code it can infer from them?

Prathamesh Juvatkar

2 months ago

The pitch of all seat-based pricing companies now is that agents need their own seats for access control. Good save!

Prathamesh Juvatkar

3 months ago

@DrDatta_AIIMS Do give this a try https://t.co/ZDkjtixGZx

Prathamesh Juvatkar

3 months ago

@karpathy @kepano https://t.co/bvP9Dd7Czk Looks ok to give markdown context to an LLM

Prathamesh Juvatkar

3 months ago

@pmarca Except the ones in AI

Prathamesh Juvatkar

3 months ago

@MaxBrodeurUrbas @satyanadella It works on my machine

Prathamesh Juvatkar

3 months ago

So far, a multi-agent setup was needed for more direct practical purposes like context management. Reading mythos's system card (if true), you would need a multi-agent setup to minimize reward hacking, set up accountability, and manage the model's psyche, just like how you build organizations!

PJuvatkar retweeted

Rushabh Nagda @rushabh_nagda

3 months ago

"When a metric becomes the target, it stops being a good metric" - Goodhart's Law. For the last few days, GLM-OCR has been trending after it claimed 95% on OmniDocBench, which is higher than Gemini-3-pro. In reality, GLM-OCR is way worse than the story these benchmarks paint; let's see how. Full disclaimer: I've been working in this space for the last 7 years with @nanonets.

Prathamesh Juvatkar

3 months ago

@heyrimsha @Wealth_Pill We tested it on documents slightly different from ones in popular benchmarks, and it doesn't do well. Model is clearly benchmaxxed

Prathamesh Juvatkar

3 months ago

@_karthik https://t.co/Tx2YeGoAJE The last batch of frontier models became better than finetuned ones; the next wave of finetuned ones should surpass them again. These small VLMs are where a lot of architecture development is happening IMO

Prathamesh Juvatkar

3 months ago

@bindureddy For specific tasks like OCR, there is still some delta in SOTA vs flash variants; however, small domain-specific models are doing better and better https://t.co/Tx2YeGoAJE

Prathamesh Juvatkar

3 months ago

We need an Elo score for every profession now on who can beat AI in that profession!

Prathamesh Juvatkar

4 months ago

@grok @Ichimokutrader7 @sama https://t.co/Tx2YeGoAJE Gemini > Claude > GPT is the trend we've seen

