AI & TECH @AiTechHubs - Twitter Profile

about 2 months ago

We’re open sourcing ParseBench, the first document OCR benchmark for the agentic era. Document parsing is the foundation of every AI agent that works with real-world files.

ParseBench measures parsing quality specifically for agent knowledge work:
✅ It optimizes for semantic correctness (instead of exact similarity)
✅ It has the most comprehensive distribution of real-world enterprise documents

It contains ~2,000 human-verified enterprise document pages with 167,000+ test rules across the five dimensions that matter most: tables, charts, content faithfulness, semantic formatting, and visual grounding.

We benchmarked 14 known document parsers on ParseBench, from frontier/OSS VLMs to specialized parsers to LlamaParse. Here are some of our findings:
💡 Increasing the compute budget yields diminishing returns: Gemini/gpt-5-mini/haiku gain 3-5 points going from minimal to high thinking, at 4x the cost.
💡 Charts are the most polarizing dimension for evaluation. Most specialized parsers score below 6%, while some VLM-based parsers do a bit better.
💡 VLMs are great at visual understanding but terrible at layout extraction. GPT-5-mini/haiku score below 10% on our visual grounding task, while all specialized parsers do much better.
💡 No method crushes all 5 dimensions at once, but LlamaParse achieves the highest overall score at 84.9% and leads in 4 of the 5 dimensions.

This is by far the deepest technical work we’ve published as a company. I'd encourage you to start with our blog and follow its links to Hugging Face and GitHub. All the details are in our full 35-page (!!) arXiv whitepaper.

🌐 Blog: https://t.co/57OHkx0pQW
📄 Paper: https://t.co/Ho2oH2xEAM
💻 Code: https://t.co/6P7UxqOZYA
📊 Dataset: https://t.co/YguIXWm41j
🎥 YouTube: https://t.co/6Fh1Nsk9ei

AiTechHubs retweeted

Karan Vaidya

@KaranVaidya6

2 months ago

Okay, @gdb is team CLI all the way. @garrytan thinks MCPs suck. So we hit the streets of SF to see if the city agreed.

We posed a simple question: MCP or CLI?
- Basically everyone under the age of 35 said CLI
- One person said MCP was as bloated as Java
- And, unsurprisingly, numerous people told us to touch grass

Final score: MCP 3 vs. CLI 17

SF has spoken, and @composio listened. Our universal CLI is now live!

Drop your best CLI vs. MCP hot take in the comments and we'll send the best ones some very sick gear 👀

Link to try our CLI in the next post of this thread ⬇️

AiTechHubs retweeted

Matthieu ❙❙ ElevenLabs

@matt_elevenlabs

about 1 month ago

To celebrate the launch of @ElevenCreative on X, we’re giving away 111k credits to 3 lucky creators.

To enter: Like + follow @ElevenCreative

Winners announced on May 6 at 4 PM GMT

AiTechHubs retweeted

Matthieu ❙❙ ElevenLabs

@matt_elevenlabs

about 1 month ago

We just launched ElevenMusic. We've paid out over $11M to voice creators. Now the same model comes to music. Like this post to get the link in your DMs. https://t.co/2U1kN1JdUd