An environmentally-focused technologist who envisions a future in which humanity is better integrated into Earth’s ecosystems.
Principal Architect @Unstructured
The Document AI space has seen a fundamental shift in the past year. Everyone—from scrappy startups to established players—has pivoted from custom supervised models to wrapping the same handful of closed-source multimodal models.
Yet, despite the fact that we're all using essentially the same approach and the same models under the hood, there's no shortage of “benchmark triumphs” from Document AI vendors touting the best performance on the market.
I find it especially comical when these vendors compare their product against ours at @UnstructuredIO, and yet instead of comparing their VLM wrapper against our VLM wrapper (which, according to our own benchmarks, outperforms theirs), they compare it to our free, open-source product, one that doesn't depend on massive, powerful, expensive closed-source models. *blink blink* I'm sorry, but that's like comparing public transportation in Rome to driving an Alfa Romeo 4C Spider convertible through the Tuscan hills: they were designed with different purposes in mind.
Here’s the truth: when Fortune 500 teams run real head-to-head evaluations, our commercial platform consistently performs on par with or better than the best in the business. Month to month, we trade #1 spots with the leaders.
But the bigger problem is this: benchmark theater is costing enterprises dearly. Choosing a vendor whose own benchmarks tout best-in-class PDF transformation, but whose product can't process other document types, forces organizations to build a rat's nest of supplemental home-grown capabilities that require management and maintenance and eventually grow to the point where they need to be swapped out for a more scalable solution.
Those glossy accuracy charts usually measure PDFs in isolation—while critical data in .docx, .pptx, .eml, .msg, .tiff, .epub, or .xlsx files goes completely unseen.
And what about model fallback, dynamic content-based routing, retries, and all the other features needed to ensure your VLM wrapper actually works at scale?
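To make "model fallback" and "retries" concrete, here is a minimal sketch, not our production implementation. Every name in it (`TransientError`, `parse_with_fallback`, the backend callables) is hypothetical and exists only for illustration: each backend stands in for one model call, transient failures are retried with exponential backoff, and the next backend is tried once retries run out.

```python
import time
from typing import Callable, Optional, Sequence


class TransientError(Exception):
    """Hypothetical error for failures worth retrying (timeouts, rate limits)."""


def parse_with_fallback(
    document: bytes,
    backends: Sequence[Callable[[bytes], str]],
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> str:
    """Try each backend in order; retry transient failures before falling back."""
    last_error: Optional[Exception] = None
    for backend in backends:
        for attempt in range(max_attempts):
            try:
                return backend(document)
            except TransientError as err:
                last_error = err
                # Exponential backoff: base_delay, 2x, 4x, ...
                time.sleep(base_delay * (2 ** attempt))
    # Every backend exhausted its retries.
    raise RuntimeError("all backends failed") from last_error
```

Non-transient exceptions propagate immediately in this sketch; a real pipeline would also need content-based routing to pick the backend order per document.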
Finally, let's not forget that when it comes to benchmark performance, most vendors fine-tune their prompts (to the point of overfitting) to perform well on major public benchmarks.
At the end of the day, document transformation quality isn’t about cherry-picked metrics. It’s about coverage, fidelity, metadata richness, and mitigating the cost of missed information.
Ready to see what benchmarks look like when they reflect real business impact?
🎙️ Join our deep-dive in next week's webinar on Wednesday, Sept. 10:
Document Transformation Quality Series: Pushing the Boundaries of Document Transformation Quality → Sign up here: https://t.co/8OB8iIvH4M
#DocumentTransformation #BenchmarkTruth #EnterpriseAI #UnstructuredData #DocumentAI #Unstructured #BenchmarkData
At @UnstructuredIO, we often get the question "how well do you perform on scanned forms that include handwriting?"
These documents are notoriously among the most difficult to ingest cleanly and reliably, yet they remain ubiquitous across many industries and are especially prevalent in healthcare, insurance, and similar domains.
Our short answer? Brilliantly. But we encourage you to see for yourself via our free trial! → https://t.co/83noec85wV
Our industry-leading VLM partitioner is designed to tackle the most complex documents across all business domains, but it is especially powerful when it comes to scanned, rotated/skewed, and/or handwritten documents. Parsing these documents with less sophisticated parsers results in one or more of the following: strings of gibberish characters due to inaccurate OCR; signatures treated as blobs; form fields lost; checkboxes ignored; marginal notes dropped entirely; or worse.
By leveraging state-of-the-art models and grounding our VLM partitioner in a rich document element ontology, we produce clean, detailed parses of these documents, without collapsing the document's structural context:
- Handwritten fields captured as structured inputs with handwriting transcribed
- Checkboxes encoded as checkboxes, not flattened text
- Signatures and logos preserved distinctly
- Page numbers and layout context retained
- Layouts and sections captured
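As a rough illustration of what typed elements buy you downstream, here is a small sketch. The `ElementType` values and `Element` fields are hypothetical and are not our actual ontology or output schema; they only mirror the categories in the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ElementType(Enum):
    # Hypothetical categories mirroring the list above.
    FORM_FIELD = "form_field"
    CHECKBOX = "checkbox"
    SIGNATURE = "signature"
    LOGO = "logo"
    PAGE_NUMBER = "page_number"
    SECTION_HEADER = "section_header"


@dataclass
class Element:
    type: ElementType
    page: int
    text: str = ""
    handwritten: bool = False       # True when the text was transcribed from handwriting
    checked: Optional[bool] = None  # only meaningful for checkboxes


def checked_boxes(elements: List[Element]) -> List[str]:
    """Return the labels of all ticked checkboxes."""
    return [
        e.text for e in elements
        if e.type is ElementType.CHECKBOX and e.checked
    ]
```

Because a checkbox stays a checkbox rather than flattened text, a question like "which boxes were ticked?" becomes a simple filter instead of a string-parsing exercise.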
The result: even your most complex, analog-origin documents are parsed into a consistent, auditable structure that downstream systems (data entry, RAG, compliance, analytics) can trust.
See an example below: a scanned, tilted, complex medical form, filled in by hand with dummy data, on the left, and our parsed, rendered, stylized HTML on the right. Of course, where VLMs and handwriting are concerned, very few parses will be 100% perfect, but even for complex, messy forms like this, you can often expect accuracy in the very high 90s for both layout and textual content from our partitioner. This example scored about 98% or better for both content and layout accuracy.
Want to learn more? Join us for my upcoming webinar:
Document Transformation Quality Series: Pushing the Boundaries of Document Transformation Quality - https://t.co/8OB8iIvH4M
#DocumentAI #Handwriting #ScannedDocs #VLM #Ontology #DataQuality #ScannedForms
@kristahopsalong Hi @kristahopsalong - would you be open to (paid) consulting on DSPy? We'd like to use it for a big project at Unstructured, but we're encountering a few issues.