Long array extraction is a core capability we have invested a lot of time in ever since the early days at Extend. It's a very challenging problem, and far from just a model problem: you need a purpose-built harness that enables foundation models to reliably extract 1,000s of data points across hundreds of pages to get the most out of any top model.
Our MAX mode for extraction does exactly that through a combination of things like:
- Dynamic chunking of large documents based on table sizes/density and schema complexity, with semantic preservation as much as possible
- Multiple passes through the full document so that context split across chunks persists through the whole extraction
- Heavy use of smaller models to detect and fix mechanical issues around a variety of page and section boundary conditions
All this together brings us closer to the end goal of *perfect* extraction over any sized document and schema complexity. The most exciting part, though, is that this uses a system we built and launched months ago. We're now working on a v2 that will take it to another level of complexity handling. Stay tuned.
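The dynamic chunking idea described above can be sketched roughly as follows. This is a minimal illustration, not Extend's implementation: the `Page` type, the `est_rows` estimate, the `cell_budget` value, and the boundary rule in `merge_rows` are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    est_rows: int  # hypothetical estimate of table rows on this page


def chunk_pages(pages, schema_fields, cell_budget=600):
    """Group consecutive pages so each chunk stays under a cell budget.

    Cost is rows * schema fields, so denser tables and wider schemas
    both lead to smaller chunks. A page is never split across chunks.
    """
    chunks, current, used = [], [], 0
    for page in pages:
        cost = page.est_rows * schema_fields
        # Close the current chunk if adding this page would exceed the budget.
        if current and used + cost > cell_budget:
            chunks.append(current)
            current, used = [], 0
        current.append(page)
        used += cost
    if current:
        chunks.append(current)
    return chunks


def merge_rows(chunk_results):
    """Concatenate per-chunk rows, dropping a row repeated at a chunk boundary.

    Assumes a row that straddles a boundary is emitted by both neighbours.
    """
    merged = []
    for rows in chunk_results:
        start = 1 if merged and rows and rows[0] == merged[-1] else 0
        merged.extend(rows[start:])
    return merged


pages = [Page(1, 40), Page(2, 40), Page(3, 10), Page(4, 80)]
narrow = chunk_pages(pages, schema_fields=5)
wide = chunk_pages(pages, schema_fields=10)
print([[p.number for p in c] for c in narrow])
print([[p.number for p in c] for c in wide])
```

With the same pages, the wider schema produces more, smaller chunks, which is the behaviour the first bullet describes. A real system would also need the multi-pass context and boundary repair steps, which this sketch only hints at in `merge_rows`.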
we created a new, open source eval (LongArray-Extract) for one of the hardest problems in document processing: how to extract every row out of long documents
some highlights:
- Extend's array extraction is SOTA (99.2%)
- roughly 3x faster than the next closest competitor (5 min vs 14 min)
it's based on examples we've seen in production:
> bank statements with 2,000+ transactions
> clinical adverse-event listings with 1,000+ events
> legal filings with hundreds of numbered factual paragraphs
if you've ever built a document pipeline on hundred-page docs with thousands of line items, you know exactly how quickly things break
we open sourced the benchmark + dataset so teams can inspect the docs, run the harness, and compare results directly
@rabois @the_P_God It was much more @garrytan, via a combination of YC and local political initiatives, that saved SF.
OpenAI was very impactful, but alone would not have led to this turnaround
There are many things I love about our new site, but my favorite is the animation all the way at the bottom for those interested enough to scroll to the end
proud to share Extend's updated brand and website! we spent 100s of hours on it, and obsessed over every single detail
multiple full redesigns thrown out, fonts swapped and swapped again, every animation tweaked until it felt right...we even locked ourselves in a room and white-boarded every single word on the page until it resonated
why? prospects would see a demo and say something like "this is not what I expected based on your site", and they were 100% correct
our product has changed so much in the past year that we wouldn't even recognize the old version (new APIs, capabilities, entirely new categories of problems we now solve). Our old site didn't reflect any of that.
we're proud of how this turned out, and we hope it conveys the level of craft our team obsesses over in everything we ship
check out the new site below and please share feedback!
We actually have a number of internal evals at @ExtendHQ where gpt4-0314-32k specifically outperformed all subsequent gpt4 models, and claude 3x models, until gpt-4o-0806 and Opus 3.7
Truly a special model. Many did not realize how far you could push it on complex extraction tasks.
I'll also never forget my first few times interacting with Bing Creative Mode, which, to my understanding, was built on top of gpt4-0314
One takeaway that stuck with me from my session with Eli Badgio at @ExtendHQ: Document processing isn't a prompt problem, it's a pipeline problem.
Getting to 99%+ accuracy means optimizing every step end to end, not just writing better prompts.
Notes from the session below.
Introducing Composer: the first AI Agent for document processing.
Get to production-grade accuracy autonomously, in minutes.
In our early beta, some teams hit 99% accuracy on complex document tasks in under 10 minutes.
Composer is an agent built to optimize schemas the same way a human would (but way faster).
Instead of tuning prompts by hand, you point Composer at your eval set inside Extend.
Composer will:
- analyze where your schema falls short
- propose targeted improvements
- run multiple experiments in parallel
- surface diffs, accuracy gains, and traces behind each change
With this launch, Extend is the only product on the market that helps you reach production-grade accuracy this fast.
Composer is live for all Extend customers today! Try it out at the link in the comments below.
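For readers who want a feel for the loop described above (score candidate schema changes against an eval set in parallel, then surface what changed and the accuracy gain), here is a minimal sketch. It is not Composer's implementation or Extend's API: `extract` is a hypothetical callable you would supply, a schema is assumed to be a plain dict of field name to description, and the eval set is assumed to be a list of `(document, expected_fields)` pairs.

```python
from concurrent.futures import ThreadPoolExecutor


def accuracy(extract, schema, eval_set):
    """Fraction of expected field values that `extract` reproduces."""
    total = correct = 0
    for doc, expected in eval_set:
        got = extract(doc, schema)
        for field, value in expected.items():
            total += 1
            correct += got.get(field) == value
    return correct / total if total else 0.0


def run_experiments(extract, baseline, candidates, eval_set):
    """Score candidate schemas in parallel and report each one's change."""
    base_score = accuracy(extract, baseline, eval_set)
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda s: accuracy(extract, s, eval_set), candidates))
    report = []
    for schema, score in zip(candidates, scores):
        # Fields whose description differs from the baseline schema.
        changed = sorted(k for k in schema if schema[k] != baseline.get(k))
        report.append(
            {"changed_fields": changed, "accuracy": score, "gain": score - base_score}
        )
    # Best-scoring candidate first.
    return sorted(report, key=lambda r: r["accuracy"], reverse=True)
```

The report keeps the diff and the gain side by side so that each proposed change can be reviewed on its own evidence rather than accepted blindly.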