probably as defined by the frontier today. Just like computers in 1980, the things we can do with them today were hard to fathom back then. The datacenters (as well as our devices) will be powering incredible possibilities that we can hardly consider now. Your machines will be building incredible things, in incredible places.
A lab releases a new model, it scores impressively on major benchmarks, "is this AGI??!!", but it still can't understand PDFs.
Enormous upside remains for real-world utility and for enterprises to make the most of these tools.
🎤 Who run the world? 🎤
Gir—PDFs. PDFs run the world.
This week, we launched GDP.pdf: a new, expert multimodal reasoning benchmark.
We've spent years measuring AI against the extraordinary: proving theorems, solving AGI.
But the global economy doesn't run on the extraordinary.
It runs on paperwork.
More precisely: unsexy, poorly scanned, densely formatted PDFs. Contracts, invoices, medical records, blueprints – the documents that underlie everything we do in the enterprise.
So GDP.pdf tests frontier models on their ability to handle real-world documents across ten professional industries:
🏗️ Construction: Can a model measure load-bearing walls on a blueprint?
⚖️ Law: Can it parse liability caps in a commercial lease?
💵 Finance: Can it calculate margin profiles in a buy-side memo?
Every frontier model scored under 15%.
With GDP.pdf, we wanted to ask: if a $100B model can’t accurately reason about a drug interaction table in a PDF, is it actually ready to take over the economy?
Right now, the answer is no.
Check out the blog post and leaderboard below!
Blog: https://t.co/0d6fmnoCkb
Leaderboard: https://t.co/I1fL5Kcu2o
If you work in tech and are also a prepper, people really read into your actions as a signal for the end times.
I am by no means dictating any AI trends, but I am the resident 'AI tech member of the family', akin to your grandparents asking you to fix their TV because you 'work with computers'. I gifted a few members of my family some spear tips, and of all things, I was not expecting their reaction to prompt conversations about AI safety.
I don’t know if it’s a Trump domino so much as each country having a sovereign strategy for this new paradigm shift in AI, and either realizing they want to apply that to the internet/social networks broadly, or realizing they can get away with it now. Kind of a parallel to the globalization -> nationalism/protectionism trend.
@the_auburncreed Two main reasons I play less golf than I want to, and why this could be interesting:
- little kids make it difficult to commit to a half-day activity as often
- temperature - peak summer in Austin is brutal. Everyone goes after the early tee times. This opens up the day.
@AmericanAir @ATT Great benefit! But with no more direct AUS-SFO flights between you and your partners, I'm doing a status match on other airlines for the first time in my career.
@Romy_Holland actually Italians love babies. They come out of the restaurants to flirt and get them to smile, and will bring them bread and mozzarella to chew on. If you go to NYC people ask you why you brought a baby to NY.