I have to post it again because I'm so excited about it. I was featured in a video with @cursor_ai and I just can't stop smiling about it.
https://t.co/j4Z0xnBv0s
The latest State of AI in the Enterprise report from Box is out!
One year ago, 8% of organizations described themselves as advanced or leading edge in AI. Today that figure is 64%. But adoption is not the same as impact. Half of leading edge companies report significant ROI. Among early stage organizations it is just 1 in 9.
We surveyed 1,640 IT decision makers across the US, UK, France, and Japan to find out.
Lots of evidence of huge jumps in capability for Fable across coding (and related) tasks. It's also a major jump in accuracy and success in complex knowledge work tasks.
In our Box AI Complex Work Eval, we tested the model against Opus 4.8 and saw huge boosts across almost every industry. For our eval we give the Box AI Agent, using Fable, a set of hard real world knowledge work problems that deal with enterprise documents. We then score how the agent performs the tasks.
The main differentiators for Fable vs Opus 4.8 are that it doesn't take shortcuts on complex reasoning, it gets multi-step calculations right, and it's significantly more consistent across runs. We saw the biggest leaps in Media & Entertainment (78% vs 61%), Technology (81% vs 73%), Financial Services (89% vs 83%), and Healthcare (66% vs 60%).
Here are some specific examples:
* Legal M&A due diligence: On a task reviewing NDA terms against a semiconductor company's contracting policy, Fable correctly identified that a joint-ownership clause violates exclusivity requirements while a liability cap is permitted under a Super Cap exception. Fable scored 100% vs Opus's 78%.
* Healthcare: On a clinical radiology error audit across 12 reports, Fable precisely categorized each error by severity grade and correctly concluded no Grade 3 errors existed. Opus prematurely escalated a case to "major error requiring immediate departmental review" when the evidence didn't support it. Fable scored 63% vs Opus's 41%.
* Media & Entertainment: On a genre profitability projection task, Fable correctly recognized that a 20% Argentine tax deduction was already embedded in the source spreadsheet figures and didn't double-apply it. Opus applied it again on top, a compounding error across 4 genre calculations that took its score negative on the task vs Fable's 74%.
* Retail analytics: On a task analyzing high-growth product articles against an investment benchmark, Fable correctly computed each article's growth rate individually and identified that only 2 of 5 exceeded the threshold. Opus confused "high growth relative to average" with "above the benchmark", scoring 61% vs Fable's 94%.
* Financial Services: On a 5-year debt facility projection, Fable correctly applied interest to opening balances and used the right capex figure. Opus applied interest to the total facility amount and computed tax from the wrong base, two compounding errors. Fable scored 83% vs Opus's 62%.
* Technology: On a SaaS feature valuation requiring computation of a Feature Value Index across multiple regions, Fable applied the formula correctly and got exact values for the markets. Opus got the arithmetic wrong on multiple criteria. Fable scored 100% vs Opus's 74%.
Overall, huge step change in complex analysis, work that requires analytical reasoning, and deep domain understanding. Fable will be available shortly in the Box AI Studio for customers to build agents with.
This... set up your way to fund your building, or you will be left behind. I don't normally follow this mindset, but this is it, and it's only the beginning.
the permanent underclass everyone keeps tweeting about has a start date now: june 23.
that's when anthropic pulls fable 5 off the $20 plan and it goes back to being the most expensive model on the market.
so for the next 13 days, the gap between you and the most insane builders on your timeline is literally $20.
what the people who opened it are already doing:
- handing it the entire project and not just a prompt
- sleeping while it writes the code and its own tests
- waking up to finished work they just have to review
it's the first model where one person with an idea genuinely doesn't need a team.
you've already spent day one of thirteen watching other people use it.
tonight you can hand it the idea you've been sitting on and wake up to the first version of something you've been talking about for years.
You know @LovesTravelStop from the highway. But behind 650 locations and a network of professional drivers is an enterprise content operation processing 1 million SAP documents a month.
By migrating to Box, Love's shifted their IT team from managing infrastructure to building new solutions at scale.
Box now has a markdown editor on the web. Full CLI support. Commenting. Full version history. Box Drive also lets you connect to any desktop client as a mounted drive, so you instantly work with all your files in Claude Cowork, Codex, Obsidian, Cursor, or any other app.
"You can break AI down into 5 tiers." - George Hotz
"Data centers - tier 1, fabs - tier 2, Nvidia/AMD - tier 3, OpenAI/Anthropic - tier 4, and completely worthless things like Cursor and Windsurf, which are tier 5."
"OpenAI and Anthropic will eat all the value from the Cursors and Windsurfs of the world. I argue that the tier 4s (OpenAI/Anthropic) aren't even going to have value."
From his June 2025 appearance, @realGeorgeHotz
really raised eyebrows with this clip when it hit the timeline.
I need Google Docs but just for markdown files.
Multiplayer comments. Syncing resolving comments.
Suggestion mode
Edit mode
Edit history
Maybe some sense of multi edits.
Easy cli access.
I want some kind of LLM workflow tool.
• Ability to manage a set of input files (Markdown or similar), plus other general-purpose context.
• With real-time collaboration. (And maybe some concept of snapshots or VCS integration.)
• And the ability to create/manage inference workflows and a stored set of prompts.
• Access to general-purpose coding agents (and not just chat models).
• Some concept of compiled outputs/inference results (which ideally can be shared externally).
Many projects have this feeling: "there is all this stuff, which I want to process/compute over in this iterated way, with some build artifacts being important/worth saving." GNU Autotools x Notion or something. Is anyone building this?
A personal AI knowledge base works great for one person. The moment it becomes a team workflow, the personal vault starts to break.
Our demo shows what the fix looks like. A shared company brain in Box where a founder updates a product decision, an engineer applies it via @claudeai Code, and a customer success lead generates an onboarding guide via MCP. All from the same governed source of truth.
Agents become the reasoning layer. Box becomes the governed memory layer. Watch here.
I'm going through the Claude Partner classes... if I jump to the assessment and pass with 100% I shouldn't have to watch the rest of the videos to get my certificate @AnthropicAI.