THE TOKEN HANGOVER
@matanSF (Matan Grinberg), CEO and co-founder of @FactoryAI , interviewed by @HarryStebbings (@20vcFund )
This is a special one for me: I've been an investor in @FactoryAI since their seed round, and I think Matan is a very special founder.
Summary: Grinberg argues the next 24 months in enterprise AI are a resource-allocation problem: tokens, dollars, and people. Most CIOs are now waking up to bills they cannot justify. The fix is to spend frontier tokens only on the 10-20% of work that requires planning intelligence, run the other 80-90% on open models, and rebuild teams around load-bearing polymaths who own business outcomes. The single-frontier-monopoly fear is fading: four roughly-equivalent labs is the emerging reality, which puts pricing power back in the application layer.
1. The Token Hangover. Enterprise AI adoption ran through three phases this year: boards yelling at CEOs about AI strategy, "token maxing" with AI usage written into perf reviews, and now the morning-after bill. One CIO Grinberg spoke to was spending hundreds of thousands of dollars a month on engineers asking Opus 4.8 things like "how's it going" and "what are my macros from lunch." The frontier model became the default surface for every question, no matter how trivial. Phase 3 is the moment routing matters: every call to a frontier model needs to earn its price.
2. Resource Allocation Is the Job. For the next 24 months every C-suite is solving the same problem: how to allocate dollars, tokens, and headcount against business outcomes. Engineering teams used to be judged by features shipped per quarter, a metric with no link to revenue, market share, or retention. A logistics company adding more engineers to ship more features was always solving the wrong problem; AI made the misallocation visible. Tie every person's work to the metric that actually moves the business, then re-allocate.
3. Load-Bearing Individuals. The "10x engineer" frame measures lines of code, the wrong unit. Grinberg's unit is the load-bearing individual: the person whose absence breaks something. With AI the load-bearing few compound roughly 10,000%; the others get close to nothing, so any org enforcing one token-spend-per-engineer number is painting with too wide a brush. Average token spend per engineer will land on the same order of magnitude as their salary within three years, with a wildly bimodal distribution.
4. Frontier for Decisions Only. 80-90% of software development tasks can run on open models; the remaining 10-20% is planning, where the frontier still wins. This mirrors how human orgs work: leadership is a tiny share of total hours but decides the company's fate. The ego trap is engineers assuming their work is too important for an open model. The router decides better than the engineer, and the cost curve falls only if you wire the routing.
5. The Kirkland Mistake. Kirkland & Ellis announced a $500M, five-year internal AI build, which Grinberg reads as validation for Harvey rather than a threat. Building AI is not a law firm's core competency, and Kirkland's spend will teach them how hard it is. The general rule: just because you can build it does not mean you should, and the discipline is naming the few things you and your team own end-to-end. Outsource everything else, even when you technically know how to do it yourself.
6. Model-App Separation. When the model provider also sells the app, the incentives split: an API business wants you to spend more tokens. A healthy market keeps the application layer independent, so model providers compete on price, speed, and quality every week. Enterprises do not want to vendor-lock again; every CIO carries scars from the cloud era's three-year discount-then-jack-the-price trap. The application layer survives precisely because it forces that competition.
7. Sales as Product. Name a legendary company with a weak sales or marketing team. You can't. The Silicon Valley fallacy that research sits at the top and sales is "dirty work" produces companies that win the gold rush and then collapse when gravity returns. At Factory, engineers and salespeople sit intermixed; when sales closes, engineering says "we closed"; when engineering ships, sales says "we shipped." Atrophied sales muscles will not regrow once enterprise buyers stop saying yes to everything.
8. Polymath Era. Da Vinci, Newton, Euler could be polymaths because their fields were shallow. By the 2010s a theoretical physicist needed 50 years to reach the frontier before contributing anything new. AI collapses that catch-up time, so one person can push forward developer marketing, token-caching infrastructure, and solution engineering at once. The engineer of the future is a GM who owns marketing copy, product metrics, and sales enablement.
9. Build the Factory. Factory's name is literal: engineers in the next era design the assembly line that produces software. The DevX investments that used to scale linearly with headcount (good docs, CI/CD, linters, pre-commit hooks) now scale with the number of agents you run, which is 10x or 100x larger. Every dollar spent making agents production-ready compounds against thousands of PRs a week. Humans move up the stack, from writing code to designing the system that writes code.
10. Seal Team Six. Mandating beds in the office is a hiring failure dressed up as commitment. Grinberg's image: a basketball game judged by who sweated the most, when the scoreboard is what counts. Factory bought Eight Sleeps for all 30 team members at the time, because recovery is where the gains come from when work requires every ounce of brain power. If your load-bearing engineer can do their best work on two hours of sleep, they were not doing load-bearing work in the first place.
11. Four Frontier Labs. Grinberg's biggest mind-change this year: a single dominant model is unlikely, and four roughly-equivalent frontier providers is the more probable steady state. That outcome is the win for humanity. A one-lab monopoly was the dangerous scenario, and four equivalent labs is also the structural bull case for the application layer because it forces real ongoing price competition. Every CIO Grinberg meets has already decided not to throw their lot in with a single provider.
12. Dario's Self-Serving Doom. "AI will take your jobs" was the pitch that helped raise hundreds of billions, and Grinberg thinks it damaged public psychology and fed the slow-AI lobby. Watch the rhetoric flip at IPO: humans will suddenly become important again, because humans are the ones buying the stock. Founders who never needed to raise that money, like Zuckerberg and Hassabis, never made that argument. Incentives drive the labor-displacement rhetoric more than philosophy does.
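The routing rule in point 4 can be made concrete with a small sketch. Everything here is an assumption for illustration: the prices, the task kinds, and the `route` function are hypothetical, not Factory's router or any provider's real pricing. It only shows how sending planning work to a frontier model and everything else to an open model changes the bill.

```python
from dataclasses import dataclass

# Hypothetical prices in dollars per million tokens (illustrative, not real quotes).
PRICE_PER_MTOK = {"frontier": 15.0, "open": 0.5}

# Hypothetical set of task kinds that count as "planning" work.
PLANNING_KINDS = {"planning", "architecture", "design_review"}


@dataclass(frozen=True)
class Task:
    kind: str    # what sort of work this is
    tokens: int  # total tokens the task is expected to consume


def route(task: Task) -> str:
    """Send planning work to the frontier tier, everything else to an open model."""
    return "frontier" if task.kind in PLANNING_KINDS else "open"


def always_frontier(task: Task) -> str:
    """Baseline: every task goes to the frontier model."""
    return "frontier"


def cost(tasks, router=route) -> float:
    """Total dollar cost of running the tasks through the given router."""
    return sum(PRICE_PER_MTOK[router(t)] * t.tokens / 1_000_000 for t in tasks)


# A 20/80 split: two planning tasks, eight routine ones, same size each.
tasks = [Task("planning", 200_000)] * 2 + [Task("boilerplate", 200_000)] * 8

routed = cost(tasks)
baseline = cost(tasks, always_frontier)
print(f"routed: ${routed:.2f}  all-frontier: ${baseline:.2f}")
```

Under these made-up numbers the routed bill is a fraction of the all-frontier one, and nearly all of what remains is the planning work. A real router would classify tasks from their content rather than a hand-assigned label.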
Really excited to open source a new project: Omnigent, a meta-harness for AI agents.
It lets you build multi-agent coding and custom agents, sitting above Claude Code, Codex, Pi, and agent SDKs to let you compose them. It also adds live collaboration and rich control policies.
@Yuchenj_UW Yes, MTS has been there for a while. Also, in the early days of Yahoo, everyone was a Technical Yahoo (no publicly visible levels). I still remember how liberating it was as a fresh engineer to speak your mind in technical discussions without being bogged down by levels.
Oracle has spent the last two weeks writing articles comparing Oracle (and PDB) to Lakebase, and it highlights a massive philosophical divide in how we view databases in the agentic era.
They are trying to retrofit heavy, traditional architectures for AI. We believe Lakebase is the future because agents need something entirely different:
⚡️ Super simple APIs: so agents don't have to read a giant manual and hallucinate a query.
⚡️ Sub-second provisioning & auto-scaling: so you aren't paying legacy-level prices for idle time.
⚡️ Branching: Git-style branching to create isolated, safe environments for agents on the fly.
⚡️ Automatic backup & restore: so you don't sweat it when an autonomous agent inevitably drops a table.
The numbers speak for themselves. Lakebase is our fastest growing product. In the last few months alone, we've seen the database start rate grow 30x, and we are now starting tens of millions of databases EVERY DAY. Some of these databases have branches 500 levels deep and lifetimes of just seconds because of how fast agents move.
Go try it yourself in a few seconds on https://t.co/ne9Tv18JhV!
The team has been cooking hard to push this gap even further. Come to Data and AI Summit next month to hear about some major new breakthrough capabilities. 🚀
(Links next so you can read their take)
Today is a big day for @SocketSecurity. We just raised a $60M Series C at a $1B valuation, led by @ThriveCapital with participation from @a16z, @AbstractVC, and @CapitalOne Ventures. Total funding is now $125M.
Four years ago, we started Socket because open source dependencies were flowing into production faster than anyone could vet them. AI has massively accelerated that. Code is being written, shipped, and deployed before any human reads it. Security has to operate at that same speed.
One data point from Thrive's diligence that I keep coming back to: they first discovered Socket because @cursor_ai, @OpenAI, and @AnthropicAI all independently told them it was the most important security tool they'd adopted for AI-driven development. Three of the most sophisticated AI companies converging on the same vendor unprompted.
Since our Series B, Socket has grown to more than 20,000 organizations, protecting over 1.5 million repositories and blocking more than 1,000 supply chain attacks every week. The team is now over 100 people.
Three out of five FAANG companies are Socket customers. So are the companies building the most ambitious AI products: @AnthropicAI, @cursor_ai, @xai, @figma, @vercel, @Replit, @scale_AI, @GustoHQ, @Mercadolibre, and @cribl_io, alongside Fortune 100s in financial services and global media.
What we've shipped since the last round:
• Socket Firewall blocks malicious packages at install time, before they reach a developer's laptop or CI pipeline. Free for everyone.
• Reachability analysis via our acquisition of Coana, eliminating 50-80% of irrelevant vulnerability alerts by focusing only on CVEs that are actually exploitable.
• Socket Certified Patches for remediating exploitable CVEs in seconds without waiting on upstream maintainers.
• Coverage extending to browser extensions, editor extensions, MCP servers, and AI tools via our acquisition of @secureannex.
When the Axios compromise hit, our detection systems flagged the malicious dependency within six minutes. Within 24 hours, more than 2,000 organizations onboarded to Socket to block it.
Where the funding goes: deeper investment in Firewall, massively expanding Certified Patches, moving protection closer to every point of install across the developer toolchain, and new product launches pushing Socket into a category we haven't entered before.
We're hiring across engineering, sales, customer success, and threat intel.
❤️ Thank you to our customers, investors, and the open-source community for your support. Together, we’re making software safer for everyone.
Genie has transformed how Databricks users work with data, with 3x the accuracy of generic agents. We're sharing some of the research behind it and what makes building data agents challenging. Super proud of our research team's impact with this! https://t.co/eLB2ElVo8S
𝐆𝐞𝐧𝐢𝐞 is now the most important way to do data analysis in Databricks. What's unique about it is its ability to extract semantics from your entire Lakehouse, enabling it to answer complex data questions that cripple agents without a deep data understanding. We've now added a mobile version and unstructured data processing, and enabled it to operate on all your dashboards and notebooks. Check it out:
https://t.co/bqPvg2lYS7
something i've noticed: AI agents create a weird new kind of burnout. esp for young people.
a lot of ambitious 22 year olds are going to think the answer is simple:
- spin up more agents
- ship more code
- sleep less
- outwork everyone
and for a while, it will feel incredible.
you can keep multiple agents running, feed them tasks, review outputs, fix mistakes, make decisions, and keep the whole loop moving.
the problem is that the work no longer drains you through typing. it drains you through judgment.
More attention.
More context switching.
More verification.
More decisions per hour.
so instead of 8-10 normal productive hours, you might get 4-5 extremely intense hours before your brain is fully cooked. and you feel numb until you sleep properly and reset
some of my friends are already burnt out. they don't say it out loud but i can tell.
the agent can keep working 24/7.
the human still has a hard limit
GPT 5.5 and Codex are now available and manageable on Databricks! Both support Unity AI Gateway so you can manage access and costs, add guardrails, secure access to MCPs centrally, and audit usage. https://t.co/FZ5XRKlCxh
If you feel like Anthropic is going after every enterprise software market and that the big SaaS enterprise platforms like Salesforce, ServiceNow and Workday are toast, you are wrong.
This simplistic thinking fundamentally misunderstands the difference between an AI Agent and the Enterprise Platform. Let me explain:
> An AI agent executes tasks. An enterprise platform defines, orchestrates, and gives the agent context to execute that task.
> An AI agent has access. An enterprise platform governs agent permissions.
> An AI agent can act. An enterprise platform can audit, control, and enforce.
> An AI agent may go rogue. An enterprise platform guarantees compliance deterministically.
> An AI agent is powerful in isolation. An enterprise platform is powerful in coordination across teams and business units.
Furthermore, an enterprise platform can be multi-model, multi-cloud, and multi-integration. It is future proof for the customer in a dynamic market.
CIOs buy Enterprise Platforms and will continue to do so, as long as those platforms deeply integrate AI Agents within deterministic, governed, auditable business processes.
Increasingly I think one of the biggest differentiators in who will be able to "pull off" generalized robotics intelligence is who understands hardware. At the margins, little details matter, and robots need to work well.
So something like this -- building a humanoid robot from scratch -- seems really cool and useful
As AI reasoning gets good enough, we think memory will be the next bottleneck for agents. Can your agent improve with more experience?
We call this Memory Scaling, and it's related but different from continual learning. A few examples and challenges:
https://t.co/raIa0U7MPs
My conversation with Sergey Levine (@svlevine).
Sergey is the co-founder of @physical_int -- a company building foundation models that can control any robot to do any task in any environment.
The company's thesis is that generality is more scalable than specialization, meaning that a model trained across many different robots and tasks will ultimately outperform any system built to do one thing well (eg, just wash dishes).
Sergey is a researcher by background, but I think you will appreciate how practical and commercially grounded this conversation is.
We discuss:
- Why changing a diaper will be the last task a robot masters
- The simulation v. real-world data debate
- How multimodal LLMs give robots common sense
- Moravec's Paradox + Robot Olympics
- Why robots can do long-horizon tasks now
- A realistic timeline for robots in our homes
I should note that I am an investor in Physical Intelligence -- I made the investment because I believe it is one of the most important companies tackling the problem of robotics.
Enjoy!
Timestamps:
0:00 Intro
2:39 Defining Physical Intelligence
5:19 The Challenge of Building General Models
6:34 The Stakes and Future of General Purpose Robotics
8:15 Pros and Cons of Humanoid Robots
10:12 Historical Milestones in Robotics Research
15:31 Combining Generative AI and Deep RL
21:24 Moravec's Paradox
25:33 Kitchen Robots
29:30 Simulation vs. Real-World Data
30:48 The Robot Olympics
36:31 The Physiological Reality of Embodiment
38:56 Controversies in the Robotics Community
44:18 What Makes a Great Researcher
48:27 How Businesses Should Prepare for Robotics
54:09 Tracking Progress Through Research Papers
57:02 The Next Step: Mid-Level Reasoning
1:02:00 The Kindest Thing
@JeffDean says it best: the problem in this new agentic era is "tools designed for human speed interaction". That's why we think agents love 𝗟𝗮𝗸𝗲𝗯𝗮𝘀𝗲 𝗣𝗼𝘀𝘁𝗴𝗿𝗲𝘀: it can branch, snapshot, and scale up and down in a second, orders of magnitude faster than other databases.
https://t.co/s32LICSEXc
Read about this architectural shift from Database to Lakebase:
https://t.co/hL19TVTIaH