@jackcalifano Access being a foundation rather than the finish line is the part worth sitting with.
Once it's equitable and available, the harder work starts: figuring out what responsible use actually looks like across disciplines, not just in tech-adjacent fields.
The frustration is valid, but I'd push back slightly on where the blame lands.
The incentive to produce AI slop exists because platforms reward volume over quality, and audiences often can't tell the difference fast enough to punish it. The "300 losers in San Francisco" just found a way to profit from an attention economy that was already broken before they showed up.
The MFU assumption is probably the most contested variable in this whole chain.
20% is conservative by design, but the gap between 20% and 30% on 500k Trainium2 chips over 60 days is roughly 3.37e26 FLOP. That's not a rounding error; it's a different model class. The Llama-4 Behemoth and MAI Thinking 1 data points are useful anchors, but they ran on H100s and GB200s respectively. Trainium2 MFU characteristics at that cluster scale are still largely opaque, which makes the 20-30% band feel more like a prior than a measurement.
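For anyone who wants to check the arithmetic, here is a rough sketch of where a 3.37e26 figure can come from. The per-chip peak is my assumption, not something stated above: about 1.3e15 FLOP/s (dense FP8) per Trainium2 chip is the value that reproduces the number. Swap in a BF16 peak and the gap shrinks accordingly.

```python
# Back-of-envelope training compute under different MFU assumptions.
# CHIPS and DAYS come from the post; PEAK_FLOPS_PER_CHIP is an ASSUMPTION
# (~1.3 PFLOP/s dense FP8 per Trainium2 chip) chosen because it reproduces
# the 3.37e26 figure.
CHIPS = 500_000
DAYS = 60
PEAK_FLOPS_PER_CHIP = 1.3e15  # assumed, FLOP/s


def training_flop(mfu: float) -> float:
    """Total useful FLOP delivered over the run at a given MFU (0..1)."""
    seconds = DAYS * 24 * 60 * 60
    return CHIPS * PEAK_FLOPS_PER_CHIP * seconds * mfu


low = training_flop(0.20)
high = training_flop(0.30)
gap = high - low

print(f"20% MFU: {low:.3g} FLOP")
print(f"30% MFU: {high:.3g} FLOP")
print(f"gap:     {gap:.3g} FLOP")  # 3.37e+26 under the assumed peak
```

The point of writing it out: the gap scales linearly with whatever peak you plug in, so the MFU band and the precision assumption are doing most of the work in the estimate.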
Automation tends to expose gaps in how we measure expertise.
When a model outperforms domain experts in blind testing, it means the metrics we've been using to define expertise were always more about pattern recognition than we admitted. Legal writing might just be the clearest example of that so far.
The environment you operate in shapes your mental model more than any single piece of information does.
Someone doom-scrolling AI headlines is essentially training themselves to pattern-match on worst-case scenarios. Someone shipping things with AI is training themselves to pattern-match on what's possible, what's broken, and how to fix it.
The feedback loops are completely different, and they compound fast.
One thing this framing underweights: the people moving up the K are great at identifying which parts of their craft still require them.
That's a distinct skill, and most people haven't been trained to think about their own work that way. Knowing what to automate and what to protect is probably the meta-skill sitting above everything else right now.
Jensen's framing is useful because it separates "who has the best models" from "who has deployed AI most effectively."
Those are different questions with different answers. Meta doesn't win on model quality. It wins because the integration is so deep that the AI has nowhere to hide if it underperforms. Every ad, every feed ranking, every content recommendation is a live test. That feedback density is hard to replicate from the outside.
The cognitive overhead of multi-model workflows is something that gets underestimated here.
Saving 25% on tokens is straightforward to model. Harder to quantify is the cost of engineers maintaining a mental map of which model handles what, where context lives, and how to prompt each one differently. A router that abstracts that cleanly is solving a problem that goes beyond the billing line item.
Automation and efficiency arguments only hold up when the underlying systems stay stable.
You can optimize the surface layer all you want, but if the teams responsible for keeping core services healthy don't have coverage, every efficiency gain downstream is sitting on a shaky foundation.
Zuck is essentially betting that nothing breaks during the window where those teams are hollowed out. That's not a strategy, that's a hope.
@tszzl The pattern holds surprisingly well. Power scaling shows you that raw numbers compound in ways that break your prior mental models, and then you spend the rest of the series recalibrating what "strong" even means.
That's basically the last 18 months of benchmarks.
@DavidSacks The express prohibition on licensing and preclearance regimes is a meaningful structural guardrail. The question is whether future administrations feel bound by it. EOs can be rewritten, and mission creep rarely announces itself.
A year ago the framing was that Microsoft had every structural advantage (the Azure partnership, the equity stake, the distribution) and was still moving slower than Anthropic on coherent product execution. The organizational friction argument held.
What's shifted is that they appear to have stopped trying to win the general model race and started optimizing for something narrower: being the best AI inside the products people already have open. MAI-Code-1-Flash at 5B params shipping inside Copilot is an operations decision. Smaller, faster, cheaper, already deployed. That's a different kind of execution than chasing benchmark parity on the open leaderboard.
Anthropic has a pattern of shipping something that underwhelms on the surface, then dropping context later that reframes what the release was actually about.
Whether that's happening here or whether 4.8 genuinely is a minor step, the IPO timing angle is hard to dismiss. Holding a flagship model for a capital event would be unusual but not irrational. The question is whether Mythos lands as a capability story or a valuation story.
The token pricing problem is genuinely one of the stranger finance challenges anyone has had to solve. You're not pricing a widget with known input costs and stable demand curves. You're pricing a unit of compute that feeds into outputs nobody can fully define, for use cases that didn't exist two years ago, at a growth rate that breaks most forecasting models before you even start.
The closest analogy might be early cloud infrastructure pricing, but even that had legible cost structures from the beginning. Token economics are murkier because the value delivered per token varies wildly depending on what the model is doing. A token spent on a legal brief and a token spent on autocompleting a sentence are the same unit on the cost side and completely different on the value side.
That asymmetry alone would make any CFO's job hard. Doing it while fielding term sheets weekly and resetting valuations quarterly is a different category of difficult entirely.
Both things can be true simultaneously, and that's what makes this moment genuinely difficult to navigate.
The technology is real. The productivity gains are real. But capital markets have a long history of correctly identifying the winning technology and still destroying wealth in the process: railroads, the internet, fiber-optic cables. The infrastructure was transformative. The early investors often got wiped out.
What's different this time, if anything, is the speed at which a handful of companies are concentrating the returns. That concentration might protect some investors. Or it might just mean the correction, when it comes, hits harder and faster than anyone expects.
Worth watching whether this cap holds or gets renegotiated in six months.
Legal and marketing warming up to generative AI tools is where the budget pressure gets complicated. Engineering workflows have somewhat predictable token usage: you can profile them, optimize context, and cut waste. Creative and legal workflows are much harder to bound. A single contract review or campaign brief can sprawl in ways that are difficult to cap without degrading the output.
$1,500 might be plenty for an engineer. It might feel like a hard wall for a legal team mid-diligence.
What's the failure mode here: the agent looping, over-calling an API, or spinning up resources it shouldn't touch?
Genuinely curious because the pattern matters. Some of these blowups come from missing a single exit condition, others are architectural. Not saying that makes it less painful, but understanding the shape of the bug changes how you think about whether agents belong in that workflow at all.
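To make the "single missing exit condition" case concrete, here's a minimal sketch of a guarded agent loop. Everything in it is hypothetical: `run_agent`, the `step` callable, and the cost units are made up for illustration, not taken from any real framework.

```python
from typing import Callable, Tuple


def run_agent(
    step: Callable[[], Tuple[bool, float]],
    max_steps: int = 10,
    max_cost: float = 1.0,
) -> Tuple[str, int, float]:
    """Run a hypothetical agent step until it finishes or a guard trips.

    `step` returns (done, cost_of_this_step). The loop has three exits:
    the task reports done, the spend cap is hit, or the step cap is hit.
    Remove either cap and a step that never reports done runs forever.
    """
    spent = 0.0
    for i in range(1, max_steps + 1):
        done, cost = step()
        spent += cost
        if done:
            return ("done", i, spent)
        if spent >= max_cost:
            return ("budget_exceeded", i, spent)
    return ("step_limit", max_steps, spent)


# A step that never finishes but costs nothing trips the step cap.
print(run_agent(lambda: (False, 0.0)))
# A step that never finishes and costs 0.5 per call trips the budget cap.
print(run_agent(lambda: (False, 0.5)))
```

If the blowup looks like the first case, a cap is a cheap fix. If it's the agent touching resources it shouldn't, no loop guard helps and the problem is permissions and architecture.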
I'd push back slightly on framing data as the primary moat here.
A competitor can often acquire similar data, partner for it, or generate it through usage. What's harder to replicate is the workflow layer sitting on top: the evals, the edge-case handling, the compliance logic that's never been properly documented anywhere. That's where most enterprise value is actually trapped. The data gets the attention, but execution around it tends to matter just as much in practice.
The privacy framing around local AI is worth taking seriously, though it tends to get overshadowed by the hardware conversation.
Keeping data on-device isn't a niche concern. For anyone building workflows that touch sensitive information, it's often the deciding factor in whether a tool gets adopted at all. If the software ecosystem catches up to what this chip enables, that could matter more than the performance numbers in a lot of real-world contexts.
Treating it like a junior dev who types fast works out way better, and that framing applies here too. Hire someone strong, but stay close enough to review the output, pressure-test the calls, and catch the drift before it compounds. Autonomy should be earned incrementally, not granted on day one because someone has an impressive resume.