7 years in software IB and private equity investing & 5 years (so far) in L/S equity investing in TMT. I like to opine & share some personal wisdom I’ve learned
GPT-5.5 and Opus 4.8 sit ~1 point apart on the Intelligence Index: 60.2 vs 61.4. Their token pricing is almost a match: $5 input on both, $30 vs $25 output. So why does running the full Index cost $3,357 on GPT-5.5 and $4,685 on Opus 4.8 (a ~40% gap)? The answer isn't the sticker price, it's the token count.
GPT-5.5 burned ~75M tokens to complete the Index. Its predecessor GPT-5.4 used ~40% more. Opus 4.7, same generation, comparable score, generated 110M tokens. Same answer, ~1.5x the output. DeepSeek V4 Pro, scoring 52, spent 190M tokens getting there.
That’s the whole game now. Per-token pricing is the rate and tokens-to-completion is the actual invoice. A model can win on price-per-token and lose badly on price-per-task, because the reasoning trace, the restatement, the over-thinking is the multiplier nobody printed on the spec sheet.
This is why the cheapest-per-token model is routinely the most expensive per outcome. Researchers have a name for it: the overthinking tax. Smaller, cheaper models that ramble can cost more in total than a pricier model that’s terse and converges fast.
The buyer-side implication is the part the market hasn’t priced in:
A) The flagship layer now competes on token efficiency, not just capability. “40% fewer tokens for the same score” is a moat and it doesn’t show up in a pricing table.
B) The app layer competes one level higher, on dollars per resolved outcome: a closed ticket, merged PR, correct extraction, etc. Token cost is just COGS underneath that.
C) Procurement changes shape. Uber capped employee AI spend after torching its budget in 4 months. Enterprises are learning that “cheap model” and “cheap workflow” are unrelated numbers.
Price per token was always a proxy. The real metric was always tokens × price × attempts-to-correct.
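To make the tokens × price × attempts point concrete, here is a minimal sketch. The function name and every number in it are hypothetical illustrations, not figures from the Index: a cheap model that rambles and needs a retry vs a pricier model that is terse and lands first time.

```python
def cost_per_outcome(tokens_per_attempt: float,
                     price_per_mtok: float,
                     attempts: float) -> float:
    """Dollars per resolved task = tokens x price x attempts-to-correct.

    price_per_mtok is USD per million tokens. A single blended price is
    assumed here to keep the sketch short (no input/output split).
    """
    return tokens_per_attempt / 1_000_000 * price_per_mtok * attempts


# Hypothetical numbers, chosen only to illustrate the mechanism.
cheap_rambler = cost_per_outcome(tokens_per_attempt=30_000,
                                 price_per_mtok=2.0,
                                 attempts=2.0)
pricey_terse = cost_per_outcome(tokens_per_attempt=2_000,
                                price_per_mtok=25.0,
                                attempts=1.0)

print(f"cheap per token:  ${cheap_rambler:.2f} per task")
print(f"pricey per token: ${pricey_terse:.2f} per task")
```

Under these made-up inputs the model that is 12.5x more expensive per token comes out at less than half the cost per task.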
$GOOG $MSFT $META
1. Inference cost ≠ inference price. Per-token cost has fallen ~10x+ per year across the frontier. Today’s $100B buys an order of magnitude more capability than it did 18 months ago. Burning more tokens at collapsing unit cost is natural.
2. Heavy internal usage is the leading indicator everyone wants. The single best signal of product-market fit for AI labor is the lab eating its own supply. The companies that win automation will be the ones that automated themselves first.
3. Sam’s “suddenly a huge issue” is more about margin. Cost discipline showing up means the free-token land-grab phase is ending and unit economics are entering the conversation, which is where we want to go.
Yep. Direct lending clears SOFR+550 vs SOFR+375 syndicated. Add OID and call protection and you’re underwriting 11-12% gross on senior secured. Even at a punitive 3% default / 60% recovery, that’s ~120bps of loss and still ~9.5-10% net.
The low vol everyone mocks is just the absence of forced-seller mechanics: no CLO manager tripping a WARF test and dumping your loan at 88. Same credit, no liquid bid gapping 10 points on a headline.
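The loss arithmetic in the direct lending point can be checked in a few lines. This is a sketch that assumes expected loss is simply default rate × (1 − recovery) and subtracts it from the gross yield. It ignores fees, leverage and timing, so treat the result as a before-fees number.

```python
def expected_loss(default_rate: float, recovery: float) -> float:
    """Annual expected credit loss as a fraction of par."""
    return default_rate * (1.0 - recovery)


def net_of_losses(gross_yield: float, default_rate: float, recovery: float) -> float:
    """Gross yield minus expected loss; fees and other drags are NOT modeled."""
    return gross_yield - expected_loss(default_rate, recovery)


loss = expected_loss(default_rate=0.03, recovery=0.60)
low = net_of_losses(0.11, 0.03, 0.60)
high = net_of_losses(0.12, 0.03, 0.60)

print(f"expected loss: {loss * 1e4:.0f} bps")
print(f"net of losses: {low:.1%} to {high:.1%}")
```

This gives ~120bps of loss and roughly 9.8-10.8% net of losses on the 11-12% gross range. The ~9.5-10% quoted above sits a little lower, which would fit some extra drag (fees, for example) that this sketch leaves out. That reading is an assumption.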
Love it. Accounting is the great roll-up substrate for one reason that has nothing to do with tech: a retirement cliff. Thousands of decades-old firms with sticky recurring revenue and no internal buyer. Annuities sold at single-digit multiples by owners with no exit.
Permanent capital is the unlock b/c trad PE had to strip cost and flip on a horizon, which is exactly why every prior services roll-up financialized the relationships and watched them walk out the door.
Perma capital lets sellers keep equity (rainmakers stay) and lets the AI investment amortize over a decade instead of a 3yr IRR window. The structure and the tech are the same bet.
Some issues:
1. A survey of corporate pilots measures the median deployer, which is useless for a capex thesis. Returns in a diffusing tech are power-law, not normal. Averaging the right tail away produces disappointing results by construction.
2. Bain/MIT/McKinsey are measuring a procurement problem: companies bought generic copilots, left the workflow intact, kept humans approving every step, didn’t fix data access, and then expected 20%+ cost takeout to magically hit EBIT. Time saved is not ROI until management actually does one of three things: reduces labor, increases throughput, or ships more revenue with the same headcount.
AI theater gets cut & workflow-native AI compounds.
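A quick way to see why the median deployer says little about a power-law outcome. This sketch assumes returns follow a Pareto distribution with minimum 1 and a hypothetical shape parameter alpha = 1.5. Neither assumption comes from the surveys; the formulas are the standard closed forms for that distribution.

```python
def pareto_median(alpha: float) -> float:
    """Median of a Pareto distribution with minimum value 1."""
    return 2.0 ** (1.0 / alpha)


def pareto_mean(alpha: float) -> float:
    """Mean of a Pareto distribution with minimum value 1 (needs alpha > 1)."""
    return alpha / (alpha - 1.0)


def top_share(alpha: float, top_fraction: float) -> float:
    """Share of the total captured by the top `top_fraction` of outcomes."""
    return top_fraction ** (1.0 - 1.0 / alpha)


ALPHA = 1.5  # hypothetical tail heaviness, for illustration only

median = pareto_median(ALPHA)
mean = pareto_mean(ALPHA)
decile_share = top_share(ALPHA, 0.10)

print(f"median outcome: {median:.2f}x")
print(f"mean outcome:   {mean:.2f}x")
print(f"top 10% capture {decile_share:.0%} of the total")
```

With this shape the typical (median) deployer sees about half of the mean outcome, while the top decile captures close to half of all the value. A survey that reports the typical pilot is describing the part of the distribution a capex thesis is not underwriting.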
Not true. OpenAI has ~33% gross margin, with inference cost of ~$8.4B in 2025. GM positive means every incremental customer pays for their own compute and throws off cash above it. A company growing 233% YoY should run compute ahead of demand, because under-provisioning caps growth without protecting margins.
“Bubbles and real value coexist” is unobjectionable. If we look at all 3 at once: capex is ahead of near-term revenue in aggregate (true), specific levered players may not survive (true), and unit economics at the leader are gross-margin positive and improving while revenue compounds 200%+ (also true).
The model layer is not the moat. It’s the distribution flywheel: they own the runtime (Codex/Sites) and the app surface, sell tokens to everyone building on top, then absorb the highest-margin workloads themselves. Classic platform-tax + verticalize playbook. The startups at risk are the thin wrappers with no proprietary data or workflow lock-in, which are exactly the ones consuming the most tokens. Open source doesn’t even tackle the distribution problem; it just commoditizes the input cost.
Death to dumb AI startups where the easy ride is over
The TAM is more about persistent inference: wake words, planning, device routing, memory, personalization, sensor fusion, home/auto agents.
The architecture is probably hybrid: small local model handles routing, memory, tool calls, permissions, and continuous sensor/device loops. Frontier gets called only when reasoning depth or knowledge freshness actually matters.
Engineers don’t care about costs. It’s not on them.
Route hard reasoning to Claude/GPT-5 class models, but push autocomplete, tests, doc lookup, boilerplate, lint fixes, simple refactors to cheaper small models.
Then layer in prompt caching, repo-level RAG instead of dumping 200k context, diff-only code review, max-token budgets, async batch jobs, and eval-based routing. Same dev velocity, 50–80% lower inference bill.
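The routing idea above can be sketched as a simple task-type router. The tier names, prices and token volumes below are hypothetical placeholders, not real price lists or usage data. The sketch covers only the routing step, not caching, RAG or batching.

```python
# Hypothetical blended prices in USD per million tokens.
PRICE_PER_MTOK = {"frontier": 25.0, "small": 1.0}

# Task types the post suggests pushing to cheaper small models.
CHEAP_TASKS = {
    "autocomplete", "tests", "doc_lookup",
    "boilerplate", "lint_fix", "simple_refactor",
}


def route(task_type: str) -> str:
    """Send routine work to the small tier, everything else to frontier."""
    return "small" if task_type in CHEAP_TASKS else "frontier"


def monthly_bill(workload, router) -> float:
    """Sum token cost over (task_type, tokens) pairs under a given router."""
    return sum(tokens / 1_000_000 * PRICE_PER_MTOK[router(task)]
               for task, tokens in workload)


# Hypothetical monthly token mix for one engineering team.
workload = [
    ("hard_reasoning", 20_000_000),
    ("autocomplete", 50_000_000),
    ("tests", 20_000_000),
    ("lint_fix", 10_000_000),
]

baseline = monthly_bill(workload, lambda task: "frontier")  # everything on frontier
routed = monthly_bill(workload, route)
saving = 1.0 - routed / baseline

print(f"baseline ${baseline:,.0f}, routed ${routed:,.0f}, saving {saving:.0%}")
```

With this made-up mix the bill drops from $2,500 to $580, about 77% lower. The real number depends entirely on how much of the workload is genuinely routine and on whether quality holds up on the small tier, which is what eval-based routing is meant to check.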
~$2.2B over 15yrs ≈ $147M/yr against $900M of 5yr notes at 7.5%. The bond matures a decade before the lease does, so noteholders are underwriting refi/residual risk on a build-to-suit asset whose value is ~entirely the CRWV credit. And CRWV is triple-net here, responsible for power, utilities, taxes, so the propco’s coverage is basically a pass-through of the tenant’s willingness to keep paying.
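The mismatch between the lease and the notes is easy to size. This sketch assumes flat rent over the lease term (no escalators), a bullet maturity on the notes, and that all rent left after interest could go toward principal. None of those terms are confirmed by the figures above.

```python
# All amounts in $ millions, taken from the figures quoted above.
LEASE_TOTAL = 2_200.0
LEASE_YEARS = 15
NOTES = 900.0
COUPON = 0.075
NOTE_YEARS = 5

annual_rent = LEASE_TOTAL / LEASE_YEARS        # flat rent assumed
annual_interest = NOTES * COUPON
interest_coverage = annual_rent / annual_interest

# Cash left after interest over the life of the notes, assuming the
# triple-net structure means the propco has no other meaningful costs.
cash_after_interest = (annual_rent - annual_interest) * NOTE_YEARS
refi_gap = NOTES - cash_after_interest

print(f"annual rent:       ${annual_rent:,.0f}M")
print(f"annual interest:   ${annual_interest:,.1f}M")
print(f"interest coverage: {interest_coverage:.2f}x")
print(f"principal left to refinance at maturity: ${refi_gap:,.0f}M")
```

Interest is covered a bit over 2x, but even sweeping every dollar of excess rent toward the notes leaves roughly $500M to refinance in year 5. That refinancing would be backed by 10 more years of the same tenant’s payments.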
~$30B of the $40B ATM is sell-to-cover for employee tax withholding, not infrastructure. And the Berkshire $10B at a ~6% discount is really about getting Buffett’s name anchoring the equity, which is worth more than the cash.
They’re raising because the capital’s cheap, they’re re-optimizing the cap structure, and the endorsement is free.
@bubbleboi If EMIB + 18A-P/PT gives hyperscalers a credible non-TSMC path for even part of the stack by 2028, the option value is way bigger than the market is giving them today 🔥
@jukan05 Neoclouds and tier-2s can’t match it so they inherit the high floor the hyperscalers just set with no allocation priority. The prepay raises rivals’ input costs by design
What a load of bs.
16x leverage is policyholder reserves, money set aside against future claims, not borrowed money on a trade. Every annuity writer looks 15-20x by this math. Citing it like hedge-fund margin is a category error.
Level 3 means unobservable inputs, not unknowable value. A 30-yr private loan held to maturity is supposed to be Level 3 as that’s the point of matching illiquid assets to illiquid liabilities. “No market price” just means it doesn’t trade on a screen.
And the annuitant is an unsecured creditor backed by the insurer’s solvency, not the owner of a single GPU. $5.4B of chips in a ~$360B book is not a fuse wired to grandma…
https://t.co/oe5Y3DCch6
Burry calling this fugazi is so damn lazy… almost as if he doesn’t have a grasp of it.
Level 3 means unobservable inputs, not unknowable value. Private credit, directly-originated loans, CLO equity are Level 3 because they don’t trade on a screen, not because they’re vapor. A 30-year private mortgage held to maturity by a life insurer is supposed to be Level 3. Matching long-duration, illiquid assets against long-duration, illiquid annuity liabilities is the actual job.
The “retirees don’t know their money funds Grok” framing is sensationalism and dumb. An annuity is a general account product. The retiree is an unsecured creditor of the insurer with a contractual payout, so they were never promised a specific asset, and the insurer’s solvency, not any single GPU SPV, backs the guarantee. $5.4B of GB200s inside a ~$360B book is a rounding error.
Insurance “leverage” is policyholder reserves: money set aside against future claims, not borrowed money stacked on a trade. Every annuity writer looks 15-20x levered by this math because that’s what a balance sheet funded by policy reserves looks like. Citing it next to a hedge fund’s margin leverage is a category error. What matters is asset-liability duration match and the regulatory capital ratio.