One of the most substantive classes with @ChaseLochmiller at Stanford. We went deep on the economics of the datacenter:
- Where is the ~$650B of AI infra capex actually going this year?
- Who's capturing the margin, who's getting squeezed?
- How the bottleneck has moved from GPUs to power, and where it goes next
- The economics of neoclouds
$MU is going to build the largest semiconductor manufacturing facility in U.S. history and the world's most advanced memory fab
You know the rest... they're gonna need more semiconductor equipment: bigger orders for $ASML, $AMAT, and $KLAC
Very insightful post by Gavin below on Nvidia's $20B Groq licensing deal.
AI inference has 2 steps, Prefill & Decode.
Prefill means the model reads your whole prompt and context. Decode means it writes the reply one small chunk of text at a time.
These 2 steps favor different hardware. Prefill benefits from large memory capacity so it can hold long context; speed matters less. Decode benefits from extremely high memory bandwidth and very low latency; memory size matters less.
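A rough way to see why decode cares about bandwidth: at batch size 1, each new token needs roughly one pass over all the model weights, so tokens per second is capped at about memory bandwidth divided by model size in bytes. Here is a minimal back-of-envelope sketch of that ceiling. The function name, the 8-bit weights, and the two bandwidth figures are illustrative assumptions, not vendor specs.

```python
def decode_tokens_per_sec_upper_bound(params_billion, bytes_per_param, bandwidth_gb_per_s):
    """Bandwidth-only ceiling for batch-size-1 decode.

    Assumes every generated token streams all weights from memory once
    and ignores compute, KV-cache reads, and interconnect overhead.
    """
    model_gb = params_billion * bytes_per_param  # billions of params * bytes each = GB
    return bandwidth_gb_per_s / model_gb


# Illustrative inputs only (assumptions, not measured or quoted specs).
slow = decode_tokens_per_sec_upper_bound(70, 1, 1_000)    # 70B model, 8-bit, 1 TB/s
fast = decode_tokens_per_sec_upper_bound(70, 1, 10_000)   # same model, 10 TB/s
print(round(slow, 1), round(fast, 1))
```

In this simplified model the ceiling scales linearly with bandwidth, which is the intuition behind pairing decode with the fastest memory available.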
---
Now Groq’s LPU is a complete departure from GPUs and TPUs. It doesn’t use HBM (High Bandwidth Memory, which sits outside the compute die) at all. Instead, it uses SRAM (Static Random Access Memory), which is built directly into the silicon of the chip.
Groq’s big advantage is how quickly it runs single-user inference. Thanks to its compute layout and local-only memory, it hits one of the fastest single-user tokens-per-second rates on the market.
SRAM is up to 100x faster than the HBM found in GPUs. Because the data is right there on the chip, there is essentially no off-chip “fetch time.”
The downside of the chip is that it has no external DDR or HBM memory, only onboard SRAM. While fast, the 230 MB capacity per chip is very low, which means a reasonably small open-source model like Llama 70B requires 10 racks of processors and over 100 kW of power to run.
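To put the 230 MB figure in perspective, here is a quick sketch of how many chips it takes just to hold the weights of a 70B-parameter model in SRAM. The weight precisions (8-bit and 16-bit) and the use of decimal megabytes are assumptions for illustration; the post does not say which precision the deployment uses or how many chips fit in a rack.

```python
import math

SRAM_PER_CHIP_MB = 230  # per-chip capacity quoted in the post


def chips_needed(params_billion, bytes_per_param, sram_mb=SRAM_PER_CHIP_MB):
    """Minimum chips to hold the weights alone in on-chip SRAM.

    Ignores KV cache, activations, and any replication, so a real
    deployment needs more than this.
    """
    weight_mb = params_billion * 1e9 * bytes_per_param / 1e6  # decimal MB
    return math.ceil(weight_mb / sram_mb)


print(chips_needed(70, 1))  # 8-bit weights (assumption)
print(chips_needed(70, 2))  # 16-bit weights (assumption)
```

Hundreds of chips for a single mid-sized model is what drives the multi-rack footprint.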
Groq’s LPUs don’t need liquid cooling, and that’s a pretty big deal. Most data centers around the world still use air cooling, not liquid. Nvidia’s Blackwell and upcoming chips, on the other hand, will mainly rely on liquid cooling since they’re built for top performance.
---
Nvidia’s 3 Rubin chips line up with that prefill/decode split. Rubin is the main GPU with HBM4, very high bandwidth, used for training and high-throughput decode. It is usually a dual-die part with NVLink and large HBM capacity.
Rubin CPX is a sibling tuned for prefill. It is single-die, uses 128GB GDDR7, about 30 PFLOPS NVFP4, cheaper and cooler, and accelerates long-context attention. It trades bandwidth for capacity and cost to make prefill efficient.
Rubin with HBM is the balanced workhorse for training and high throughput inference.
Nvidia’s plan is to run prefill on CPX, then hand off decode to standard Rubin, all orchestrated in the same rack. That is the connection. They are designed to work together.
A Groq-style Rubin with SRAM would be very fast for token-by-token decode, but would hold less. Systems could use CPX or normal Rubin to do prefill, then hand off to the SRAM Rubin for the fast typing part.
The result: you get faster answers for interactive apps while keeping costs sensible by using the right chip for each step.
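The handoff described above can be sketched as a toy two-stage pipeline: one pool reads the prompt and builds the KV cache, a second pool generates tokens from it. Everything here is hypothetical (the `Request` class, the pool names, the dict standing in for a KV cache); it only shows the control flow, not how Nvidia's or Groq's software actually schedules work.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int
    log: list = field(default_factory=list)  # records which pool did what


def prefill(req, pool):
    # Capacity-oriented pool reads the whole prompt and builds the KV cache.
    req.log.append(("prefill", pool, req.prompt_tokens))
    return {"kv_tokens": req.prompt_tokens}  # stand-in for a real KV cache


def decode(req, kv, pool):
    # Bandwidth-oriented pool generates one token at a time, growing the cache.
    for _ in range(req.max_new_tokens):
        kv["kv_tokens"] += 1
    req.log.append(("decode", pool, req.max_new_tokens))
    return kv


def serve(req, prefill_pool="capacity_pool", decode_pool="bandwidth_pool"):
    kv = prefill(req, prefill_pool)      # step 1: read the prompt
    return decode(req, kv, decode_pool)  # step 2: hand off the cache and generate


req = Request(prompt_tokens=8000, max_new_tokens=200)
kv = serve(req)
print(req.log, kv["kv_tokens"])
```

The point of the split is that the KV cache is the only thing that has to move between pools, so each pool can be built from the hardware that suits its step.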
After the deal, Groq will remain an independent company, hold the IP, and service GroqCloud (its online neocloud business) with all the Middle Eastern deals it has done over the years.
Now, about competition. Nvidia understands that if the HBM, energy, liquid cooling, and CoWoS limits choke the market and lead to a serious compute shortage, both customers and rivals will hunt for workarounds. In that situation, Groq, which doesn’t depend on the same supply chain constraints, becomes an obvious alternative.
Google will never sell TPUs. The moment Google sells TPUs at scale, they transform their architectural advantage into a commodity.
Google's internal teams have first-order claims on TPU capacity because those workloads directly generate revenue and strategic moats. Any TPU sold externally is a TPU not used to defend Google's primary profit engines.
Right now, TPUs are Google's proprietary edge: vertical integration that lets them operate AI infrastructure at costs competitors can't match. DeepMind can burn through compute budgets that would bankrupt OpenAI because Google doesn't pay retail GPU prices; it pays internal TPU marginal cost.
If Google starts selling TPUs externally:
- They have to price competitively vs Nvidia GPUs, which means revealing their cost structure. Suddenly, everyone knows Google's true AI compute costs aren't magic.
- Selling bare metal TPUs means publishing detailed specs, performance benchmarks, and programming interfaces. This is handing competitors a blueprint for "how Google actually does AI at scale." Right now, that's proprietary. The moment it's a product, it becomes studied, reverse-engineered, and eventually replicated.
- Google Cloud already sells TPU access via GCP at premium prices. If they start selling bare TPUs, they're competing with their own higher-margin cloud offering. No sophisticated buyer would pay GCP markup when they could buy TPUs directly and run them cheaper.
GCP TPU pricing is not aggressive compared to GPU alternatives; it is premium. This isn't incompetence. It's intentionally priced to discourage massive external adoption. Google makes TPUs available enough to avoid antitrust "hoarding infrastructure" accusations and to capture some high-margin cloud revenue, but it doesn't actually want external customers consuming capacity at scale.
Compare this to AWS, which sells every chip they can manufacture (Graviton, Trainium, Inferentia) because AWS is a commodity infrastructure business. Google's core business is ads and consumer products that depend on AI infrastructure. Selling the infrastructure is like McDonald's selling their supply chain to Burger King: even if it generates revenue, you're strengthening competitors and weakening your primary business.
You can't simultaneously be a commodity chip vendor AND maintain proprietary infrastructure advantage. The moment you sell, you commoditize. The moment you commoditize, your advantage evaporates.
Given that selling TPUs appears strategically unsound, why is there speculation that Google might pursue it anyway? I think it's because cloud divisions at every hyperscaler have perpetual "we need differentiation" anxiety, and custom chips look like differentiation. But differentiation only matters if it protects margins or captures share without destroying your core business. Google selling TPUs would be differentiation that destroys more value than it creates.