Why did xAI hand over a 220,000-GPU cluster to Anthropic?
The technical backdrop to xAI's decision to hand Colossus 1 over to Anthropic in its entirety is more interesting than it appears. xAI deployed more than 220,000 NVIDIA GPUs at its Colossus 1 data center in Memphis. Of these, roughly 150,000 are estimated to be H100s, 50,000 H200s, and 20,000 GB200s. In other words, three different GPU types spanning the Hopper and Blackwell generations are mixed together inside a single cluster: a "heterogeneous architecture."
For distributed training, however, this configuration is close to a disaster, according to engineers familiar with the setup. In synchronous distributed training, every GPU in the job (say, 100,000 of them) must finish a step before the cluster can advance to the next one. Even if the GB200s finish their computation first, they have to wait for the slower H100s, or for any GPU that has hit a stack-related snag, to catch up. This is known as the straggler effect. The 11% MFU (model FLOPs utilization: the share of theoretical peak FLOPs actually realized) at xAI recently reported by The Information can be read as the numerical fallout of this problem. It stands in stark contrast to the 40%-plus MFU figures achieved by Meta and Google.
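The straggler effect can be made concrete with a small sketch. The relative per-step times below are invented for illustration and are not measurements from Colossus 1; the only point is that a synchronous step lasts as long as its slowest participant.

```python
# Hypothetical relative per-step compute times (assumptions, not benchmarks).
step_time = {"H100": 1.00, "H200": 0.85, "GB200": 0.45}

# A synchronous step ends only when the slowest participant finishes.
sync_step = max(step_time.values())

# Share of each step that a given GPU type spends idle at the barrier.
idle_fraction = {gpu: 1 - t / sync_step for gpu, t in step_time.items()}

for gpu, idle in idle_fraction.items():
    print(f"{gpu}: idle {idle:.0%} of every step")
```

Under these assumed numbers the fastest chips wait for more than half of every step, which is the sense in which a mixed cluster trains at the pace of its oldest silicon.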
The problem runs deeper still. As discussed earlier, NVIDIA's NCCL has traditionally been optimized for a ring topology. It works beautifully at the 1,000-10,000 GPU scale, but once you push into the 100,000-unit range, the latency of data making one full trip around the ring becomes punishingly long. GPUs need to churn through computations rapidly to keep MFU high, but while they wait for data to arrive over the network fabric, more than half of the silicon sits idle. Google sidestepped this bottleneck with its own custom topology (Google's OCS: Apollo/Palomar), but xAI, by my read, has not yet reached that stage.
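A textbook cost model for a flat ring all-reduce shows why the ring hurts at this scale. This is a simplified sketch, not a description of what NCCL actually does on a 100,000-GPU cluster; the function name, the 1 GB message size, the 5-microsecond hop latency and the 50 GB/s link speed are all assumptions chosen for illustration.

```python
def ring_allreduce_cost(n_gpus, msg_bytes, hop_latency_s, link_bytes_per_s):
    """Return (latency_term, bandwidth_term) in seconds for a flat ring."""
    steps = 2 * (n_gpus - 1)  # reduce-scatter pass plus all-gather pass
    latency_term = steps * hop_latency_s
    # Each step moves one chunk of size msg_bytes / n_gpus over one link.
    bandwidth_term = steps * (msg_bytes / n_gpus) / link_bytes_per_s
    return latency_term, bandwidth_term


# Hypothetical inputs: 1 GB of gradients, 5 us per hop, 50 GB/s links.
for n in (1_000, 10_000, 100_000):
    lat, bw = ring_allreduce_cost(n, 1e9, 5e-6, 50e9)
    print(f"{n:>7} GPUs: latency {lat:.3f}s, bandwidth {bw:.3f}s")
```

In this model the bandwidth term stays roughly flat as the ring grows, while the latency term grows linearly with the number of GPUs, so at 100,000 GPUs the time spent simply passing messages around the ring dominates.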
Layer Blackwell's (GB200) "power smoothing" issue on top, and the picture comes into focus. According to Zeeshan Patel, formerly in charge of multimodal pre-training at xAI, Blackwell GPUs draw power so aggressively that the chip itself includes a hardware feature for smoothing power delivery. xAI's existing software stack, however, was optimized for Hopper and does not understand the characteristics of the new hardware; when it imposes irregular loads on the chip, the silicon physically self-destructs; it literally melts. That means the modeling stack must be rewritten from scratch, which in turn means scaling is far harder than most of us imagine.
Pulling all of this together points to a single conclusion. xAI judged that training frontier models on Colossus 1 simply was not efficient enough to be worthwhile. It therefore moved its own training workloads wholesale onto Colossus 2, built as a 100% Blackwell homogeneous cluster. Colossus 1, on the other hand, whose mixed architecture is far less crippling for inference (which parallelizes more forgivingly), was leased in its entirety to an Anthropic that desperately needed inference capacity.
Many observers point to what looks like a contradiction: Elon Musk poured enormous capital into building Colossus, only to hand the core asset over to a direct competitor in Anthropic. Others read it as xAI capitulating because it is a "middling frontier lab." But these are surface-level reads.
Look at the numbers and a different picture emerges. xAI today holds roughly 550,000+ GPUs in total (on an H100-equivalent performance basis), and Colossus 1 (220,000 units) accounts for only about 40% of the total available capacity. Colossus 2, built entirely on Blackwell, is already operational and continuing to expand. Elon kept the all-Blackwell homogeneous cluster (Colossus 2) for himself and leased out the older, mixed-generation Colossus 1. In other words, he handed the pain of rewriting the stack (the MFU-11% debacle) to Anthropic, while keeping his own focus on training the next generation of models.
The real point, then, is this. Elon's objective appears to be positioning ahead of the SpaceXAI IPO at a $1.75 trillion valuation, currently floated for as early as June. The narrative SpaceXAI now needs is that xAI, long the "sore finger," is not merely a research lab burning cash, but a business with a "neo-cloud" model in the mold of AWS, capable of leasing surplus assets at high yields.
From a cost-of-capital perspective, an "AGI cash incinerator" is far less attractive to investors than a "data-center landlord generating cash."
As noted above, the most important detail of the Colossus 1 lease is that it is for inference, not training. Unlike training, inference requires far less tightly synchronized inter-GPU communication. Even when the chips are heterogeneous, the workload parcels out cleanly across them in parallel. The straggler effect, the chief weakness of a mixed cluster, is essentially neutralized for inference workloads.
Furthermore, with Anthropic occupying all 220,000 GPUs as a single tenant, the network-switch jitter (unanticipated latency) that arises under multi-tenancy disappears. The two sides' technical weaknesses end up complementing each other almost exactly.
One insight follows. As a training cluster mixing H100/H200/GB200, Colossus 1 was an asset that could only deliver an MFU of 11%. The moment it was handed over to a single inference customer, however, that asset transformed into a cash-flow asset rented out at roughly $2.60 per GPU-hour (a weighted average of the lease rates across GPU types). For xAI, what was a "cluster from hell" for training has become a "golden goose" minting $5-6 billion in annual revenue when redeployed for inference. Elon's genius, I would argue, lies not in the model but in this asset-rotation structure.
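The revenue figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes every one of the 220,000 GPUs is billed at the quoted $2.60 for every hour of a 365-day year; that full-utilization assumption is mine, not something the post states.

```python
gpus = 220_000
rate_per_gpu_hour = 2.60       # USD, weighted-average rate quoted in the post
hours_per_year = 24 * 365      # assumes billing for every hour of the year

annual_revenue = gpus * rate_per_gpu_hour * hours_per_year
print(f"${annual_revenue / 1e9:.2f}B per year")
```

On those inputs the lease comes to about $5.0 billion a year, the low end of the $5-6 billion range.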
The weight of that $6 billion becomes clearer when set against xAI's income statement. Annualizing xAI's 1Q26 net loss yields roughly $6 billion in losses per year. The $5-6 billion in annual revenue generated by leasing Colossus 1 to Anthropic, in other words, almost perfectly hedges xAI's loss figure. This single deal effectively pulls xAI to break-even.
Heading into the SpaceXAI IPO, this functions as a core line of financial defense. From a cost-of-capital standpoint, if the image shifts from "research lab burning cash" to "infrastructure tollgate stably printing $6 billion a year," the entire tone of the offering can change.
(May 8, 2026, Mirae Asset Securities)
Tonight, I had dinner with a friend who's also a tech CEO. Given my interest in stocks, we ended up in a 4H, in-depth conversation/debate about which companies and stocks stand to benefit the most from the changes $NVDA and Jensen announced in March. Figured some of you fellow nerds might be interested. I tried to keep the technical jargon to a min. And, I'll make this into a thread of what he/I came up with revenue estimates/profit thoughts. **NFA this is only two nerds using AI and a couple of extremely strong drinks as to how each company would be affected and who stands to benefit the most.
The Rubin + Groq LPX isn't just faster inference, it's about where AI dollars go next. Disaggregated serving in the range of 35x shifts value: HBM, advanced packaging, Ethernet and optics, SSD/NAND and capacity storage, cooling and power systems, substrates, server integration, and test/burn-in. Some of the biggest winners will be obvious mega caps, but some of the highest % upside will come from smaller, more specialized companies.
@soumyasen Agree that margins and unit economics are going to change across marketing and commerce funnel. No matter what, scale will lead to best-in-class margins (even if they are below today's levels). And, re: pair trade, isn't there a bigger CRWV inside Google already (ie. GCP)?
@HarryStebbings Would love to revisit this when you meet this company CEO in six months. The key question then would be how they think of variable costs, monthly subscriptions vs tokens.
$NBIS I warned 3 days ago about the ticking time bomb for shorts that we are just witnessing & I believe it will not slow down, here's why...
A short squeeze requires three ingredients: high short interest, a catalyst that forces covering, and limited available shares. NBIS has all three.
The Short Position Is Massive and Growing:
44 million shares short at $136.33 means shorts are sitting on roughly $6 billion in notional exposure. Every $1 the stock moves against them represents ~$44 million in aggregate mark-to-market losses. The move from $83 (March low) to $136 has already inflicted approximately $2.3 billion in pain on the short base.
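The short-exposure figures above are straightforward to reproduce. The sketch below uses only the numbers quoted in the post; treating every short as opened at the $83 March low is an assumption that gives an upper bound on losses, since some shorts were opened later or have already covered.

```python
shares_short = 44_000_000
price = 136.33        # USD, closing price quoted in the post
march_low = 83.00     # USD, March low quoted in the post

notional = shares_short * price        # total short exposure
loss_per_dollar = shares_short * 1.00  # mark-to-market loss per $1 move up
# Upper bound: assumes all 44M shares were shorted at the March low.
upper_bound_loss = shares_short * (price - march_low)

print(f"notional: ${notional / 1e9:.2f}B")
print(f"loss per $1 move: ${loss_per_dollar / 1e6:.0f}M")
print(f"upper-bound loss since March low: ${upper_bound_loss / 1e9:.2f}B")
```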
Some have covered (which helped fuel the rally), but many clearly haven't ... the short interest is still near its highest levels.
The Catalyst Stack Is Relentless
Here's what bears have had to absorb in just the last 30 days:
March 4: Missouri 1.2GW AI factory → stock +12%
March 11: NVIDIA $2B investment → stock +16%
March 16: Meta deal expanded to $27B → stock +14%
March 25: $4.34B convertible notes offering (upsized from $3.75B due to demand)
April 9: Stock closes at $136.33, +9% on the day, approaching ATH
Each of these events forced a wave of short covering, but new shorts seem to be stepping in at every level, essentially trying to fade a freight train. The problem is that the fundamental catalyst calendar isn't slowing down. Q1 earnings could be end April. GTC announcements continue. The AI21 Labs acquisition talks just broke yesterday. Every positive headline is a forced-covering event.
The Float Is Structurally Constrained
This is the part most people miss. Nebius has ~274M shares outstanding, but the tradeable float is much smaller than it appears:
Volozh holds 13% in locked Class B shares → ~36M shares off the table
NVIDIA's 8.1% carries a 6-month lockup → ~22M shares locked
Accel's 4.4% (venture holders rarely sell into momentum) → ~12M shares sticky
Management & engineers hold 7% → ~19M shares not trading
BlackRock, Lazard, Alger, institutions = long-term holders → much of the 14.6% is passive/index
When you strip out locked, restricted, and long-term institutional shares, the effective free float is probably closer to 120-140M shares, not 274M. That means the 44M shares short represent potentially 30-35% of the actual tradeable float.
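The float arithmetic can be laid out explicitly. This sketch uses the ownership percentages quoted in the list above; the dictionary keys and the helper name `short_ratio` are just labels for illustration, and the 120M and 140M float figures are the post's own estimates rather than reported numbers.

```python
shares_out = 274e6
shares_short = 44e6

# Stakes the post treats as locked or sticky, as a share of shares outstanding.
locked_pct = {"Volozh": 0.13, "NVIDIA": 0.081, "Accel": 0.044, "Mgmt": 0.07}
locked = sum(locked_pct.values()) * shares_out
float_after_locked = shares_out - locked


def short_ratio(float_shares):
    """Short interest as a fraction of a given float estimate."""
    return shares_short / float_shares


print(f"locked or sticky: {locked / 1e6:.1f}M shares")
print(f"float before institutions: {float_after_locked / 1e6:.1f}M shares")
for est in (120e6, 140e6):
    print(f"float {est / 1e6:.0f}M -> short interest {short_ratio(est):.1%}")
```

How much of the 14.6% institutional stake is removed on top of this is a judgment call, which is why the post gives the float as a range rather than a single number.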
The setup is excellent for us Bulls to reach ATH just in time for Q1 ER. Let's continue this momentum
Instead of watching an hour of Netflix, watch this 2-hour Stanford lecture on AI careers. It will teach you more about winning in the AI race than all the AI content you've scrolled past this year.
@narmacnetworth Have you purchased ads on Meta or talked to a bunch of people that have? Any sense of whether things are getting better or worse? Just wondering.
@apoorv03 Yes, good post. When we start to see useful life of GPUs to be comparable to CPUs, hopefully we see value accrual shift gradually towards apps and software businesses.
@EthanChoi7 @OpenAI @AnthropicAI Rev share and COGS are different as far as GAAP accounting is concerned. So, both are correct. Please look elsewhere, Kevin.
I accidentally discovered how to compress a month of research into 3 hours.
A founder at a YC company showed me his Claude setup. I thought he was just fast. Then I watched him build an entire go-to-market strategy for a market he'd never worked in before.
Here's exactly what he did:
First: he didn't ask Claude to "research the market."
He fed it 8 competitor landing pages, 3 earnings call transcripts, 12 customer reviews, and a Reddit thread of complaints.
Then he asked one question:
"What does every successful player in this market understand that their customers never say out loud?"
Not "summarize these." Not "analyze the competition."
The unspoken insight. The thing that takes founders 2 years of customer calls to figure out.
But the next part is what broke my brain.
He followed up with:
"Now show me the 3 assumptions this entire market is built on, and what would have to be true for each one to be wrong."
In 15 minutes he had the attack surface of an entire industry.
The blind spots. The fragile consensus. The opening nobody was talking about.
Most founders spend 6 months doing customer discovery just to find one of those.
Then he did something I've never seen before.
He asked:
"Write 5 questions a world-class investor would ask to destroy this business idea, then answer each one using only the evidence in these documents."
He spent the next 2 hours stress-testing every assumption. Every weak answer triggered a follow-up:
"What's the strongest version of this argument and where does it still break?"
By hour 3, he had a strategy deck that felt like it came from someone who'd spent a decade in the space.
The tool didn't change. The questions did.
Most people treat Claude like a faster Google.
These founders are using it like a thinking partner who has read everything and has no ego about being wrong.
The difference between 3 hours and 3 months isn't the amount of information.
It's knowing which questions actually matter.
One of the biggest debates in the compute buildout: are these underlying businesses unprofitable, and is there no way to sustainably finance the investment cycle that we are undertaking?
And I get it: we are spending a lot of money, likely $700B of capex in 2026 amongst the hyperscalers alone. This year, the hyperscalers will build about 20GW of incremental IT capacity. How will we pay for this?
Built a framework to pressure test this: how the major players across multiple sectors monetize a GW of IT capacity.
Before we get into things, want to quickly explain my methodology & frameworks - all this analysis is meant to be a useful framework to get people thinking. These are just my views, so do your own work! Validate / reject my premises! We are all our own agents in this world.
That said, unless otherwise noted, all metrics are based on publicly available information from latest calendar year (2025). The adjustments: CoreWeave operating margin of 15% is based on a discounted view on their publicly disclosed 20-30% LT margins / Oracle's 20% target LT margins on GPU cloud business. There is a lot of debate / discourse in the market on what the eventual margins there could be. For illustrative purposes on this chart, I've left them at 15% which largely assumes a shorter depreciation cycle than they have assumed, plus no incremental margins from software. This is akin to the early days of AWS - which in 2015 disclosed they were a 17% operating margin business, while creeping that balance up to 40% over 10 years (spoiler, did so with software!).
OpenAI and Anthropic figures are based on publicly rumored figures of ARR and GW deployed and allocating a portion to "inference" vs "training" compute (I am assuming 60/40 for OAI, more training for Anthro). For instance - OpenAI has disclosed that their ARR at end of 2025 was roughly $20B which coincides with a power footprint of 1.9GW - a ratio of $10.5B ARR / GW. Part of that footprint is not revenue generating and is just training. Someone pushing back could say that is a feature, not a bug of a frontier AI lab. I don't agree, because it all depends on the slope of inference growth. Anthropic has had remarkably amazing utilization of their limited resources over the years. From my outside in work, contribution margins are awesome. Finally, note that for the chart, I use estimated GM% instead of OP%, this is for a visual framework and should leave an upper bound on the profitability of these businesses which are fundamentally "different software businesses than past ones".
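The ARR / GW ratio for OpenAI works out as follows. The sketch reads the "60/40" split as 60% inference and 40% training, which is my interpretation of the wording; the variable names are illustrative.

```python
arr_billion = 20.0       # OpenAI ARR at end of 2025, per the post
power_gw = 1.9           # power footprint, per the post
inference_share = 0.60   # assumed reading of the 60/40 inference/training split

arr_per_gw = arr_billion / power_gw
inference_gw = power_gw * inference_share
arr_per_inference_gw = arr_billion / inference_gw

print(f"ARR per GW (whole footprint): ${arr_per_gw:.1f}B")
print(f"ARR per inference GW:         ${arr_per_inference_gw:.1f}B")
```

Dividing by the inference share alone lifts the ratio from about $10.5B to about $17.5B per GW, which is why the split between revenue-generating and training compute matters so much to the comparison.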
I included Snowflake & Salesforce Rev / GW just to add some context. After all, these are the 2000s and 2010s era companies that are powered by compute. I derived their figures by dividing AWS Rev / GW by product gross margins. For the chart, Snowflake OI uses Non-GAAP OP% from CY25 as they are negative on a GAAP basis.
-----------------------------------
Ok with that out of the way, what is the point of the analysis I present?
A couple of observations:
1/ A lot of this analysis is meaningless! Why? Because for Snowflake or Salesforce, this Revenue / GW is an output, not an input. They are simply not in the business of selling repackaged power; they are selling VALUE / utility. In the case of Snowflake, an infrastructure SW company, they are running a feature-rich cloud data warehouse at scale. This took a decade of R&D, refinement, continuous development, and they are selling you a product that is reasonably hard to duplicate. But they cannot grow their business by just adding power. GW consumption is the output, not the input. The same holds true for Salesforce, or any other software company. Rather, these products are somewhat difficult to sell, due to the large contract values, duration of engagement, etc. Given their raw COGS are relatively low, a majority of their gross margin is invested into S&M to sell the product.
2/ Google and Meta are THE most profitable businesses from a pure monetization / GW. In fact, before the current datacenter investment cycle starting in 2022, these businesses' Revenue / GW were significantly higher. Critics say that the new AI business models are less profitable than their core ads business. And they are completely correct. In fact, Jensen always says this in his speeches: the truth is that in the world of retrieval based software, ads were the most profitable businesses known to mankind. 90% contribution margin, with hardly any need for any S&M, with a baked in 20%+ growth algorithm per year by increasing the efficacy of the ads. But the truth is at some point, this algo slows down - scale slows down. In the same way software businesses could not grow their business by building power, these businesses could not either; there was a natural rate of adoption on these businesses, tweaked over the years with ad load and engagement.
3/ Infrastructure providers largely monetize at ~8-12B / GW and are the closest to the underlying hardware. I have a whole post in my drafts on this (still working!)... The thing I want to call out on the hyperscalers / neoclouds is that the core rental business of hardware usually starts out at ~10-15% operating margin. You can trace this back to the early days of AWS (which I may add, also was criticized as hugely money losing before they showed the world how profitable it was). Everyone thinks of these businesses as 40% EBIT businesses, which they largely are, but that was built over the years by selling software attached to their hardware. The core EBIT margins of just the hardware without adding value services is usually around 10-20%. Core cloud ARR / GW is closer to $12B / GW - you can derive this from AWS disclosures on power. The new accelerated compute infrastructure is around $10B of ARR / GW which is consistent from OCI, CRWV, and Nvidia. The way they all move to higher OP margin is attaching software to it at significantly higher blended gross margins -- the same way the hyperscalers built this during the 2010s.
4/ The model providers. Most controversial / interesting in this post. But perhaps the most applicable here. I am reminded most of the mid 2010s of Uber / Airbnb / Netflix and people / media claiming that these businesses would never make money. But it's all about the unit economics. If you can make 50-70% gross margins, then you can choose to allocate those GM dollars in a few ways. You gain significant operating leverage at scale. And my guess is gross margins likely move higher (another discussion for another time). But of note, VS the past generation of companies, the research compute budget is the significant outlier. This will likely be further concentrated at a certain time - continue to decrease as a % of the company budget, and more inference innovation techniques will be pushed - most of the benefits to consumers, while incremental ~3-5% GM gains will be kept per year...
One of the great realizations in this exercise is that there are many ways of balancing a business to make money. In the case of software, they are hugely efficient / profitable from a "GW" perspective - and as a result, invest all their earnings into S&M to sell their product, which leads to an OP margin that is relatively low. For the hyperscalers, their gross margins are notably lower than their SaaS counterparts, but because their businesses are so large and have a high degree of trust with their customers, they are able to attach a considerable amount of their first party software while spending considerably less than their SaaS peers to sell that incremental $, yielding significantly higher operating margins. The internet providers are both hugely profitable, and need to invest little in their business, so really grew bloated over the years, investing in frivolous things and innovation grinding to a halt... until AI came along. Now they have a great target to invest in, with likely ways to enhance their core as well.
In the case of AI: in the past few months, we have just crossed the uncanny valley of "model usefulness". They have largely gone from moderately useful chatbots / research tools, to very useful autonomous & agentic. Therefore, the name of the game will quickly shift to inference throughput & latency optimizations. As long as we are riding this S-curve, more compute = more revenues = more operating leverage for the model providers. And we are just starting...
On this latest Nvidia earnings call, Jensen was asked how the hyperscalers will pay for their investments. He replied:
"I am confident in their cash flow growing... in this new world of AI, compute is revenues... I am certain at this point that we are at the inflection point, we've reached the inflection point and we're generating profitable tokens that are productive for customers and profitable for the cloud service providers."
For me, this switch flipped in the middle of 2025 - and really took off in late last year. Opus 4.5 and GPT 5 were tremendously valuable models, that were incredibly useful. We're seeing it now from the testimonies of the likes of Karpathy, etc. But anyone paying close attention to this knows / feels like everything has changed. Inference & usage is in take off mode & these profitable tokens are at the core of it all.
These views are my own, not a view of Altimeter. Do your own research & look forward to discussing!