OpenAI is spending $50 billion on compute this year and still can't launch products it already finished building.
The centralized model is hitting a wall. Data centers take years to build, and every megawatt of power capacity is already spoken for by hyperscalers outbidding each other on GPU allocation.
Meanwhile, millions of consumer GPUs sit idle right now. The compute already exists - scattered across desktops worldwide instead of packed into a single warehouse.
We made that bet years ago. Distributed AI inference on hardware people already own. The infrastructure is plugged in and waiting.
$50 billion buys a lot of concrete and cooling systems. It also buys time for everyone else to realize the compute gap won't close from the supply side alone.
Greg Brockman, President of OpenAI, said there is not enough compute in the world to satisfy AI demand, and OpenAI itself cannot launch products it has already built because it cannot find the infrastructure to run them (Save this).
OpenAI is spending $50 billion on compute in 2026 alone and it still is not enough.
That is the setup, but here is the trade.
Nebius is one of the most asymmetric infrastructure plays in public markets right now, and most people have never heard of it.
Q1 2026 revenue came in at $399 million, up 684% year over year, with AI cloud revenue specifically growing 841% in a single quarter.
The company entered 2026 with an exit ARR of $1.25 billion and is targeting $7 to $9 billion by year end, a number that would make it one of the fastest revenue ramps in the history of public infrastructure companies.
The contracted backlog sits at $50 billion, anchored by a $17.4 billion agreement with Microsoft through 2031 and a $27 billion five-year deal with Meta.
These are decade-scale infrastructure commitments from the two largest enterprise AI spenders on earth, signed before the demand curve has even reached its steepest point.
Nvidia took a direct equity stake in Nebius, one of only two neoclouds it has invested in alongside CoreWeave.
That relationship is not just financial: it means Nebius gets preferential access to GPU allocation at a moment when every lab and every hyperscaler is competing for the same constrained supply.
Contracted power capacity now exceeds 3.5 gigawatts, with expansion plans targeting 5 to 6 GW by mid-2029.
And power is the other binding constraint in AI infrastructure: you cannot build a data center without it, and Nebius has already secured the capacity that competitors are still fighting to acquire.
At full ramp, analysts project revenue in the $15 to $25 billion range by 2029, against a current market cap that the contracted backlog alone already dwarfs.
Come join Milk Road Pro and get our full Nebius deep-dive, the exact price levels we are watching, how we are sizing the position against the backlog and power capacity timeline, and our full AI thesis.
link below!
We gave FLUX.1 Schnell, FLUX.2 Klein, and Z-Image-Turbo an identical prompt: a fantasy castle diorama on a floating island, lightning in the background, waterfalls spilling into the void.
Each model built something different from the same words. All three ran for free in GamerHash AI.
Which castle goes on your shelf?
This track bloomed from a text prompt.
"Electric Bloom" - full electronic track, generated for free in Boppy. The compute behind it came from @deAPI_, powered by GPUs of people running the Earn Module in GamerHash AI.
You describe a vibe. Someone's graphics card turns it into music. That's the loop.
🎧⬇️
We generated this entire scene from a text prompt. Spaceship bridge, holographic displays, spoken dialogue - LTX 2.3 handled all of it.
A year ago you'd need a 3D artist, a voice actor, and a render farm. Today you need a graphics card and an idea.
This ran on consumer hardware through GamerHash AI.
$2.63/GPU-hour today. $5.10 at renewal. New capacity? 12-15 months out.
We've been saying this for a year. Cloud compute is a bottleneck that only gets tighter.
That's why we built @deAPI_ - an inference layer running on distributed consumer GPUs. While enterprises wait 15 months for cloud allocation or pay twice as much per GPU-hour, GamerHash users are already processing AI workloads on hardware that would otherwise collect dust.
The cloud squeeze isn't coming. It's here. And we're already on the other side of it.
Baseten CEO @tuhinone tells Altimeter's @apoorv03 that one of Baseten's cloud providers has already indicated their B200 prices ($/GPU hour) are set to double when existing contracts expire and are up for renewal later this year.
"If you go out right now saying you want a thousand GPUs, truly... people are talking about Q2 of next year. So 12 months out, maybe 15 months out.
We have a cluster... in one of these clouds... of B200s... Our unit price right now is $2.63 an hour... that's up for renewal in October. They came to us already in May and said $5.10 is the new price... So double."
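For a sense of scale, the quoted prices can be turned into annual figures. This is a minimal Python sketch, assuming a hypothetical 1,000-GPU cluster (the size mentioned in the quote) billed around the clock; real contracts, discounts, and utilization will differ:

```python
# Renewal-price arithmetic for the quoted B200 contract.
# The $2.63 and $5.10 prices come from the quote; the 1,000-GPU cluster
# size and 24/7 utilization are illustrative assumptions.

CURRENT_PRICE = 2.63       # $/GPU-hour, existing contract
RENEWAL_PRICE = 5.10       # $/GPU-hour, quoted renewal price
HOURS_PER_YEAR = 24 * 365  # assumes the cluster is billed around the clock


def price_ratio(old: float, new: float) -> float:
    """Return how many times more expensive the new price is."""
    return new / old


def annual_cost(price_per_gpu_hour: float, gpus: int, hours: int = HOURS_PER_YEAR) -> float:
    """Annual spend for a cluster billed per GPU-hour."""
    return price_per_gpu_hour * gpus * hours


if __name__ == "__main__":
    ratio = price_ratio(CURRENT_PRICE, RENEWAL_PRICE)
    print(f"Renewal multiplier: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% increase)")
    for label, price in (("current", CURRENT_PRICE), ("renewal", RENEWAL_PRICE)):
        print(f"{label}: ${annual_cost(price, gpus=1_000):,.0f} per year for 1,000 GPUs")
```

At these prices the renewal works out to roughly 1.94x, about a 94% increase, which is why the quote rounds it to "double".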
Your golden retriever just learned how to breathe.
Well, not really. But image-to-video in GamerHash AI got close enough to freak out the cat.
One photo. One click. Your dog yawns, stretches, looks at you like you owe him a treat. All rendered locally on your GPU, all free.
The uncanny valley just got a new resident, and he's a very good boy.
We told LTX-2.3 to bring a painted knight figurine to life on a gaming desk. It lifted its visor and walked off like it had somewhere to be.
Single text prompt, rendered in GamerHash AI on our DePIN network. Your GPU gave a two-inch figurine a personality.
Last weekend, the US government killed Anthropic's Fable 5 and Mythos 5 with a single order.
Every app built on those models went dark overnight.
Our DePIN runs open-source models on distributed consumer GPUs. Try sending a shutdown order to ten thousand machines you don't own.
The Fable 5 ban made one thing clear: the intelligence layer now has a fast policy gate that hardware never had.
Hardware bottlenecks (HBM, power, advanced packaging) take years to shift, but this one moved in hours.
One export directive on a closed LLM = global cutoff.
- Frontier capability just became contingent on jurisdiction and politics (in a way it wasn’t 48 hours earlier).
- Clean segmentation at scale is messy.
This exposes a few layers:
1. The hosted frontier model itself is no longer a neutral, always-on input. It sits behind a geopolitical choke point that can be pulled for “safety” reasons, with broad market collateral damage.
2. The inference layer underneath becomes strategic. Who serves the model, how it’s routed, quantized, fine-tuned, guardrailed, and post-trained, and where the data boundary sits now carry real sovereignty weight.
3. Orchestration and redundancy stop being nice-to-have architecture and start looking like basic operational hygiene once any single frontier LLM can be cut off faster than you can figure out alternatives.
4. Europe’s demand-side sovereignty moves (Chips Act 2.0 + CADA) were already tilting this way. The ban just gave them a crisp, recent case study of the exact risk they’ve been pricing in. It will most likely shorten timelines for building parallel capacity and for preferring alternatives in critical sectors.
On the inference side, this opens real space.
Specialized providers that can run open-weight models, including customized fine-tuned and post-trained ones, at scale with strong sovereignty guarantees just got more relevant.
-> Not because frontier models disappeared, but because the economics and risk profile of depending on them exclusively have now shifted.
You can keep frontier hosted models for the narrow slice of work where they still deliver decisive quality on long horizon or high-stakes reasoning.
But for volume, regulated workloads, domain-specific agents, or anything where you need predictable updates, data residency, or protection from foreign policy moves, running customized open models on controllable infrastructure becomes the cleaner default.
This is where players like @nebiustf sit in an interesting spot.
Access to sovereign EU compute + strong inference stack + ability to host and serve fine-tuned or post-trained open models gives a credible path to reduce single-jurisdiction dependency without giving up performance on the workloads that matter most.
Some deeper angles worth tracking:
- Token economics get more layered.
Frontier APIs stay expensive per token for a reason.
Open + fine-tuned models on sovereign or managed inference can be dramatically cheaper at volume once you control the serving stack and quantization. The gap matters more when you’re already hedging policy risk.
- Agent reliability becomes an orchestration problem, not just a model problem. If the frontier tap is sometimes restricted or degraded, you need clean fallback paths and routing logic that preserve output quality where it counts. That creates demand for more sophisticated inference engineering, not just bigger context windows.
- US labs face a subtle structural pressure. The more visible the revocation risk becomes, the stronger the incentive for non-US actors to invest in parallel inference capacity and customized models.
- Over time, this can slow winner-take-most dynamics at the frontier even if raw capability gaps between LLMs remain.
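The "clean fallback paths and routing logic" point can be sketched as a priority-ordered router. This is a minimal Python illustration, assuming a hypothetical `Provider` wrapper and `ProviderUnavailable` exception (no real SDK or provider API is implied); production routing would also need timeouts, retries, and output-quality checks:

```python
"""Fallback routing across inference providers (illustrative sketch).

Assumptions: provider names, the `Provider` wrapper, and the
`ProviderUnavailable` exception are hypothetical stand-ins, not a real SDK.
"""
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised when a provider is restricted, degraded, or offline."""


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # hypothetical interface: prompt -> completion


def route(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in priority order and return (provider_name, output).

    Only ProviderUnavailable triggers a fallback; any other exception
    propagates so genuine bugs are not silently rerouted.
    """
    errors: list[str] = []
    for provider in providers:
        try:
            return provider.name, provider.call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers unavailable: " + "; ".join(errors))


# Hypothetical stand-ins: a frontier API that has been cut off and a
# self-hosted open-weight model that is still reachable.
def frontier_api(prompt: str) -> str:
    raise ProviderUnavailable("access revoked by policy order")


def self_hosted_open_model(prompt: str) -> str:
    return f"[open model] {prompt}"


if __name__ == "__main__":
    chain = [
        Provider("frontier-api", frontier_api),
        Provider("self-hosted-open", self_hosted_open_model),
    ]
    print(route("Summarize this contract", chain))
```

Restricting the fallback to one designated "unavailable" error is a deliberate choice: a policy cutoff degrades gracefully to the next provider, while ordinary failures still surface instead of being quietly rerouted.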
Power and grid constraints don’t disappear.
What if they just get pulled in slightly more directions as people build hedging capacity?
Parallel sovereign or hybrid inference clusters obviously still compete for the same scarce electrons and networking capacity.
The real constraint that just got sharper is this: designing systems that assume any single centralized, hosted frontier model can become less reliable or more expensive to access on policy grounds, not just technical ones.
The ban didn’t invent that assumption, but it definitely made ignoring it look like incomplete engineering.
We asked FLUX.1 schnell, Z-Image-Turbo, and FLUX.2 Klein 4B to trap a prehistoric insect in amber.
The prompt was identical. The results couldn't disagree more.
All generated locally in GamerHash AI - which one nailed it? 👇
A full angelic choir, composed from a single text prompt on https://t.co/qpwNScXJGP powered by @deAPI_.
The entire chorus was rendered on a consumer GPU somewhere in our network. The owner probably had no idea their graphics card was conducting a choir while they grabbed coffee.
Developers build AI music tools on our compute, and the result sounds like it belongs in a cathedral.
Open-source model weights don't have a kill switch.
Your API calls hit thousands of GPUs across the globe. No single government order can revoke that overnight.
Something to consider when picking your AI infrastructure.
Season 1 rewards just dropped. Every participant got their cut of the rewards pool based on committed GHXP.
Season 2 is live now. Here's the call you need to make: commit heavy for a bigger slice, or hold your GHXP for a future season when the pool might favor you.
Either way - the app needs to be running. Points only stack while it's on.
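The commit-heavy-or-hold decision is easier to reason about with a split formula in hand. The post only says rewards are "based on committed GHXP", so this minimal Python sketch assumes a strictly proportional (pro-rata) split; the pool size, participant names, and commitments are hypothetical:

```python
"""Pro-rata split of a season's rewards pool by committed GHXP.

Assumption: a strictly proportional split is used for illustration only;
the actual GamerHash reward formula is not specified in the post.
"""


def pro_rata_shares(pool: float, commitments: dict[str, float]) -> dict[str, float]:
    """Return each participant's share of `pool`, proportional to commitment."""
    total = sum(commitments.values())
    if total <= 0:
        raise ValueError("total committed GHXP must be positive")
    return {name: pool * amount / total for name, amount in commitments.items()}


if __name__ == "__main__":
    # Hypothetical season: one player commits heavily, two commit lightly.
    commitments = {"alice": 6_000, "bob": 3_000, "carol": 1_000}
    for name, share in pro_rata_shares(10_000, commitments).items():
        print(f"{name}: {share:,.0f}")
```

Under this assumption your slice depends on everyone else's commitments too: committing more grows your share, but if others commit even more in the same season, the same pool is divided more thinly.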
What does a graphics card dream about between inference tasks?
We asked LTX-2.3 and got this - light particles swirling out of a GPU, turning into music and faces before fading back into the dark. One prompt, GamerHash AI, your hardware.
Probably more romantic than what actually happens inside the VRAM. But the AI chose beauty, and we're not arguing.
The compute shortage is so bad, SpaceX is literally launching GPUs into space.
Orbital data centers. Rocket-delivered inference. Millions of dollars per rack just to get hardware closer to... satellites?
Here's a wilder idea: what if we used the GPUs already sitting in 800 million PCs around the world? The ones burning electricity right now doing absolutely nothing?
Sometimes the answer isn't up. It's already on your desk.