Mani Swaminathan

Verified account

@ManiSw82

Orbital AI compute is an operations problem, not a hardware one. I write the arithmetic nobody downstream of the press release does. ex-Accenture, ex-NTT. SF.

San Francisco, CA

Joined October 2012

2.6K Following

294 Followers

212 Posts

Pinned Tweet

Mani Swaminathan

16 days ago

People keep citing TBIRD's 200 Gbps like it settles the orbital compute question. It doesn't, and you can see why if you actually read past the abstract. A pass over a ground station runs about 5 minutes. The best single pass they got was 4.8 TB. A Llama-3-70B checkpoint is around 150 GB, so a pass moves maybe 32 of them. Fine, sounds like plenty. Except a frontier run doesn't write 32 checkpoints and call it a day. It's writing constantly, syncing gradients across thousands of nodes, throwing off logs and telemetry the whole time. Internal cluster traffic is terabytes per second. Per second, not per pass. So you point your 200 Gbps at that and it's not a pipe, it's a straw. And the straw is only open a few minutes an orbit, one ground station at a time, assuming the weather cooperates. The link speed was never the thing to worry about. The visibility window is.
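The pass arithmetic in that post can be sketched in a few lines. This is a rough sketch, not a link budget: the pass size, checkpoint size, pass length, and link rate are the figures from the post, decimal units (1 TB = 1000 GB) are an assumption, and `CLUSTER_TB_PER_S` is a hypothetical stand-in for "terabytes per second" of cluster-internal traffic.

```python
# Figures quoted in the post; decimal units (1 TB = 1000 GB) are assumed.
BEST_PASS_TB = 4.8        # best single-pass transfer
CHECKPOINT_GB = 150       # approximate Llama-3-70B checkpoint size
PASS_SECONDS = 5 * 60     # one pass over a ground station
LINK_GBPS = 200           # headline optical link rate

# Hypothetical stand-in for "terabytes per second" inside a training cluster.
CLUSTER_TB_PER_S = 1.0


def checkpoints_per_pass(pass_tb: float, checkpoint_gb: float) -> float:
    """How many checkpoints of a given size fit in one pass."""
    return pass_tb * 1000 / checkpoint_gb


def link_ceiling_tb(link_gbps: float, seconds: float) -> float:
    """Upper bound on data moved if the link ran flat out for the whole pass."""
    gigabytes = link_gbps / 8 * seconds   # Gbit/s -> GB/s, times duration
    return gigabytes / 1000


def seconds_of_cluster_traffic(pass_tb: float, cluster_tb_per_s: float) -> float:
    """How many seconds of cluster-internal traffic one pass could carry."""
    return pass_tb / cluster_tb_per_s


print(checkpoints_per_pass(BEST_PASS_TB, CHECKPOINT_GB))
print(link_ceiling_tb(LINK_GBPS, PASS_SECONDS))
print(seconds_of_cluster_traffic(BEST_PASS_TB, CLUSTER_TB_PER_S))
```

Under these assumptions a whole pass carries only a few seconds of what the cluster moves internally, which is the straw in the post.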

Mani Swaminathan

10 days ago

The energy drain doesn't disappear, it moves, and to a worse spot. The compute pulls the same watts and dumps the same heat either way. On Earth a substation and a cooling loop handle it. In orbit, at the 100 GW scale being pitched, you're collecting that power across hundreds of square kilometers of panel and rejecting it at a couple hundred watts per square meter of radiator. That's not relief. It's the same load with a launch bill on top.

Mani Swaminathan

10 days ago

You don't escape the power and permitting fight by going up, you trade it for a worse one. 100 GW at the solar constant, with cells around 30% efficient, works out to over 200 square kilometers of array, and you still have to reject essentially all 100 GW as heat at a couple hundred watts per square meter of radiator. On the ground a gigawatt is a zoning hearing. In orbit it's square kilometers of hardware you can't iterate on.
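A quick way to check the area numbers in that post. This is a back-of-envelope sketch: the solar constant of about 1361 W/m² is standard, but the 30% cell efficiency and the 200 W/m² radiator flux (standing in for "a couple hundred") are assumptions, and pointing losses, degradation, and eclipse are ignored.

```python
SOLAR_CONSTANT_W_M2 = 1361.0  # sunlight above the atmosphere, per square meter
CELL_EFFICIENCY = 0.30        # assumed; not stated in the post
RADIATOR_W_M2 = 200.0         # assumed value for "a couple hundred" W/m^2


def array_area_km2(power_w: float, efficiency: float = CELL_EFFICIENCY) -> float:
    """Solar array area needed to collect power_w of electrical power."""
    area_m2 = power_w / (SOLAR_CONSTANT_W_M2 * efficiency)
    return area_m2 / 1e6


def radiator_area_km2(heat_w: float, flux_w_m2: float = RADIATOR_W_M2) -> float:
    """Radiator area needed to reject heat_w at a given flux."""
    return heat_w / flux_w_m2 / 1e6


target_w = 100e9  # the 100 GW figure from the post
print(f"array:    {array_area_km2(target_w):.0f} km^2")
print(f"radiator: {radiator_area_km2(target_w):.0f} km^2")
```

With these inputs the radiator comes out larger than the array, so the heat side sets the size of the structure, not the power side.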

10 days ago

SpaceX is positioning orbital data centers as the cornerstone of its IPO strategy, aiming to launch 100 gigawatts of compute capacity annually to bypass Earth’s power and permitting constraints. By merging xAI under SpaceX, the company is betting that the Starship launch system and the Terafab initiative can drive orbital costs down from current levels—which are roughly four times higher than terrestrial alternatives—to reach economic parity by the early 2030s. The move addresses a looming wall in global power availability, but it faces massive engineering hurdles in radiation shielding and thermal management, as space lacks the convective cooling used on Earth. Success depends on reducing launch costs to approximately $250 per kilogram and developing autonomous reliability for GPUs that cannot be physically serviced. If Musk’s aggressive timeline holds, space could transition from a niche experiment to the only viable path for scaling AI compute as terrestrial grids reach their absolute capacity. Get the app: https://t.co/CAiPFDFVsT Read more on Oesnada: https://t.co/LnVUY5j6xM

Mani Swaminathan

10 days ago

The heat is the actual wall, more than downlink. A single rack pulls 100 kW now and basically all of it comes back out as waste heat. On the ground that goes into water and you forget about it. In orbit your only sink is a radiator, and the textbook number is 100 to 350 watts per square meter. So one rack is somewhere between about 300 and 1,000 square meters of radiator, and that doesn't shrink as the chips get better. Inference-only is right but the reason is thermal, not latency.

Mani Swaminathan

12 days ago

Everyone pitching compute in space is solving the wrong problem first. It isn't the chip. The first orbital compute that pays for itself isn't frontier training at all, it's inference on data that's already up there. Imagery, signals, that kind of thing. You process it in orbit and send down the answer, not the firehose. That walks around the downlink wall instead of running at it head first. The real work is the unglamorous plumbing. Enough ground stations that you're never more than a few minutes from a dish. Checkpointing you can trust when a bit flips and nothing flags it. A scheduler that knows which arcs of the orbit are clean and which to just sit out. Whoever builds that quietly, on a real workload, walks past ten startups with a glowing render of a rack in space. The opportunity's real. It just shows up in work clothes.

Mani Swaminathan

14 days ago

Right, and the part that gets lost is which constraint actually bites first. It isn't whether the country can generate the power. It's that you can't plug a gigawatt into the grid on demand. Interconnection queues run five to ten years. The water side is just as ugly, a big campus can drink millions of gallons a day, so towns are already voting no. That's the real reason the cold, empty, power-rich regions keep winning the training buildout. Not vibes. Orbit is one answer to the squeeze. It's just the hardest one, and Earth still has easier doors open.

Mani Swaminathan

14 days ago

I've been posting about orbital compute for a week and people keep hearing me as either a believer or a hater. Neither. Here's the actual position. Orbital AI compute is an operations problem before it's anything else. Not a silicon problem, not a launch-cost problem, though those matter. The thing that decides whether any of it works is the unglamorous middle layer. How you move the data down. How you handle a bit-flip nobody flagged. How you schedule around the parts of the orbit that fight you. How you decide which work to throw away and rerun. That layer barely exists yet at training scale. The companies that win this aren't going to be the ones with the best pitch deck about free solar and vacuum cooling. They'll be the ones who quietly did the arithmetic the press releases skip, and built the boring software underneath. That's the whole bet I'm interested in. Everything else is set dressing.

Mani Swaminathan

15 days ago

People keep talking about AI infrastructure like it's one big buildout. It's two, and they pull in opposite directions. Training runs in bursts. A handful of labs, enormous jobs, weeks at a time. What it wants is cheap firm power in one spot, and it genuinely could not care less where that spot is. The user could be on the other side of the planet. Doesn't matter. That's why sticking a hydro-fed site up inside the Arctic Circle actually pencils out. Look at Stargate Norway, 230 MW of GPUs in Narvik on renewables, and they want to push it past 500. Inference is the other animal. Billions of tiny queries a day, ChatGPT's running something like 2.5 billion prompts on its own, and every single one wants to sit close to whoever asked. So that load stays put in regional hubs near people. Put it this way. Same chips, totally different problem to solve, and that's the whole trick. Build for one while you're thinking about the other and you'll plant a very expensive building in exactly the wrong place.

Mani Swaminathan

15 days ago

Starcloud got an H100 into orbit last month and trained a model on it. First time anyone's pulled that off. Worth getting the size of it right though. One GPU. Small model. So the question it answers is "can this stuff even run up there," and yeah, turns out it can. That's it. Nobody's calling it hyperscale training and nobody should. The distance between one chip in orbit and an actual training cluster in orbit is the downlink, the power budget, and the radiation handling I've been banging on about for three posts now. The demo clears none of that. It clears the first inch, which is still a real inch. Right way to go about it, honestly. Get the inch, then we can fight about the mile.

Mani Swaminathan

16 days ago

Everyone treats radiation as the reason you can't put GPUs in orbit. Wrong problem. Look at the ground first. Meta trained Llama 3 on 16,384 H100s for 54 days. They logged 419 unexpected interruptions. One roughly every three hours. Six of them were silent data corruption, the GPU returning a wrong answer with no error flag. That's in a climate-controlled datacenter with no radiation to speak of. So the fault-handling problem already exists at sea level, and the biggest names in the field are still getting bitten by it. Now move that to low Earth orbit, where single event upsets from cosmic rays and the South Atlantic Anomaly are a documented, everyday thing for commercial parts. You haven't created a new problem. You've taken a problem the ground hasn't fully solved and turned the dial up. Which means orbital compute isn't a shielding bet or a silicon bet. It's a bet that you can build fault handling better than Meta's, in a worse environment. Redundant computation, checkpointing you can actually trust, knowing which work to throw away and rerun. That software layer barely exists at training scale. Radiation doesn't kill the idea. It just decides who's serious.
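The interruption arithmetic from that post, as a small sketch. The run length and fault counts are the figures quoted in the post; `rate_multiplier` is a hypothetical knob for "turning the dial up" in orbit, and no real value for it is claimed here.

```python
RUN_DAYS = 54             # length of the run, from the post
INTERRUPTIONS = 419       # unexpected interruptions, from the post
SILENT_CORRUPTIONS = 6    # silent data corruption events, from the post


def hours_between(events: int, days: float) -> float:
    """Mean time between events, in hours, over a run of the given length."""
    return days * 24 / events


def expected_interruptions(days: float, rate_multiplier: float = 1.0) -> float:
    """Scale the ground-level rate to another run length.

    rate_multiplier is hypothetical: 1.0 reproduces the ground figures,
    anything higher is a what-if for a harsher environment.
    """
    per_day = INTERRUPTIONS / RUN_DAYS
    return per_day * days * rate_multiplier


print(f"{hours_between(INTERRUPTIONS, RUN_DAYS):.1f} h between interruptions")
print(f"{hours_between(SILENT_CORRUPTIONS, RUN_DAYS) / 24:.0f} days between silent corruptions")
print(f"{SILENT_CORRUPTIONS / INTERRUPTIONS:.1%} of interruptions were silent")
```

The silent ones are rare next to the total, but they are the ones a checkpoint can't be trusted through, which is why the post puts the weight on the software layer.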

Mani Swaminathan

16 days ago

Follow-on to the downlink post. Same argument, other end of it. If the downlink is a straw, the orbital compute question is really a ground question. How many dishes, where, up how often. That's the constraint nobody's funding.

Mani Swaminathan

16 days ago

Orbital compute economics is mostly a ground-station problem and almost nobody putting money into the chip side seems to have clocked it. The math people run stops at the satellite. GPUs, power, radiation, downlink speed. Physics checks out, on to the next slide. But a satellite over open ocean with no dish in view is just a GPU writing checkpoints it can't send anywhere. What you actually get isn't the link rate, it's the link rate times however much of each orbit you can see a station. For one mid-latitude dish that's minutes out of a 90 minute orbit. Not much. Which tells you the lever isn't a faster laser. It's more dishes, more places, more uptime. A point or two of extra visibility buys you more than a whole generation of optical terminal improvement does. Put differently, the orbital compute buildout is a ground segment buildout. Siting, spectrum, weather diversity, scheduling across a global network of dishes. All of it terrestrial, all of it logistics and ops. That's the unsexy half nobody's funding. It's also the half that decides who's actually still standing in five years.
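The "link rate times visibility" point can be written down directly. A minimal sketch with assumed inputs: the 200 Gbps link and 90-minute orbit come from these posts, the 5 visible minutes per orbit is an illustrative figure, and `breakeven_visibility_points` is a hypothetical helper for comparing a visibility gain against a faster terminal.

```python
def effective_gbps(link_gbps: float, visible_minutes: float,
                   orbit_minutes: float = 90.0) -> float:
    """Average throughput once the link is only up while a station is in view."""
    return link_gbps * visible_minutes / orbit_minutes


def breakeven_visibility_points(baseline_fraction: float,
                                link_multiplier: float) -> float:
    """Extra visibility, in percentage points, that matches a faster link.

    Throughput scales as rate * visibility, so a link_multiplier-times faster
    terminal is matched by raising visibility from f to f * link_multiplier.
    """
    return baseline_fraction * (link_multiplier - 1.0) * 100


# Illustrative: 200 Gbps link, 5 visible minutes out of a 90 minute orbit.
print(f"{effective_gbps(200, 5):.1f} Gbps averaged over the orbit")

# How many points of visibility equal a 2x terminal, at different baselines.
for baseline in (0.01, 0.02, 0.05):
    points = breakeven_visibility_points(baseline, 2.0)
    print(f"baseline {baseline:.0%}: {points:.1f} points matches a 2x link")
```

The break-even moves with the baseline: the thinner the ground network is to begin with, the more a single point of visibility is worth compared with a better laser.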

Mani Swaminathan

18 days ago

I'm publishing a piece June 15. Thesis: frontier AI training stays on Earth, probably in the Nordic arctic. Orbital compute is real but it's a sensor-fusion edge play, not a hyperscale play. Anyone pitching either pattern as the other is wrong. I'll be wrong about some of it. Would rather argue with you while there's still time to be wrong in private.

Mani Swaminathan

20 days ago

@BadCapitalVC Sell the tree, not the mangoes. Buyer owns a named tree. Every mango it bears that season is theirs, harvested, packed, shipped. Live tree feed, harvest date, weight forecast. Turns a commodity into a relationship.

Mani Swaminathan

20 days ago

The premise that AI compute has to move to orbit because Earth runs out of power is wrong, @awaisahmedna. AI buildout is gated by grid interconnect queues running 4 to 7 years per IEA, not by sunlight. The orbital fix has its own bill. LEO sits in Earth's shadow 35% of every orbit, 15 to 16 orbits a day. H100s pull 700 W each and vacuum has no convection, so you need roughly 1.6 m² of radiator per GPU. Training cannot move up there at all. NVLink runs 900 GB/s inside a node, which is 7,200 Gbps. The best demonstrated inter-satellite laser link is 10 Gbps. That is 720x short of what an all-reduce step needs. @SarvamAI @pratykumar on the other side is a sovereign LLM funded under the IndiaAI Mission, a ₹10,300 crore public outlay from @GoI_MeitY and @AshwiniVaishnaw. It already runs 4,000-plus H100s on the ground in Bengaluru and shipped a 105B model in March. Nothing in this @PixxelSpace tie-up makes that training one step faster. What it does move is the news cycle into Pixxel's next raise. Two founders, one photo op, one fundraise, framed on national TV by @ShereenBhan. The bill ends up with Indian taxpayers and Pixxel's next investors.
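The numbers in that post reduce to three divisions. A sketch using only the figures quoted there; the function names are made up for illustration, and the sunlit-fraction line ignores battery and conversion losses, which is an assumption.

```python
NVLINK_GB_PER_S = 900        # intra-node NVLink bandwidth, from the post
ISL_GBPS = 10                # inter-satellite laser link rate, from the post
GPU_WATTS = 700              # H100 power draw, from the post
RADIATOR_M2_PER_GPU = 1.6    # radiator area per GPU, from the post
SHADOW_FRACTION = 0.35       # share of each orbit in Earth's shadow, from the post


def bandwidth_gap(nvlink_gb_per_s: float, isl_gbps: float) -> float:
    """Ratio of NVLink to the laser link, after converting bytes to bits."""
    return nvlink_gb_per_s * 8 / isl_gbps


def implied_radiator_flux(gpu_watts: float, radiator_m2: float) -> float:
    """Heat flux in W/m^2 that the quoted radiator area would have to reject."""
    return gpu_watts / radiator_m2


def sunlit_fraction(shadow_fraction: float) -> float:
    """Share of the orbit with direct solar power, ignoring storage losses."""
    return 1.0 - shadow_fraction


print(f"NVLink is {bandwidth_gap(NVLINK_GB_PER_S, ISL_GBPS):.0f}x the laser link")
print(f"radiator flux: {implied_radiator_flux(GPU_WATTS, RADIATOR_M2_PER_GPU):.1f} W/m^2")
print(f"sunlit share of orbit: {sunlit_fraction(SHADOW_FRACTION):.0%}")
```

The unit conversion is the step that is easy to miss: NVLink is quoted in gigabytes per second and the laser link in gigabits, so the gap is 720x and not 90x.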

Mani Swaminathan

20 days ago

The bandwidth argument for orbital AI compute keeps getting written by people who haven't done the arithmetic. NASA's TBIRD demonstrated 200 gigabits per second optical downlink. Real, in orbit, two years of data. The headline number that everyone is citing. Here is what the same paper says in the methods section. A single TBIRD pass over a ground station is 5 minutes. Peak demonstrated transfer in a pass: 3.6 terabytes. A single Llama-3-70B checkpoint is 150 gigabytes. So a single pass can move roughly 24 of those checkpoints, which at the full 200 gigabits per second is under three minutes of actual transmit time. Sounds like plenty. It is not plenty. A frontier training run writes checkpoints continuously, syncs gradients across thousands of nodes, and emits logs and telemetry the entire time. Cluster-internal traffic is measured in terabytes per second, not per pass. The actual constraint is not the downlink speed. It is the visibility window. Three to six minutes per orbit, per ground station, weather permitting. Anyone solving for orbital compute economics is really solving for ground station density. That is the unsexy half of the stack nobody is funding yet. DM if you're working on it.

Mani Swaminathan

20 days ago

Almost everyone writing about AI data centers in space is asking the wrong question. The question is not whether orbital compute is possible. Starcloud put NVIDIA hardware in orbit in November. NASA's TBIRD has been pushing 200 gigabits per second optical downlink for two years. Sophia Space just raised on the same thesis. The physics works. The question that almost nobody is asking is what the operating model actually looks like once both sides of the stack exist. Four problems sit in that gap. Downlink bandwidth is the new bottleneck, not chip count. 200 gigabits per second sounds enormous until you point it at the output of a real training cluster, and then it is a straw. Radiation is a scheduling problem, not a silicon problem. Anyone betting on commercial GPUs in orbit is really betting on software-level resilience, redundant computation, and checkpointing strategies that do not yet exist at production scale. Training belongs in orbit because it is batch and latency tolerant. Inference belongs on Earth because users will not wait 200 milliseconds for a token. The hybrid scheduler that decides what runs where, based on solar availability, ground station visibility, downlink congestion, and customer latency budgets, is the actual product. It does not exist. And every GPU above 100 kilometers is one ITAR review away from a multi-year regulatory cliff that nobody in the current pitch decks is pricing in. I am obsessed with this layer. The operating model of the Earth-plus-orbit compute stack. Twenty years inside IT services watching infrastructure get built taught me that the new layer always looks like a hardware problem from the outside and turns out to be an operations problem on the inside. This one is going to be the biggest one in our lifetime. Going to write about it here weekly. If you are working on any part of this stack, DM me. I want to find the people I am going to build with.

Mani Swaminathan

3 months ago

@emirates EK566 DXB to BLR: the gate agents are unresponsive and rude, have now locked us behind doors, and are not willing to answer any questions. Please help.
