@GSchvey continues to surprise me with his thoroughness - a few notes on his assessment:
- Stage 1 of Orion was quite simple. At 100B scale, the trends we see match what we predicted from the ResBM paper - put simply, the scaling laws have held so far.
- We have a LOT of work to do on heterogeneous / smaller nodes - the current version of Orion was about stress-testing the research. We can then back-fit these elements.
- Interruptibility is key to the economic thesis - more on this later.
- 70% of centralised MFU is frankly wild. We achieved 30% on our third run. For reference, Colossus 2 on JAX was running at c. 20% before Elon’s team rewrote the NVIDIA logic in C. We are in the ballpark where this is interesting. The MFU will improve, but this is the most exciting part of the result for me (more so than the model size).
- These results (to us) mean we can now start pursuing commercialisation. We see a clear path through the remaining technical hurdles.
Combining ephemeral, distributed, fractured compute into indistinguishable training flops is our vision. We are getting so close.
I've been digging into the @MacrocosmosAI @IOTA_SN9 100B parameter pretraining run to understand what it is (or isn't) and its potential implications.
My honest assessment below in case it's helpful for others:
Reserved, interconnected, frontier training compute is the most expensive commodity in the space. On the flip side, compute that can inherently be resold (e.g. spot instances) and lower-grade compute can trade at a 90% discount. If you can orchestrate this compute to create training compute at 70% of frontier performance (for 10% of the price), you can disrupt the entire mechanics of the compute marketplace.
We believe the future of training inherently revolves around making training workloads liquid. Iota is designed explicitly for this future.
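The arithmetic behind the 70%-of-performance-for-10%-of-price claim can be sketched in a few lines of Python. This is a minimal illustration, not anyone's pricing model: the function name and the normalisation of frontier compute to performance 1.0 and price 1.0 are assumptions; only the 70% and 10% figures come from the text above.

```python
def effective_compute_per_dollar(relative_performance: float, relative_price: float) -> float:
    """Training throughput per unit of spend, relative to frontier compute (= 1.0).

    Both arguments are fractions of the frontier baseline (an assumed normalisation).
    """
    if relative_price <= 0:
        raise ValueError("relative_price must be positive")
    return relative_performance / relative_price


# Figures from the text: 70% of frontier performance at 10% of the price.
advantage = effective_compute_per_dollar(relative_performance=0.70, relative_price=0.10)
print(f"{advantage:.1f}x effective training compute per dollar vs. frontier")
```

Under those two figures, each dollar buys roughly seven times as much effective training compute as reserved frontier capacity, which is the gap the thesis depends on.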
Just orchestrated a 128 node permissionless decentralized training run, in 5 minutes, for 5 TAO, via @IOTA_SN9
They can do this up to 100B param models.
Unbelievable.
https://t.co/hJGZ6O5NrU
ORION LAUNCHES: @macrocosmosai unveils Orion-100B at @proofoftalk on the one-year anniversary of the @IOTA_SN9 launch, in this very building.
They asked: Can frontier-scale AI training be distributed?
On Monday, the team published Orion-100B, an early pretraining run designed to test exactly that. At Proof of Talk co-founders @macrocrux and @WSquires took the audience through the results.
Orion-100B represents the largest distributed LLM pretraining run conducted over the open internet to date. The run trained a 100-billion-parameter model architecture across geographically distributed infrastructure.
This was not an attempt to produce a finished frontier model. Orion-100B processed approximately 1.1 billion training tokens over a two-day period before being stopped. The objective was simpler, and arguably more important: demonstrate that training at 100B scale is possible without relying on a single, centralized cluster of GPUs.
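For a rough sense of scale, the figures above can be turned into a throughput estimate. This is a back-of-the-envelope sketch under stated assumptions: the run is treated as exactly 2 days and 1.1 billion tokens (both are approximate in the text), and total training compute uses the common 6 x parameters x tokens rule of thumb for dense transformers, which may not hold exactly for Orion's architecture. The function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def tokens_per_second(total_tokens: float, days: float) -> float:
    """Average token throughput over the whole run."""
    return total_tokens / (days * SECONDS_PER_DAY)


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb estimate (assumption): about 6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens


# Approximate figures from the text.
N_PARAMS = 100e9   # 100-billion-parameter architecture
N_TOKENS = 1.1e9   # ~1.1 billion training tokens
RUN_DAYS = 2.0     # "two-day period", treated here as exactly 2 days

rate = tokens_per_second(N_TOKENS, RUN_DAYS)
total_flops = approx_training_flops(N_PARAMS, N_TOKENS)
sustained_flops_per_s = total_flops / (RUN_DAYS * SECONDS_PER_DAY)

print(f"average throughput: {rate:,.0f} tokens/s")
print(f"estimated training compute: {total_flops:.2e} FLOPs")
print(f"estimated sustained rate: {sustained_flops_per_s:.2e} FLOP/s")
```

These are order-of-magnitude estimates only; the run's actual wall-clock time, token count and per-token compute were not given precisely.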
For Macrocosmos, the result represents the culmination of more than a year of work.
The version of the story shared on stage was refreshingly unglamorous. The team launched Subnet 9 in 2025 targeting a 15B parameter model and quickly discovered that building distributed training systems in a permissionless environment is harder than it looks. The network struggled. Assumptions broke. The architecture was reworked.
So instead of pushing forwards, they scaled backwards. For months, the team trained smaller 1.5B parameter models, running more than 700 experiments in the process. By their own admission, it wasn't particularly exciting. But it allowed them to harden the networking layer, improve fault tolerance, increase throughput and gradually remove bottlenecks from the system.
Only then did they begin scaling again. 8B. 18B. Then 100B.
The result was Orion.
According to the figures presented, the run achieved an average Model FLOPs Utilisation (MFU) of 30.8%, roughly 65% of the speed of equivalent co-located infrastructure, at a third of the cost. More importantly, the learning dynamics remained stable throughout the run, even as the system handled synchronisation and communication across dozens of distributed devices.
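The relationship between these figures can be sketched as follows. MFU is the ratio of the FLOP/s a run actually achieves to the hardware's theoretical peak; the speed and cost ratios then combine into throughput per dollar. The function names are illustrative, and the per-device numbers in the MFU example are hypothetical values chosen only to reproduce 30.8%; they are not measurements from the run.

```python
def mfu(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs Utilisation: achieved model FLOP/s divided by theoretical peak."""
    if peak_flops_per_s <= 0:
        raise ValueError("peak_flops_per_s must be positive")
    return achieved_flops_per_s / peak_flops_per_s


def throughput_per_dollar(relative_speed: float, relative_cost: float) -> float:
    """Training speed per unit cost, relative to a co-located baseline (= 1.0)."""
    if relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return relative_speed / relative_cost


# Figures presented: ~65% of co-located speed at a third of the cost.
gain = throughput_per_dollar(relative_speed=0.65, relative_cost=1 / 3)
print(f"throughput per dollar vs. co-located: {gain:.2f}x")

# Hypothetical device: 1000 TFLOP/s peak, sustaining 308 TFLOP/s of model compute.
example_mfu = mfu(achieved_flops_per_s=308e12, peak_flops_per_s=1000e12)
print(f"example MFU: {example_mfu:.1%}")
```

On the presented figures, that is a little under twice the training throughput per dollar of the co-located baseline, before any further MFU improvements.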
The technical depth in the room was high. Questions came in on heterogeneous compute stacking, minimum GPU yield thresholds, the reconstructability of sharded weights, and architectural constraints on the ResBM approach. Halfway through, @const_reborn materialised in the audience and started grilling them on the economics and fault tolerance at scale. They took it in elegant stride.
The Macrocosmos thesis is that the future of AI training does not necessarily belong to ever-larger datacentres. If distributed systems can become sufficiently efficient, they could unlock vast pools of underutilised compute spread across the world.
Today's Orion run was intentionally conservative. The GPUs were distributed, but still professionally provisioned. The next stages of Project Orion will progressively introduce heterogeneous hardware, interruptible spot instances, permissionless participation and eventually consumer-grade devices.
Whether distributed training ultimately reshapes the economics of AI remains to be seen. But either way, Macrocosmos has moved the conversation beyond whether distributed training is possible and towards how far it can scale.
The presentation also provided useful context for @bitstarterai's newly announced ML Track, which was unveiled by @macrozack at the same roundtable event and is linked by a common theme - how Bittensor attracts and supports the next generation of machine learning teams.
Throughout the event, https://t.co/JAtKN9nBPn outlined its approach to reducing the barriers to entry for researchers looking to build on the network, combining subnet funding, infrastructure support, compute resources, partnerships and incubation.
If Macrocosmos' journey demonstrates what is possible once a team is established on Bittensor, https://t.co/JAtKN9nBPn's ambition is to help more teams make that journey in the first place.
For now, however, the spotlight belonged to Orion.
After more than 750 experiments, a year of iteration, and a successful 100B-scale demonstration, Macrocosmos has moved distributed training a little further out of the realm of theory and a little closer to reality.
And judging by the reaction in the room, plenty of people were paying attention.
We are still so early.
Decentralised training has the ability to materially disrupt how inference and training workloads are balanced, and impact both compute utilisation and pricing at a macro scale.
We’ll be talking more about our vision for this in the coming weeks.
Thanks to @jbrukh for flying the training flag.
As of today, there are essentially 5 companies that have successfully completed meaningful, SOTA-moving decentralized *pretraining* runs:
- @PrimeIntellect (10B INTELLECT-1, Oct 24)
- @Pluralis (7.5B Node0, Oct 25; 8B Agora launch, May 26)
- @NousResearch (40B Consilience, May 25)
- @covenant_ai (Covenant-72B, Mar 26)
- @MacrocosmosAI (Orion-100B, June 26)
In October ‘24, I thought we would enter a decentralized AI training race. I think that’s gearing up now.
The most important thing to understand about Orion is that it closes the circle on the thesis for @IOTA_SN9.
Not just frontier-scale models, but models trained on compute acquired at a fraction of the cost, orchestrated together with MFU approaching frontier performance.
Today, we are launching the first stage of Project Orion.
Our early pre-training run of Orion-100B achieves upward of 65% of data-center training efficiency on hardware costing a fraction of the price.
Orion-100B is the first proof point for a simple idea: that underutilized compute around the world can be turned into frontier training capacity.
We believe that this work presents, for the first time, an economically compelling case for training large models using distributed approaches.