I checked. RT21/GeneralTensor Validator no longer runs ANY subnet code. I think this is a big hit for the ecosystem. 😞
Now there's only a few legitimate validators left..
Please consider staking to Rizzo as we have always put our best foot forward with legitimate Network-wide validation. We could use your support. 🤝
#Bittensor $TAO
Here's a draft of the tech report on the model training method I've been experimenting with, "Parallax".
https://t.co/0ebFkATteC
TL;DR: MoE models' params are mostly routed experts, and you can massively reduce VRAM and FLOPS per participant by splitting up those experts. You can also offload the expert training to commodity hardware further saving compute/VRAM per island.
The crazy cool thing about these sketches is, you can actually onboard workers nearly instantly (sync time with ternary weights is a few MB), and they never need to download or stream the raw datasets (sketches contain all the work they need to do and are tiny). You'd probably want the first couple layers of the model in your own infra if you had sensitive data because otherwise you could do gradient inversion attacks to reconstitute the raw text, but beyond the first couple layers and not knowing which layer/expert you're training I think it's infeasible so privacy is pretty baked in.
Decoupled DiLoCo/RDA-diloco style backbone sync, surrogates for non-owned routed experts with low rank updates to sync those, tiered sync cadences for various components, "sketches" to offload expert work, etc.
20b tested two different ways, plenty of small model iterations, and 176b params just to prove out feasibility.
There are hundreds of additional experiments and loads of data we could also highlight as well, but the guts are there.
Variations:
- freeze routed expert weights instead of using surrogates, eliminates adam state, backward pass, etc., though you'd need different sync methods vs. low rank updates to the surrogates (the surrogates are already tiny, rank-8 updates to those even smaller)
- hierarchical parallax, i.e. each node itself becomes an rda diloco style multi-learner which then syncs with the outer/global islands (the point here is to enable GPUs without NVLink/etc. and reduce GPU<=>GPU comms to maximize MFU on less-capable/commodity GPUs)
- pipeline parallelize each island itself such that each island can decompose the backbone etc.
@AlgodTrading You’re kind of an expert on failed subnets yourself but sadly it doesn’t come with great foresight on what should be done on the protocol level. Maybe it’s time to just shut up if all you can do is whining like a little boy
When the daily coin issuance denominated in USD falls below 50% of its 1-year moving average, compulsory sell-side liquidity is considered extremely low, leading to bottoms forming
Chutes Search is now powered by @desearch_ai (SN22), the decentralized search layer for agents and humans.
In their first public benchmark, it beat the centralized providers on overall score and ranked best of all of them at finding relevant sources. So we swapped it in to power our search.
Two subnets, one better search experience.
This is the Bittensor flywheel shipping.
SN22 x SN64. Want to try it?
https://t.co/S2dK7rfhOU
Today, we are launching the first stage of Project Orion.
Our early pre-training run of Orion-100B achieves upward of 65% of data-center training efficiency on hardware costing a fraction of the price.
Orion-100B is the first proof point for a simple idea: that underutilized compute around the world can be turned into frontier training capacity.
We believe that this work presents, for the first time, an economically compelling case for training large models using distributed approaches.
@CryptoPicsou I root for you hyperliquid holders. Congrats with this cycle’s trade! Just one thing : can you stop signing off with the hyperliquid gimmick tho? That’s pretty cringe 😆
@CalebFranzen My average buy in last cycle was below 20k. My average sell price was above 100k. I would have missed quite a bit if I had followed this indicator.
@CalebFranzen I actually do. I didn't spell my argument well but my point remains. Your indicator is lagging too much compared to the pace of this market. It's not really hard to out perform both your entry and exit points