Cypher funk

@cypherphunk

🐝🫖🌊

Joined April 2022

1.2K Following

133 Followers

2.6K Posts

cypherphunk retweeted

Frank Rizzo

@FrankRizz07

2 days ago · Sandia Heights

I checked. RT21/GeneralTensor Validator no longer runs ANY subnet code. I think this is a big hit for the ecosystem. 😞 Now there's only a few legitimate validators left.. Please consider staking to Rizzo as we have always put our best foot forward with legitimate Network-wide validation. We could use your support. 🤝 #Bittensor $TAO

12K

cypherphunk retweeted

Bold

@boldleonidas

3 days ago

768

16K

Cypher funk @cypherphunk

4 days ago

@alainde31997119 @clementlox @Meteovilles 🤣

Cypher funk @cypherphunk

6 days ago

@TrashTalk_fr Ça vend pas du rêve

198

Who to follow

KuNaL

@ItsKunallll

what doesn't kill you, fucks you over

0xKayser🔺🦅

@0x_kayser

Blockchain Dev | UI/UX | Building secure protocols | Sharing security reviews & onchain experiments #Solidity #Javascript #Rust #userinterface

Tomss

@tomastypeshii

Cypher funk @cypherphunk

6 days ago

@TrashTalk_fr Il a surtout été trade contre 3 cachous et un slip troué. 😆

124

Cypher funk @cypherphunk

6 days ago

The new feature on Twitter makes shilling kind of hilarious. « Shill me the next 100x » Everyone answers with a ticker that is down double digits 😆

Cypher funk @cypherphunk

9 days ago

@markjeffrey

Cypher funk @cypherphunk

15 days ago

@ben13gm @TrashTalk_fr 🤣

161

cypherphunk retweeted

const

@const_reborn

16 days ago

@AnthropicAI And just like that we collectively saw the future of inequality

266

299K

Cypher funk @cypherphunk

18 days ago

@TrashTalk_fr

211

Cypher funk @cypherphunk

18 days ago

@TrashTalk_fr L’arbitrage de ce match est tellement degueulasse

cypherphunk retweeted

Jon Durbin

@jon_durbin

20 days ago

Here's a draft of the tech report on the model training method I've been experimenting with, "Parallax". https://t.co/0ebFkATteC TL;DR: MoE models' params are mostly routed experts, and you can massively reduce VRAM and FLOPS per participant by splitting up those experts. You can also offload the expert training to commodity hardware further saving compute/VRAM per island. The crazy cool thing about these sketches is, you can actually onboard workers nearly instantly (sync time with ternary weights is a few MB), and they never need to download or stream the raw datasets (sketches contain all the work they need to do and are tiny). You'd probably want the first couple layers of the model in your own infra if you had sensitive data because otherwise you could do gradient inversion attacks to reconstitute the raw text, but beyond the first couple layers and not knowing which layer/expert you're training I think it's infeasible so privacy is pretty baked in. Decoupled DiLoCo/RDA-diloco style backbone sync, surrogates for non-owned routed experts with low rank updates to sync those, tiered sync cadences for various components, "sketches" to offload expert work, etc. 20b tested two different ways, plenty of small model iterations, and 176b params just to prove out feasibility. There are hundreds of additional experiments and loads of data we could also highlight as well, but the guts are there. Variations: - freeze routed expert weights instead of using surrogates, eliminates adam state, backward pass, etc., though you'd need different sync methods vs. low rank updates to the surrogates (the surrogates are already tiny, rank-8 updates to those even smaller) - hierarchical parallax, i.e. each node itself becomes an rda diloco style multi-learner which then syncs with the outer/global islands (the point here is to enable GPUs without NVLink/etc. and reduce GPU<=>GPU comms to maximize MFU on less-capable/commodity GPUs) - pipeline parallelize each island itself such that each island can decompose the backbone etc.

jon_durbin's tweet photo. Here's a draft of the tech report on the model training method I've been experimenting with, "Parallax".
https://t.co/0ebFkATteC

TL;DR: MoE models' params are mostly routed experts, and you can massively reduce VRAM and FLOPS per participant by splitting up those experts. You can also offload the expert training to commodity hardware further saving compute/VRAM per island.

The crazy cool thing about these sketches is, you can actually onboard workers nearly instantly (sync time with ternary weights is a few MB), and they never need to download or stream the raw datasets (sketches contain all the work they need to do and are tiny). You'd probably want the first couple layers of the model in your own infra if you had sensitive data because otherwise you could do gradient inversion attacks to reconstitute the raw text, but beyond the first couple layers and not knowing which layer/expert you're training I think it's infeasible so privacy is pretty baked in.

Decoupled DiLoCo/RDA-diloco style backbone sync, surrogates for non-owned routed experts with low rank updates to sync those, tiered sync cadences for various components, "sketches" to offload expert work, etc.

20b tested two different ways, plenty of small model iterations, and 176b params just to prove out feasibility.

There are hundreds of additional experiments and loads of data we could also highlight as well, but the guts are there.

Variations:
- freeze routed expert weights instead of using surrogates, eliminates adam state, backward pass, etc., though you'd need different sync methods vs. low rank updates to the surrogates (the surrogates are already tiny, rank-8 updates to those even smaller)
- hierarchical parallax, i.e. each node itself becomes an rda diloco style multi-learner which then syncs with the outer/global islands (the point here is to enable GPUs without NVLink/etc. and reduce GPU<=>GPU comms to maximize MFU on less-capable/commodity GPUs)
- pipeline parallelize each island itself such that each island can decompose the backbone etc.

218

43K

Cypher funk @cypherphunk

22 days ago

@AlgodTrading You’re kind of an expert on failed subnets yourself but sadly it doesn’t come with great foresight on what should be done on the protocol level. Maybe it’s time to just shut up if all you can do is whining like a little boy

128

cypherphunk retweeted

Aporia

@0xaporia

over 3 years ago

When the daily coin issuance denominated in USD falls below 50% of its 1-year moving average, compulsory sell-side liquidity is considered extremely low, leading to bottoms forming

0xaporia's tweet photo. When the daily coin issuance denominated in USD falls below 50% of its 1-year moving average, compulsory sell-side liquidity is considered extremely low, leading to bottoms forming https://t.co/zd8255yZsS

866

715

375K

cypherphunk retweeted

Bold

@boldleonidas

24 days ago

129

209

88K

cypherphunk retweeted

Chutes

@chutes_ai

26 days ago

Chutes Search is now powered by @desearch_ai (SN22), the decentralized search layer for agents and humans. In their first public benchmark, it beat the centralized providers on overall score and ranked best of all of them at finding relevant sources. So we swapped it in to power our search. Two subnets, one better search experience. This is the Bittensor flywheel shipping. SN22 x SN64. Want to try it? https://t.co/S2dK7rfhOU

chutes_ai's tweet photo. Chutes Search is now powered by @desearch_ai (SN22), the decentralized search layer for agents and humans.

In their first public benchmark, it beat the centralized providers on overall score and ranked best of all of them at finding relevant sources. So we swapped it in to power our search.

Two subnets, one better search experience.
This is the Bittensor flywheel shipping.

SN22 x SN64. Want to try it?
https://t.co/S2dK7rfhOU

167

11K

cypherphunk retweeted

Macrocosmos

@MacrocosmosAI

27 days ago

Today, we are launching the first stage of Project Orion. Our early pre-training run of Orion-100B achieves upward of 65% of data-center training efficiency on hardware costing a fraction of the price. Orion-100B is the first proof point for a simple idea: that underutilized compute around the world can be turned into frontier training capacity. We believe that this work presents, for the first time, an economically compelling case for training large models using distributed approaches.

$MacrocosmosAI's tweet photo. Today, we are launching the first stage of Project Orion. Our early pre-training run of Orion-100B achieves upward of 65% of data-center training efficiency on hardware costing a fraction of the price. Orion-100B is the first proof point for a simple idea: that underutilized compute around the world can be turned into frontier training capacity. We believe that this work presents, for the first time, an economically compelling case for training large models using distributed approaches.$

420

638K

Cypher funk @cypherphunk

26 days ago

@CryptoPicsou I root for you hyperliquid holders. Congrats with this cycle’s trade! Just one thing : can you stop signing off with the hyperliquid gimmick tho? That’s pretty cringe 😆

161

Cypher funk @cypherphunk

27 days ago

@CalebFranzen My average buy in last cycle was below 20k. My average sell price was above 100k. I would have missed quite a bit if I had followed this indicator.

Cypher funk @cypherphunk

27 days ago

@CalebFranzen I actually do. I didn't spell my argument well but my point remains. Your indicator is lagging too much compared to the pace of this market. It's not really hard to out perform both your entry and exit points

Last Seen Users on Sotwe

Trends for you

Most Popular Users

Olivia

Online

✨

⭐

💫