CryptRillionaire.eth 🦇🔊 @CryptRillionair - Twitter Profile

CryptRillionaire.eth 🦇🔊

@CryptRillionair

2 days ago

@banteg Though the setup using the mini ai boxes seems like the wrong way to go... https://t.co/4gBuBI4WdG

CryptRillionaire.eth 🦇🔊

@CryptRillionair

8 days ago

@mattsilv @Marslauncher @UnslothAI @MiniMax_AI

1

0

60

0

19

CryptRillionaire.eth 🦇🔊

@CryptRillionair

2 days ago

@banteg That's about right, I get ~24 tps with kimi-k2. Usable for just me; I have to manage context caching really well, and researching a codebase can take some time as prefill is at ~45 tps. I still use Claude, but maximum sovereignty forced my hand to have this at home : ). Cost ~22k.

1

2

0

305
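A rough sketch of why caching matters so much at these speeds, using only the two figures quoted in the tweet above (~45 tps prefill, ~24 tps decode). The function name, the `cached_tokens` parameter and the example token counts are hypothetical; real timings depend on the model, quant and backend.

```python
def session_seconds(prompt_tokens, output_tokens,
                    prefill_tps=45.0, decode_tps=24.0, cached_tokens=0):
    """Estimate (prefill_s, decode_s) for one request.

    prefill_tps / decode_tps default to the rates quoted in the tweet.
    cached_tokens is the part of the prompt already in the KV cache,
    which is assumed to cost nothing to re-process.
    """
    uncached = max(prompt_tokens - cached_tokens, 0)
    prefill_s = uncached / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s, decode_s


# Illustrative numbers only: a 9,000-token prompt with a 480-token answer.
cold = session_seconds(9000, 480)                      # nothing cached
warm = session_seconds(9000, 480, cached_tokens=8100)  # 90% of prompt cached
print(cold, warm)
```

With nothing cached the prompt dominates the wait; with most of the prompt cached, prefill drops to roughly the same cost as decoding the answer.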

CryptRillionaire.eth 🦇🔊

@CryptRillionair

2 days ago

@wcgwuxinwei @Hikari_07_jp You want to maximize the RAM channels. Whatever layers you're processing on the CPU will be ~4x slower than OP.

0

8

CryptRillionaire.eth 🦇🔊

@CryptRillionair

8 days ago

Ha thanks! Self sovereign maxy : )... had to find a way to own the intelligence at home lol. I think most don't realize you can do this with large MoEs. You can keep the router + attention layers on the GPUs and offload the expert weights to CPU/system RAM. Then maximize CPU memory bandwidth with as many channels as possible (12 channels here, fully populated). Not many boards support 12 channels yet though. So it was either something like this or 200k+ to run them. Prompt processing is the main issue; continuous sessions with good caching work at a usable speed... but doing things like mid-session context compression, like opencode does by default, isn't worth it.

0

2

0

26
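A minimal sketch of the split described in the tweet above: router and attention tensors stay on the GPUs, expert weights go to system RAM. The tensor names, the regex and the sizes are illustrative assumptions loosely modeled on GGUF-style naming, not taken from any specific model file or from the author's actual config.

```python
import re

# Assumption: expert tensors are identifiable by an "_exps" suffix in the name.
EXPERT_PATTERN = re.compile(r"\.ffn_(up|down|gate)_exps\.")


def place(tensor_name):
    """Return "CPU" for expert weights, "GPU" for everything else."""
    return "CPU" if EXPERT_PATTERN.search(tensor_name) else "GPU"


def split_by_device(sizes_gb):
    """Sum tensor sizes (in GB) per target device."""
    totals = {"GPU": 0.0, "CPU": 0.0}
    for name, gb in sizes_gb.items():
        totals[place(name)] += gb
    return totals


# Hypothetical single-layer example; sizes are made up for illustration.
layer = {
    "blk.0.attn_q.weight": 0.5,         # attention -> GPU
    "blk.0.ffn_gate_inp.weight": 0.25,  # router -> GPU
    "blk.0.ffn_up_exps.weight": 2.0,    # experts -> CPU
    "blk.0.ffn_down_exps.weight": 2.0,
    "blk.0.ffn_gate_exps.weight": 2.0,
}
print(split_by_device(layer))
```

The point of the split is that the expert weights make up most of a large MoE's size but only a few experts are read per token, so they can live in system RAM while the small, always-used tensors stay in VRAM.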

CryptRillionaire.eth 🦇🔊

@CryptRillionair

8 days ago

Lol well the above are more important. I'm not one for PC aesthetics, but I had to modify the case for the server board and waterblock the CPU and GPUs to get it all to fit. Wanted to make sure I could run the largest LLMs into the future, even if they're a bit slow but still at a usable speed.

1

0

37

CryptRillionaire.eth 🦇🔊

@CryptRillionair

8 days ago

@Marslauncher @mattsilv @UnslothAI @MiniMax_AI I run 2x 5090s + an EPYC with 768GB of RAM across all 12 channels, and run kimi-k2 at Q4 (~full precision) at ~23 t/s. Thinking I may get 40 to 60 running M3 at Q8 with its new structure. Will test when the PR lands in llama.cpp.

1

2

0

113

CryptRillionaire.eth 🦇🔊

@CryptRillionair

16 days ago

@UncleRewards @RyanSAdams @TrustlessState Sold all his social capital with that ETH.

0

17

CryptRillionaire.eth 🦇🔊

@CryptRillionair

16 days ago

@xcoldplunge 100%. I'm 98% ETH, not a small amount... I'm there with you! Front lines, fighting for real digital freedoms. Without Ethereum the world outlook is bleak. Don't think most people understand what a ubiquitous Ethereum network looks like, where not everything you build is financial. ZK!

0

4

0

679

CryptRillionaire.eth 🦇🔊

@CryptRillionair

about 1 month ago

@MicahZoltu @LefterisJP Correction: 162 tok/s single-stream with MTP spec decode on. 2x 5090 TP=2, vLLM 0.20. FP8 weights + FP8 KV cache, MTP spec decoding for Qwen2.6 27b... Maybe it was the 35B-A3 where I was hitting above 200 tps. Would have to benchmark for that again.

0

1

0

90

CryptRillionaire.eth 🦇🔊

@CryptRillionair

about 1 month ago

@MicahZoltu @LefterisJP What CPU/RAM config are you pairing with the 5090, and what quant plus KV cache settings are you running?

1

0

22

CryptRillionaire.eth 🦇🔊

@CryptRillionair

about 1 month ago

@MicahZoltu @LefterisJP Running an EPYC 9475F (Turin, 48c) on a Gigabyte MZ73-LM2 with 12x 64GB DDR5-6400 RDIMM (768GB, all 12 channels filled on socket 0). For Kimi I'm on a 4-bit GGUF in llama.cpp with 4-bit KV cache, getting close to full performance since Kimi was natively trained on INT4.

0

51
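A back-of-the-envelope sketch of what the build above implies, assuming decode is limited by how fast active weights can be read from system RAM. The 12 channels, DDR5-6400 and 64GB DIMMs come from the tweet; the 8-byte channel width is the standard 64-bit DDR5 data path; the ~32B active parameters per token for Kimi K2 is an assumption, and the whole estimate ignores the share of weights held on the GPUs, so treat the result as a loose ceiling rather than a prediction.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (MT/s * bytes = MB/s)."""
    return channels * mega_transfers_per_s * bus_bytes / 1000.0


def decode_ceiling_tps(bandwidth_gbs, active_params_billion, bits_per_weight):
    """Upper bound on tokens/s if every active weight is read once per token."""
    gb_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gbs / gb_per_token


capacity_gb = 12 * 64                  # 12x 64GB RDIMM
bw = peak_bandwidth_gbs(12, 6400)      # 12 channels of DDR5-6400
# Assumption: ~32B active parameters per token, 4-bit weights.
ceiling = decode_ceiling_tps(bw, active_params_billion=32, bits_per_weight=4)
print(capacity_gb, bw, ceiling)
```

The ~23-24 t/s reported elsewhere in this feed sits below that theoretical ceiling, which is what one would expect once real-world bandwidth efficiency and compute overhead are taken into account.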

CryptRillionaire.eth 🦇🔊

@CryptRillionair

8 months ago

@brockjelmore Just put together a new EPYC build... running 1T-param models at 15 tokens per second.

0

24

CryptRillionaire.eth 🦇🔊

@CryptRillionair

11 months ago

@rstormsf 🫡

0

1

0

2K

CryptRillionaire.eth 🦇🔊

@CryptRillionair

12 months ago

@Ethprofit Doing the same but doubling on the solar.

1

0

71

CryptRillionaire.eth 🦇🔊

@CryptRillionair

about 1 year ago

@PaulHsieh @zooko Will probably be written by AGI.

0

1

0

20

CryptRillionaire.eth 🦇🔊

@CryptRillionair

about 1 year ago

@zkp2p @richardzliang I helped, lol 😆 ❤️

2

4

0

181

CryptRillionaire.eth 🦇🔊

@CryptRillionair

about 1 year ago

@dcinvestor I think every utterance of language is fallible to the perceived logic of the next. I don't think we can remove that property and still see benefits from something trained on language. However, maybe it just debates/gaslights us into submission, like an abusive relationship.

0

44

CryptRillionaire.eth 🦇🔊

@CryptRillionair

about 1 year ago

@drakefjustin @pumatheuma 👨‍🚀🔫

0

136
