Alex Chekholko @RHAlexander - Twitter Profile

Alex Chekholko

@RHAlexander

about 2 hours ago

@iuditg @immortaldip GPU lead times are 6mo+ for hardware

0

1

Alex Chekholko

@RHAlexander

about 3 hours ago

@demian_ai ok but the rectangular panels still have to come from a cylindrical ingot, right? So it's just a question of at what step you cut the circle into rectangles?

0

1

0

20

Alex Chekholko

@RHAlexander

about 3 hours ago

@jasonschips that's what my 1U servers look like from years ago, 24 or more DIMM slots

0

52

Alex Chekholko

@RHAlexander

about 3 hours ago

@petergyang behold, my .bash_profile: if [ -f ~/.bashrc ]; then . ~/.bashrc; fi

0

1

0

5

RHAlexander retweeted

0xDipper

@Dipper_pol

1 day ago

Nassim Taleb: pick two people at random.
If their combined height is 4.1m, it's basically 2.05 + 2.05.
If their combined wealth is $36M, it's almost never 18 + 18 - it's ~$1,000 and ~$36M.
Height lives in "Mediocristan," where the average tells you everything.
Wealth - and markets - live in "Extremistan," where one event dominates the whole picture.
Ruin there never comes from a string of bad days. It comes from a single one.
~1hr lecture, free. The Black Swan author at Cambridge on why the statistics you were taught break exactly where it matters.
Being right on average means nothing if one tail empties the account.

32

3K

287

5K

570K

RHAlexander retweeted

Aoden Teo

@AodenTeoMT

1 day ago

To download Miso One, check out the repo: https://t.co/meFujN9WtA

13

523

26

994

49K

Alex Chekholko

@RHAlexander

about 18 hours ago

@WheelieInvestor @ai_hyperbull I don’t think SoFi makes chips

0

29

Alex Chekholko

@RHAlexander

about 18 hours ago

@rohindhar Will go for 4?

0

74

Alex Chekholko

@RHAlexander

about 18 hours ago

@kneubuehl Hope you got 128GB RAM

0

1

0

2

Alex Chekholko

@RHAlexander

about 18 hours ago

@signulll Got it on the first guess, Hoboken

0

291

Alex Chekholko

@RHAlexander

about 18 hours ago

@mikesimonsen I think maybe you are right; the signage should be like “coffee in 117 seconds” because the impression is you are just going to stand around waiting

0

24

Alex Chekholko

@RHAlexander

about 19 hours ago

@3lectricBrawl On road trips it makes for a good garbage bin; it is not climate-controlled so you have to be mindful of what you put in there

0

7

RHAlexander retweeted

witcheer

@witcheer

1 day ago

Gemma 4 dropped a 12B.
I put it on RTX 5090 against its 31B sibling.

when you cut a model from 31B to 12B, what do you actually lose?

~ reasoning barely moves
GSM8K (math) 97.5 > 96.4 (−1.1)
ARC-C (sci reasoning) 97.6 > 94.0 (−3.6)

~ knowledge falls off a cliff
MMLU (world knowledge) 87.8 > 78.9 (−8.9)
HellaSwag (commonsense) 92.0 > 81.6 (−10.4)

~~~
parameters store facts, not thinking. the 19B you delete is mostly where the model kept its trivia and world-priors, cut it and recall collapses, while the reasoning machinery stays nearly whole.

a 12B reasons almost like its big brother. It just knows less.

122 tok/s vs 53 (2.3x faster generation), ~10GB instead of ~24, meaning that you get 20GB+ free on a 32GB card for long context or a second model.

so it depends on your workload:

reasoning / math / agentic loops = the 12B is nearly free

broad-knowledge Q&A with no retrieval = that's the one job worth paying for the 31B.

36

646

71

315

56K

RHAlexander retweeted

Max Leiter

@maxleiter

1 day ago

For those unfamiliar: https://t.co/zJAknQOYFP

8

166

3

44

14K

RHAlexander retweeted

Abhishek B R

@abhitwt

1 day ago

Every company’s AI workflow rn be like 😭💀

308

26K

3K

5K

2M

RHAlexander retweeted

Master | 最強打野(穢土轉生)

@CryptoMaster_70

1 day ago

2026 vs 2004 from Uncle Jensen to Prime Jensen. Jensen Huang made the cover of Forbes, looking incredibly handsome

68

4K

308

603

439K

Alex Chekholko

@RHAlexander

1 day ago

@CuriousTimL @edgecase411 It is likely to follow the car in front. If you're by yourself it will probably do the right thing.

0

6

Alex Chekholko

@RHAlexander

1 day ago

"Gemma 4 12B delivers benchmark performance nearing our larger 26B model", so it's worse than 26B-A4B and strictly worse than 31B; I think it's great to improve efficiency, but I want smarter models, not smaller models that are not quite as smart as the existing ones.

0

60