@IranObserver0 this only works if the US drops sanctions.. otherwise no one will dare pay. So I guess the US is allowing it to happen? Too bad no one can see the MOU to know for sure.
@borntogambles ok and what is he even selling? 2000 SKUs.. of what?! That's the hardest part: finding a product to sell with high margin, not telling Claude Code to add some products from a spreadsheet/PDF to your Shopify site.
The 128GB number is the part everyone's repeating. The number that actually decides whether you'd use this box is 256.
That's the memory bandwidth, in GB/s. A 5090 moves about 1,800. An H100 moves 3,350. Local token speed is bound by how fast weights get read out of memory, and this APU reads them at roughly a seventh the rate of a 5090.
So the headline does something quiet. Qwen3 235B runs here at about 11 tokens a second, which sounds impossible on 256 GB/s until you notice the model is mixture-of-experts: 235B total, ~22B active per token. The chip only moves the 22B it needs. The "235B" on the slide is a storage stat. The 22B is the speed stat.
Run something dense and the trick drops. Llama 3.3 70B, where every parameter fires on every token, does about 5 tokens a second on the same box. Readable. Not something you sit in front of for eight hours.
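A rough sketch of that bandwidth math, assuming weights are stored at about 4 bits (0.5 bytes) per parameter and that each generated token reads every active weight once. The quantization level and the function name are assumptions for illustration, not figures from the pitch, and the result is a ceiling rather than a prediction:

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billion: float,
                          bytes_per_param: float) -> float:
    """Upper bound on decode speed when generation is memory-bandwidth bound.

    Each token has to stream all active weights out of memory once, so the
    ceiling is bandwidth divided by gigabytes read per token.
    """
    gb_read_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


APU_BANDWIDTH_GB_S = 256   # from the post
BYTES_PER_PARAM = 0.5      # assumption: roughly 4-bit quantized weights

# Qwen3 235B is mixture-of-experts: only ~22B parameters are read per token.
moe_ceiling = max_tokens_per_second(APU_BANDWIDTH_GB_S, 22, BYTES_PER_PARAM)

# Llama 3.3 70B is dense: all 70B parameters are read per token.
dense_ceiling = max_tokens_per_second(APU_BANDWIDTH_GB_S, 70, BYTES_PER_PARAM)

print(f"MoE ceiling:   {moe_ceiling:.1f} tok/s")
print(f"Dense ceiling: {dense_ceiling:.1f} tok/s")
```

Under those assumptions the ceilings land around 23 and 7 tokens a second, so the quoted 11 and 5 sit plausibly below them. Real throughput also pays for KV-cache reads, compute, and overhead that this sketch ignores.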
That 3x win over a 5080 lives in the same place. A 5080 has 16GB of VRAM and can't hold a 235B model at all, so it spills to system memory and crawls. The APU wins that matchup on capacity. Change the test to a model that fits in 16GB and the 5080 walks away on speed.
Now look at the workload in the pitch: point Claude Code at localhost. Agentic coding is the worst possible fit for a bandwidth-starved box. One task is dozens of sequential model round trips, each waiting on the last, each streaming at 11 tokens a second. The exact use case used to sell the $5,280 in savings is the one that exposes the bottleneck.
The same Qwen3 235B runs at 1,500 tokens a second on a Cerebras wafer. That's the real comparison: 1,500 versus 11, and how much of your day goes to watching the slow one think.
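To put that gap in wall-clock terms, here is a minimal sketch of one agentic task. The round-trip count and tokens per round trip are illustrative assumptions, not measurements, and the estimate covers generation time only (no prompt processing, no tool execution):

```python
def task_generation_seconds(round_trips: int,
                            output_tokens_per_trip: int,
                            tokens_per_second: float) -> float:
    """Time spent purely generating tokens across sequential round trips.

    Round trips are sequential, so their generation times simply add up.
    """
    return round_trips * output_tokens_per_trip / tokens_per_second


ROUND_TRIPS = 30              # assumption: "dozens" of sequential calls
OUTPUT_TOKENS_PER_TRIP = 400  # assumption: typical output per call

local_s = task_generation_seconds(ROUND_TRIPS, OUTPUT_TOKENS_PER_TRIP, 11)
cerebras_s = task_generation_seconds(ROUND_TRIPS, OUTPUT_TOKENS_PER_TRIP, 1500)

print(f"Local box at 11 tok/s:   {local_s / 60:.1f} minutes")
print(f"Cerebras at 1,500 tok/s: {cerebras_s:.1f} seconds")
```

With those made-up task sizes, the local box spends about 18 minutes generating where the wafer spends 8 seconds. Change the assumptions and the absolute numbers move, but the ratio stays at 1,500 to 11.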
The box is a real deal for what it is. A quiet, private, $1,800 machine that runs big open models at conversational speed for one person. The frontier stack it's sold as replacing answers at 50 to 100 tokens a second with quality no open 235B matches yet. It pays for itself in 9 months only if your time is worth nothing per token.
@0xcoked apart from the embarrassment (which can be blamed on Trump) it's not so bad. US markets love it.. oil crushed. War resolved before midterms. Plus, how much of that $300B was already Iran's? Now the Persian Gulf is permanently throttled and it's not the US doing it, it's Iran.. hurts China + EU.
@1holtei @ItakGol “Free” anything is always peasant-level and worse than the paid version. It’s better to drive down the cost and innovate so that everyone gets something that keeps improving.
@CitoyenBrexile @ItakGol Europe has some of the highest costs of living in the world, coupled with high crime in cities and high taxes, and is obsessed with going to war with Russia.