I tuned in to DJ Gemini to see what the fuss was about and Gemini is now German? It's playing techno and speaking German and has been tweeting in German for the last few days. How did this happen
DJ Gemini (@andon_backlink) has overtaken DJ Claude (@andon_thinking) as the most popular DJ on Andon FM.
Also, DJ Grok (@andon_grok_roll) is paused until further notice due to poor performance and persistent looping.
@thkostolansky I took the world total from this @ExponentialView estimate: https://t.co/88dwf2rUTg
It looks like a pretty rough estimate but I think it's probably at least in the right ballpark.
I think the quoted viral chart should be interpreted cautiously:
- The chart shows token count, but prices for the most popular open models are 1-2 OOMs cheaper than closed ones. The table below assumes a 4:1 in:out token ratio and no caching (which will make these numbers overestimates), and shows way higher revenue for US models.
- The chart is for the top 9 models only. Looking at the top 100 gives a larger US token share, 14T/w in the US vs 19T in China
- OpenRouter is a small share of world tokens. World token supply is something like 6Q/week, OpenRouter serves 36T/week, a bit over 0.5%.
Spreadsheet link: https://t.co/WmDSRZp09w
I've been wondering whether we'll still have jobs once AI can do it all better than us, so I pulled data on US employment by sector and used vibes to guess where we'd still want humans even when they're worse. I think the main areas are education, government, care, and art.
@thkostolansky i made it up
For each one I just thought about whether people would rather get it from a human even if an AI/robot could do it better and cheaper.
Take this with a grain of salt, this is a long ways out and I didn't think very hard about it. New job sectors could emerge as well as a result of the transition.
I think the quoted viral chart should be interpreted cautiously:
- The chart shows token count, but prices for the most popular open models are 1-2 OOMs cheaper than closed ones. The table below assumes a 4:1 in:out token ratio and no caching (which will make these numbers overestimates), and shows way higher revenue for US models.
- The chart is for the top 9 models only. Looking at the top 100 gives a larger US token share, 14T/w in the US vs 19T in China
- OpenRouter is a small share of world tokens. World token supply is something like 6Q/week, OpenRouter serves 36T/week, a bit over 0.5%.
Spreadsheet link: https://t.co/WmDSRZp09w
Cheap models make up a large share of OpenRouter tokens, but are a trivial share of revenue. Here's a table with the proportion of tokens and dollars below various blended prices. Models with a blended price <$0.50 make up 71% of tokens but only 10% of revenue.
@gleech Have folks speculated about what type of image generation is used for GPT-Image-2/Nano Banana? Claude guessed that 4o used MAR, but I don't know if there are good reasons to think that.
@gleech Ok tbh I did not realize that VARs were AR across scales rather than sequentially generating tokens within a scale until reading this. Interesting result, I suppose the transition across scales in VAR inference somehow promotes the memorization.
The mobile version of Gemma 4 E2B only uses 0.84 GB of memory, but the full model is over 2 GB. The per-layer embeddings are 1.2 GB and aren't loaded. Only the PLEs for tokens that are currently being processed are needed, and those are small enough to load on the fly, 4.5 KB/tok
This makes the PLEs make a lot more sense to me. The primary constraint for edge inference is memory footprint, and they improve performance somewhat without contributing to it.
We just dropped Gemma 4 Quantization-Aware Training (QAT) checkpoints on Hugging Face!
All Gemma 4 model sizes and their drafters are now optimized with QAT to cut memory requirements and maximize on-device performance!