Tim Duffy

Verified account

@timfduffy

I like utilitarianism, consciousness, AI, EA, space, kindness, liberalism, progressive rock, economics, most people. Substack:

Oakland, CA

Joined August 2008

759 Following

1.2K Followers

4.4K Posts

about 6 hours ago

I tuned in to DJ Gemini to see what the fuss was about and Gemini is now German? It's playing techno and speaking German and has been tweeting in German for the last few days. How did this happen?

about 7 hours ago

DJ Gemini (@andon_backlink) has overtaken DJ Claude (@andon_thinking) as the most popular DJ on Andon FM. Also, DJ Grok (@andon_grok_roll) is paused until further notice due to poor performance and persistent looping. https://t.co/eX1TQoBNS1

about 8 hours ago

@thkostolansky I took the world total from this @ExponentialView estimate: https://t.co/88dwf2rUTg It looks like a pretty rough estimate but I think it's probably at least in the right ballpark. https://t.co/ffMgZUYswg

about 14 hours ago

I think the quoted viral chart should be interpreted cautiously:
- The chart shows token count, but prices for the most popular open models are 1-2 OOMs cheaper than closed ones. The table below assumes a 4:1 in:out token ratio and no caching (which will make these numbers overestimates), and shows way higher revenue for US models.
- The chart is for the top 9 models only. Looking at the top 100 gives a larger US token share, 14T/w in the US vs 19T in China
- OpenRouter is a small share of world tokens. World token supply is something like 6Q/week, OpenRouter serves 36T/week, a bit over 0.5%.
Spreadsheet link: https://t.co/WmDSRZp09w
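
The arithmetic behind these points can be sketched roughly as follows. This is only an illustration: the 4:1 input:output ratio and the 36T/week and 6Q/week figures come from the post, while the per-million-token prices and the function names are made up for the example and are not taken from the spreadsheet.

```python
def blended_price(input_price, output_price, in_ratio=4.0, out_ratio=1.0):
    """Weighted average price per million tokens for an input:output token mix."""
    return (in_ratio * input_price + out_ratio * output_price) / (in_ratio + out_ratio)


def estimated_revenue(tokens, input_price, output_price):
    """Dollar revenue for a token count, ignoring cache discounts.

    Because cached tokens are billed at a discount in practice, this is an
    overestimate, as the post notes.
    """
    return tokens / 1_000_000 * blended_price(input_price, output_price)


# Hypothetical prices in dollars per million tokens (NOT from the spreadsheet).
closed_blended = blended_price(3.00, 15.00)  # closed model: $3 in, $15 out
open_blended = blended_price(0.30, 1.20)     # open model: $0.30 in, $1.20 out

# Figures quoted in the post: OpenRouter serves ~36T tokens/week out of ~6Q.
openrouter_share = 36e12 / 6e15

print(f"closed blended price: ${closed_blended:.2f}/Mtok")
print(f"open blended price:   ${open_blended:.2f}/Mtok")
print(f"OpenRouter share of world tokens: {openrouter_share:.1%}")
```

With these made-up prices the closed model earns roughly eleven times more per token, which is why token share and revenue share can diverge so sharply.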

1 day ago

This is a pretty striking shift toward Chinese models by American AI startups since the start of the year. https://t.co/80uaCWQCRE https://t.co/4vA8xPQRgE

about 10 hours ago

@buildhomez The ones that were stalled for like 5 years are finally going to finish??!!?!

about 10 hours ago

@buildhomez Hmm yeah I agree with these

about 12 hours ago

I've been wondering whether we'll still have jobs once AI can do it all better than us, so I pulled data on US employment by sector and used vibes to guess where we'd still want humans even when they're worse. I think the main areas are education, government, care, and art. https://t.co/7UZ2m7eSn4

about 11 hours ago

@thkostolansky I made it up. For each one, I just thought about whether people would rather get it from a human even if an AI/robot could do it better and cheaper.

about 11 hours ago

I made a formula error in the original post, here's the corrected version. I should have had Gemini check my work https://t.co/NWKeQRTdCN

about 12 hours ago

Take this with a grain of salt: this is a long way out and I didn't think very hard about it. New job sectors could also emerge as a result of the transition.

about 14 hours ago

@Simon__Grimm I agree we shouldn't take too much from it, I wrote some brief notes on it here: https://t.co/uzh7FboOK1

about 14 hours ago

Cheap models make up a large share of OpenRouter tokens, but are a trivial share of revenue. Here's a table with the proportion of tokens and dollars below various blended prices. Models with a blended price <$0.50 make up 71% of tokens but only 10% of revenue. https://t.co/Jyg84Oc7yo
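
A table like that could be computed along these lines. This is a minimal sketch, not the actual method behind the table: the three models and their numbers are invented purely to show the calculation, and `shares_below` is a hypothetical helper name.

```python
def shares_below(models, threshold):
    """Return (token_share, revenue_share) for models priced below threshold.

    models is a list of (blended_price_per_mtok, tokens) pairs. Revenue is
    taken as price * tokens; the per-million scaling cancels in the ratio.
    """
    total_tokens = sum(tokens for _, tokens in models)
    total_revenue = sum(price * tokens for price, tokens in models)
    cheap = [(price, tokens) for price, tokens in models if price < threshold]
    token_share = sum(tokens for _, tokens in cheap) / total_tokens
    revenue_share = sum(price * tokens for price, tokens in cheap) / total_revenue
    return token_share, revenue_share


# Invented data: (blended $/Mtok, tokens in arbitrary units). NOT OpenRouter data.
example_models = [(0.25, 70), (2.00, 20), (6.00, 10)]

token_share, revenue_share = shares_below(example_models, 0.50)
print(f"below $0.50: {token_share:.0%} of tokens, {revenue_share:.0%} of revenue")
```

Even in this toy case the cheap model carries most of the tokens but only a small slice of the dollars, the same pattern the table reports.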

1 day ago

@gleech Have folks speculated about what type of image generation is used for GPT-Image-2/Nano Banana? Claude guessed that 4o used MAR, but I don't know if there are good reasons to think that.

1 day ago

@gleech Ok tbh I did not realize that VARs were AR across scales rather than sequentially generating tokens within a scale until reading this. Interesting result, I suppose the transition across scales in VAR inference somehow promotes the memorization. https://t.co/H54OVh7OrO

2 days ago

@moultano and I thought my nose took up too much of my field of view

2 days ago

Now the E2B name makes more sense too, it's ~2 billion parameters that actually need to be loaded to memory, and 2 billion active parameters.

3 days ago

The mobile version of Gemma 4 E2B only uses 0.84 GB of memory, but the full model is over 2 GB. The per-layer embeddings are 1.2 GB and aren't loaded. Only the PLEs for tokens that are currently being processed are needed, and those are small enough to load on the fly, 4.5 KB/tok https://t.co/YX6I0rI3g4
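
As a quick consistency check on those numbers, here is a rough sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes) and that the PLE table holds one 4.5 KB row per token entry; neither assumption is stated in the post, and the variable and function names are illustrative only.

```python
GB = 1e9  # assumed decimal gigabyte
KB = 1e3  # assumed decimal kilobyte

resident_gb = 0.84      # memory used by the mobile version (from the post)
ple_gb = 1.2            # per-layer embeddings left on disk (from the post)
ple_per_token_kb = 4.5  # PLE data needed per processed token (from the post)

# Resident weights plus PLEs should roughly match the "over 2 GB" full model.
full_model_gb = resident_gb + ple_gb

# If the table is one 4.5 KB row per token entry, this is the implied row count.
implied_entries = ple_gb * GB / (ple_per_token_kb * KB)


def ple_bytes_for_batch(n_tokens):
    """Bytes of PLE data to fetch on the fly for n_tokens tokens in flight."""
    return n_tokens * ple_per_token_kb * KB


print(f"full model: ~{full_model_gb:.2f} GB")
print(f"implied PLE rows: ~{implied_entries:,.0f}")
print(f"PLE data for 128 tokens: {ple_bytes_for_batch(128) / 1e6:.2f} MB")
```

Under these assumptions, even a batch of over a hundred tokens needs well under a megabyte of PLE data, which fits the claim that the embeddings can be loaded on the fly.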

4 days ago

Google managed to make the mobile versions of Gemma 4 E2B and E4B remarkably small, I'll have to look into how they manage this https://t.co/OP9w3143cK

3 days ago

This makes the PLEs make a lot more sense to me. The primary constraint for edge inference is memory footprint, and they improve performance somewhat without contributing to it.

4 days ago

They give some reasons in this post but I don't understand all of them or how it gets them so much smaller than the 4-bit version https://t.co/Re6IHsAOdW

4 days ago

We just dropped Gemma 4 Quantization-Aware Training (QAT) checkpoints on Hugging Face! All Gemma 4 model sizes and their drafters are now optimized with QAT to cut memory requirements and maximize on-device performance!
