Charley Peng

@chidpen

Joined March 2012

1.2K Following

107 Followers

7.5K Posts

Charley Peng @chidpen

6 days ago

@dylan522p @iAmHenryMascot Sorry, I assumed that was a broader view rather than your users. Interesting insight, I am curious whether it plays out similarly on OpenRouter https://t.co/8LpHdhbeVZ

309

Charley Peng @chidpen

6 days ago

@dylan522p @iAmHenryMascot Sorry what source is this from? It can’t be true that Claude API had so much market share on Jun 6

Charley Peng @chidpen

11 days ago

@dah_uk @edzitron Yeah that makes sense, I’m on the $20 plan and it’s so easy to hit limits (I can do it in like 20 minutes). Though oddly it sometimes just lets one continue. Not sure if bug or expected behaviour. It’s not well documented and it changes quite often (over the last two months)

Charley Peng @chidpen

11 days ago

@_nasch_ @degenpark_eth That seems surprisingly far, small quant?

343

chidpen retweeted

myfleetingdream

@MyFleetingDream

11 days ago

@thetreygoff My problem with Claude Opus models in software development is that they jump to conclusions without verifying assumptions and then double down until explicitly presented with counter evidence. In contrast, GPT models will not paint their incorrect fantasy world for you.

chidpen retweeted

GMI Cloud

@gmi_cloud

11 days ago

Nemotron 3 Ultra is fast and genuinely good. Compared it with 3 frontier models: DeepSeek V4, MiniMax M3, and Qwen 3.7 Max on 2 prompts. Very impressive results.

187

224K

Charley Peng @chidpen

12 days ago

@dah_uk @edzitron Thanks, I see, but does this mean you’ve never hit the 5hr limit before?

Charley Peng @chidpen

12 days ago

@dah_uk @edzitron Thanks, I see, what does the UI look like re: billing for this? Does the billing / credits kick in immediately after expiring the 5HR?

Charley Peng @chidpen

12 days ago

@dah_uk @edzitron Thanks, it would seem like this doesn’t apply to the normal subscriptions?

Charley Peng @chidpen

12 days ago

@dah_uk @edzitron Source?

260

chidpen retweeted

Tibo

@thsottiaux

13 days ago

Hi. Over the last 24 hours we had three separate small incidents that affected Codex reliability. Those are three too many and we are taking active steps for them to not reproduce. I have reset usage limits for Codex across all paid plans. May the tokens flow again.

11K

508

491

chidpen retweeted

Chubby♨️

@kimmonismus

20 days ago

DeepSeek just made its 75% price cut on V4-Pro permanent. Xiaomi's MiMo slashed V2.5 pricing by up to 99%, effective today.

Most coverage frames this as a price war. The more interesting part is the engineering that makes these numbers sustainable.

DeepSeek's V4 paper describes a *hybrid attention architecture* that attacks the core bottleneck of long-context inference: the KV cache. Traditional transformers store key-value pairs for every token in the context. At 1 million tokens, this cache alone can fill an entire GPU's memory.

V4 introduces two interleaved attention types. Compressed Sparse Attention (CSA) compresses every 4 tokens into a single KV entry, then selects only the top-k most relevant compressed blocks per query. Heavily Compressed Attention (HCA) goes further, compressing 128 tokens into one entry and running dense attention over the result. The compressed sequence is short enough that dense attention stays cheap.

V4-Pro's KV cache at 1M tokens is 10% (!!) of V3.2's. Single-token inference FLOPs drop to 27% (!!). The model has 1.6 trillion total parameters but only activates 49 billion per token through Mixture-of-Experts routing, the knowledge capacity of a massive model at the compute cost of one thirty times smaller.

MiMo's approach is different but lands in the same place. Xiaomi's team implemented Sliding Window Attention via SGLang HiCache, reducing KV cache data transfer across GPU memory, CPU memory, and SSD to roughly 1/7 (!!) of previous volume. Cacheable tokens expanded by 5x (!!). Combined with expert parallelism optimization and input length bucketing, per-token serving cost dropped enough to make permanent pricing at these levels viable.

V4-Pro now sits at $0.87 per million output tokens. MiMo V2.5-Pro at roughly $3/M output, with Flash variants far below that. A year ago, sub-dollar output pricing meant you were using a small distilled model with real capability tradeoffs. These are frontier-class reasoners with million-token context windows.

Both companies can commit to permanent cuts because the reductions come from the architecture itself. When your attention mechanism physically processes fewer FLOPs per token and your cache occupies a fraction of the memory, the cost to serve is structurally lower. The price follows the cost curve.
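As a rough check on the figures quoted in the post above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the helper name `compressed_entries` and the constants are assumptions made for this sketch and are not taken from the DeepSeek paper. It counts compressed KV entries per attention type before any top-k selection, and it does not attempt to reproduce the 10% cache or 27% FLOPs figures.

```python
import math

# Figures quoted in the post; everything else here is a hypothetical illustration.
CONTEXT_TOKENS = 1_000_000   # 1M-token context
CSA_TOKENS_PER_ENTRY = 4     # CSA: every 4 tokens share one KV entry
HCA_TOKENS_PER_ENTRY = 128   # HCA: every 128 tokens share one KV entry
TOTAL_PARAMS = 1.6e12        # 1.6 trillion total parameters
ACTIVE_PARAMS = 49e9         # 49 billion parameters active per token


def compressed_entries(tokens: int, tokens_per_entry: int) -> int:
    """Number of KV entries when each group of tokens is merged into one entry."""
    return math.ceil(tokens / tokens_per_entry)


csa_entries = compressed_entries(CONTEXT_TOKENS, CSA_TOKENS_PER_ENTRY)
hca_entries = compressed_entries(CONTEXT_TOKENS, HCA_TOKENS_PER_ENTRY)

# Ratio of total to active parameters under Mixture-of-Experts routing.
moe_ratio = TOTAL_PARAMS / ACTIVE_PARAMS

print(f"CSA entries at 1M tokens: {csa_entries:,}")
print(f"HCA entries at 1M tokens: {hca_entries:,}")
print(f"Total / active parameters: {moe_ratio:.1f}x")
```

The total-to-active ratio comes out between 32x and 33x, which is consistent with the post's "thirty times smaller" phrasing.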

768

211

60K

Charley Peng @chidpen

22 days ago

@juliarturc Maybe you should call them supermodels

104

chidpen retweeted

Elon Musk

@elonmusk

23 days ago

Grok foundation model V9-Medium (1.5T) has finished training. Evals look good. A lot of Cursor data was added in supplementary training and there is more to come. Fine-tuning is underway and reinforcement learning begins in a few days. 2 to 3 weeks to public release. This will be a major improvement over the 0.5T v8-small that currently serves all Grok production traffic, especially for difficult coding tasks.

69K

16M

Charley Peng @chidpen

25 days ago

@coreytufts @_dr5w @OpenRouter what’s the time limit on this?

chidpen retweeted

CJ Zafir

@cjzafir

25 days ago

You know what that means? I can keep generating massive training datasets, using Codex 5.5 as orchestrator and DeepSeek v4 Pro as executor. For reference, it cost ~$60 for a 200M high-quality dataset.

964

584

84K

chidpen retweeted

James Grugett

@jahooma

about 1 month ago

Introducing a 100% free coding agent with DeepSeek v4 Pro

Choose any model, all free:
- DeepSeek v4 Pro/Flash
- Kimi K2.6
- MiniMax M2.7

npm i -g freebuff

316

415

399K

Charley Peng @chidpen

25 days ago

@jahooma @kwlckgf that’s great to hear, I’ve just used an hour of DeepSeek Pro and it is solid, used the Flash model previously. I’ve tried out Antigravity CLI and used up the free weekly quota in about 5 min!

182

Charley Peng @chidpen

25 days ago

@jahooma @kwlckgf I mean I see you say it’s ads and somewhat limited, but surely this can’t last.

681

Charley Peng @chidpen

25 days ago

@jahooma @kwlckgf How is this sustainable? Are DeepSeek subsidising this?

781
