Ian

@munga1_

It's all a matter of perspective.

Joined January 2011

916 Following

442 Followers

11.9K Posts

Pinned Tweet

Ian @munga1_

over 2 years ago

📌

✯ @iiluhbbyc

over 2 years ago

life’s about to get soooo good im super excited already

munga1_ retweeted

Aseel Swaid

@aseelswaid9

about 16 hours ago

Sertão, 1996. A. Abbas.

Ian @munga1_

9 days ago

A good 10 goals almost denied to my goat, @B_Fernandes8 why didn't you assist these you dickhead

Lexa

@Alexaxfcb

9 days ago

“Por qué Cristiano Ronaldo no puede ganar con Portugal pero Messi con Argentina sí?” Y aquí está la respuesta

71K

11M

300

Ian @munga1_

21 days ago

@Ian_Schwartzman putting every single Sasha featured episode behind a paywall is crazy, who TF is paying for that

Who to follow

Emmanuel Pesa

@PesaEmmanuel

Astute career banker & financial accountant

Published author/Social commentator/everything books/of Lemba tribe

Ian @munga1_

about 1 month ago

😭😭😭😭 funny asf, bro took initiative

Michael Zhou

@michaelzixizhou

about 1 month ago

bro took it upon himself and "went ahead" to make AI branded sites with my domain and logo on every other social media LMAO HELP

103

33K

Ian @munga1_

about 1 month ago

5 billion tokens spent for like $130. This would've been like $140k for Opus. Thank you @deepseek_ai you fucking geniuses you

Chubby♨️

@kimmonismus

about 1 month ago

DeepSeek just made its 75% price cut on V4-Pro permanent. Xiaomi's MiMo slashed V2.5 pricing by up to 99%, effective today. Most coverage frames this as a price war. The more interesting part is the engineering that makes these numbers sustainable. DeepSeek's V4 paper describes a *hybrid attention architecture* that attacks the core bottleneck of long-context inference: the KV cache. Traditional transformers store key-value pairs for every token in the context. At 1 million tokens, this cache alone can fill an entire GPU's memory. V4 introduces two interleaved attention types. Compressed Sparse Attention (CSA) compresses every 4 tokens into a single KV entry, then selects only the top-k most relevant compressed blocks per query. Heavily Compressed Attention (HCA) goes further, compressing 128 tokens into one entry and running dense attention over the result. The compressed sequence is short enough that dense attention stays cheap. V4-Pro's KV cache at 1M tokens is 10% (!!) of V3.2's. Single-token inference FLOPs drop to 27% (!!). The model has 1.6 trillion total parameters but only activates 49 billion per token through Mixture-of-Experts routing, the knowledge capacity of a massive model at the compute cost of one thirty times smaller. MiMo's approach is different but lands in the same place. Xiaomi's team implemented Sliding Window Attention via SGLang HiCache, reducing KV cache data transfer across GPU memory, CPU memory, and SSD to roughly 1/7 (!!) of previous volume. Cacheable tokens expanded by 5x (!!). Combined with expert parallelism optimization and input length bucketing, per-token serving cost dropped enough to make permanent pricing at these levels viable. V4-Pro now sits at $0.87 per million output tokens. MiMo V2.5-Pro at roughly $3/M output, with Flash variants far below that. A year ago, sub-dollar output pricing meant you were using a small distilled model with real capability tradeoffs. These are frontier-class reasoners with million-token context windows. Both companies can commit to permanent cuts because the reductions come from the architecture itself. When your attention mechanism physically processes fewer FLOPs per token and your cache occupies a fraction of the memory, the cost to serve is structurally lower. The price follows the cost curve.

$kimmonismus's tweet photo. DeepSeek just made its 75% price cut on V4-Pro permanent. Xiaomi's MiMo slashed V2.5 pricing by up to 99%, effective today. Most coverage frames this as a price war. The more interesting part is the engineering that makes these numbers sustainable. DeepSeek's V4 paper describes a *hybrid attention architecture* that attacks the core bottleneck of long-context inference: the KV cache. Traditional transformers store key-value pairs for every token in the context. At 1 million tokens, this cache alone can fill an entire GPU's memory. V4 introduces two interleaved attention types. Compressed Sparse Attention (CSA) compresses every 4 tokens into a single KV entry, then selects only the top-k most relevant compressed blocks per query. Heavily Compressed Attention (HCA) goes further, compressing 128 tokens into one entry and running dense attention over the result. The compressed sequence is short enough that dense attention stays cheap. V4-Pro's KV cache at 1M tokens is 10% (!!) of V3.2's. Single-token inference FLOPs drop to 27% (!!). The model has 1.6 trillion total parameters but only activates 49 billion per token through Mixture-of-Experts routing, the knowledge capacity of a massive model at the compute cost of one thirty times smaller. MiMo's approach is different but lands in the same place. Xiaomi's team implemented Sliding Window Attention via SGLang HiCache, reducing KV cache data transfer across GPU memory, CPU memory, and SSD to roughly 1/7 (!!) of previous volume. Cacheable tokens expanded by 5x (!!). Combined with expert parallelism optimization and input length bucketing, per-token serving cost dropped enough to make permanent pricing at these levels viable. V4-Pro now sits at $0.87 per million output tokens. MiMo V2.5-Pro at roughly $3/M output, with Flash variants far below that. A year ago, sub-dollar output pricing meant you were using a small distilled model with real capability tradeoffs. These are frontier-class reasoners with million-token context windows. Both companies can commit to permanent cuts because the reductions come from the architecture itself. When your attention mechanism physically processes fewer FLOPs per token and your cache occupies a fraction of the memory, the cost to serve is structurally lower. The price follows the cost curve.$

766

209

60K

105

Ian @munga1_

about 1 month ago

Had this at first but I opened a cron alert topic, ask your bot for the topic ID specifically, set all cron alerts there. Even when asking the agent about what's in the cron alert topic I tell it to leave the cron alerts there.

Prashanth

@Prashtwtz

about 1 month ago

I’m currently using Telegram to chat with Hermes Agent. It works well for normal conversations, but once you set up cron jobs for different topics, everything starts landing in the same place. Feels like Discord channels per topic might be a cleaner setup. Anyone using it that way? Suggest me the best way to manage things #hermes

Prashtwtz's tweet photo. I’m currently using Telegram to chat with Hermes Agent.

It works well for normal conversations, but once you set up cron jobs for different topics, everything starts landing in the same place.

Feels like Discord channels per topic might be a cleaner setup.

Anyone using it that way? Suggest me the best way to manage things
#hermes

18K

Ian @munga1_

about 1 month ago

Wtf is going on over there

☭ @marxpalestine

about 1 month ago

4 black kids were abducted by 16 Ecuadorian militaries and they burned their bodies. Ecuadors president did everything in his power to imply they weren’t innocent so they deserved it until Courts proven him wrong. The youngest was 11 & oldest 15.

164

15K

392K

munga1_ retweeted

Vanessa Jaye @vanessajaye

about 1 month ago

This is the danger of only learning Eurocentric history

128

27K

350K

Ian @munga1_

2 months ago

Well I guess I found this weekends Claude code tasks

kate ✨ @pulpy_fiction

2 months ago

im SO into the idea of skill exchanging (like exchanging baking lesson for piano lesson, bike maintenance lesson for programming lesson, etc) but i cannot find any platforms dedicated to this! do i need to make one! im very busy but dont tempt me i will!

14K

200K

Ian @munga1_

2 months ago

Week 1-6 plan is being completed in the next 2 hours

Vivo

@vivoplt

2 months ago

No Claude, the project will not take me 2-3 weeks. We will finish it today.

107

261

109

133K

Ian @munga1_

2 months ago

Bubble indicator

Sam Altman

@sama

2 months ago

@scaling01 come to the light side

200

173

310K

Ian @munga1_

2 months ago

I always say the rebrand is crazy, we just know them now for pedophilia and anime and used panty vending machines

Elon Kawayi @moneycometome99

2 months ago

@ANIDEEZY Beef is a not accurate word to describe the relationship between Japan and other Asian countries

663

171

34K

141

Ian @munga1_

2 months ago

We have a Nigerian woman as a politician serving one of the most racist parties in Europe whose main agenda is deportation of immigrants. As the only black woman there mind you.

munga1_'s tweet photo. We have a Nigerian woman as a politician serving one of the most racist parties in Europe whose main agenda is deportation of immigrants. As the only black woman there mind you. https://t.co/59RSR6nmvo

David Hundeyin

@DavidHundeyin

2 months ago

Brother, with all due respect, you work for CNN. Your opinion is the least useful thing in the world - because it's not yours. Whatever Warner Media wants you to think is exactly what you will think because your paycheck depends on it. Dance for your coins but please do so with sense.

829

233

152K

203

Ian @munga1_

3 months ago

😭😭😭😭

E🎀

@ElsSaidIt

3 months ago

Ride cancelled, he’s rude🤚🏾🤚🏾

261

10K

549

602

Ian @munga1_

3 months ago

Freiburg, exactly where they used to burn witches allegedly

Heinke 🎼

@schaffh95557956

3 months ago

Wo bin ich??

158

494K

Ian @munga1_

3 months ago

Germans are very socially immature. The idea of honesty and "being direct" just stems from that. 90% of the time people will confront each other to complain or bitch about things than even say Hi. Most of the times it's nothing even worth bitching about.