Chamalka Muwangala

@ChamalkaAI

ML Engineer and AI enthusiast. Based out of 🇸🇪

Malmo, Sweden

Joined October 2024

103 Following

15 Followers

90 Posts

ChamalkaAI retweeted

Bryan Johnson

@bryan_johnson

22 days ago

If Hantavirus mutated into a global threat, it would unleash AI + biotech unlike anything we've ever seen.

> genome sequenced and public in 4 hours
> AlphaFold maps every protein target
> AI screens 10,000 drugs in 24 hrs
> 50 vaccine candidates designed simultaneously
> AI designed antibodies in days
> risk of death computed instantly
> decentralized trials launch globally
> enroll from home
> 20 countries manufacturing at once
> first doses in three weeks
> real-time dose characterization
> your genome + biomarkers determine your protocol
> variant map updates every hour

No one would wait for governments.

519

243

953

585K

ChamalkaAI retweeted

Daniel A. Saedi (DataManDan)

@TheRealDanSaedi

24 days ago

The singular red dot is causing a massive compute shortage and you think memory stocks have topped?

123

307

666K

ChamalkaAI retweeted

Fredrik Hjelm 🇸🇪

@FredrikHjelm4

about 1 month ago

Ambition since childhood: become the world's best hacker. Met Wilma Emanuelsson yesterday. Wilma is such an impressive human being. Started hacking at 7, finding exploits while playing MovieStarPlanet. Turned hacktivist as a teenager. Went after child exploitation and trafficking rings. Founded iTrack Reading before that. Software for people with dyslexia. Look at a word, hear it in your headphones. The idea came from a friend who kept getting held back in school. Now running her next company, Ezteric. She's 22.


383

15K

ChamalkaAI retweeted

Rohan Paul

@rohanpaul_ai

about 2 months ago

Big claim in this paper. "Prefill-as-a-Service"

Prefill, the heaviest part of inference, may finally be portable.

Long-context AI is no longer trapped inside a single datacenter.

Shows how to run LLM prefill on remote clusters by sending much smaller saved prompt state. So long-prompt work can be done on remote machines, sending back only the smaller saved state needed to answer.

The breakthrough is not sending everything farther, but sending the right requests farther.

---

When you ask a model a long question, it first has to read and digest the whole prompt before it starts answering.

That first step is called prefill, and it is brutally compute-heavy.

The second step is decode, where the model generates tokens one by one, and that part is more about memory bandwidth than raw compute.

But moving the saved prompt state between those phases is usually so data-heavy that both parts must stay in the same tightly connected cluster.

Until now, those two steps usually had to stay close together inside the same fast network, because prefill creates a huge blob of temporary memory called KVCache that had to be moved quickly to the decode machine.

That is the bottleneck.

What changed is model design.

Newer hybrid-attention models produce much smaller KVCache than older dense-attention models, so shipping that state across ordinary datacenter links starts to become practical instead of absurd.

The paper’s idea is a Prefill-as-a-Service setup that sends only long, uncached prompts to a remote prefill cluster, then ships back the saved prompt state, called KV cache, over normal Ethernet while short requests stay local.

This works mainly because newer hybrid-attention models create far less KV cache than older dense models, and the system adds smart routing, bandwidth-aware scheduling, and cache-aware placement so the network does not clog up.

The authors test this with an internal 1T-parameter hybrid model on a mixed setup that uses H200 GPUs for remote prefill and H20 GPUs for local decode.

With a routing threshold near 19.4K tokens, about 50% of requests go remote, average cross-cluster traffic is only 13Gbps on a 100Gbps link, and throughput rises 54% over a local-only baseline and 32% over a naive heterogeneous setup.

The real point is that smaller KV cache alone was not enough, but paired with selective offloading and scheduling it makes cross-datacenter LLM serving workable, more flexible, and easier to scale across different hardware.

----

Paper Link – arxiv.org/abs/2604.15039v1

Paper Title: "Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter"
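The routing rule described in the post can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the `Request` class, the `route` and `link_utilization` functions, and the choice to compare only the uncached part of the prompt against the threshold are all assumptions made for the example. The real system reportedly also uses bandwidth-aware scheduling and cache-aware placement, which are not modeled here.

```python
from dataclasses import dataclass

# Assumption: threshold taken from the figure quoted in the post (~19.4K tokens).
ROUTING_THRESHOLD_TOKENS = 19_400


@dataclass
class Request:
    """Hypothetical request record, for illustration only."""
    prompt_tokens: int      # total prompt length in tokens
    cached_tokens: int = 0  # tokens already covered by a local prefix cache


def route(req: Request, threshold: int = ROUTING_THRESHOLD_TOKENS) -> str:
    """Send long, uncached prompts to remote prefill; keep the rest local."""
    uncached = max(req.prompt_tokens - req.cached_tokens, 0)
    return "remote" if uncached >= threshold else "local"


def link_utilization(avg_gbps: float, link_gbps: float) -> float:
    """Fraction of the cross-cluster link used on average."""
    return avg_gbps / link_gbps


if __name__ == "__main__":
    print(route(Request(prompt_tokens=30_000)))                        # long, uncached
    print(route(Request(prompt_tokens=30_000, cached_tokens=20_000)))  # mostly cached
    print(route(Request(prompt_tokens=1_000)))                         # short
    # Figures quoted in the post: 13 Gbps average on a 100 Gbps link.
    print(f"{link_utilization(13, 100):.0%} of the link")
```

With the quoted numbers, average cross-cluster traffic uses about 13% of the link, which is the headroom the post is pointing at.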


Chamalka Muwangala @ChamalkaAI

about 2 months ago

@vitaliidodonov Damn it worked

Chamalka Muwangala @ChamalkaAI

about 2 months ago

Features I am working on:
> Drag and drop components from different papers and build a novel model.
> Generate the code for that in real time.
> Ask questions about the reasoning behind your idea.

Chamalka Muwangala @ChamalkaAI

about 2 months ago

Currently building a new tool that I would personally find quite useful. It's an interactive machine learning tool. Its main feature is helping you visualize a model from a research paper, manipulate it, and generate code to test it yourself.

Chamalka Muwangala @ChamalkaAI

about 2 months ago

@kiriyo9302 @0xSero And make it local lets gooo

ChamalkaAI retweeted

Claude

@claudeai

about 2 months ago

We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.


38K

22K

Chamalka Muwangala @ChamalkaAI

about 2 months ago

Tried to use ForgeCode's Forge Harness last night, ran into a bunch of errors about the database being full, and crashed out hard. Anyone know how to fix this? @forgecodehq

121

Chamalka Muwangala @ChamalkaAI

about 2 months ago

@0xSero Are you coding out of discord tf?!

ChamalkaAI retweeted

Wildminder

@wildmindai

about 2 months ago

RotorQuant - upgraded TurboQuant.

> 10x KV cache compression
> 28% faster decoding
> 5x faster prefill
> 44x fewer parameters
Same quality as full attention. 1/10th the memory.
Ok, another massive VRAM discount for local LLMs.

https://t.co/2LHZ47fptn
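To put the "1/10th the memory" claim in concrete terms, here is a rough back-of-the-envelope sketch using the standard KV cache size estimate for a transformer (keys plus values, per layer, per KV head). The model dimensions below are hypothetical and chosen only for illustration; nothing here models how RotorQuant itself works.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence.

    The leading factor of 2 accounts for storing both a key and a value
    tensor in every layer. bytes_per_value=2 corresponds to fp16/bf16.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    # Hypothetical model shape, not taken from the post.
    full = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=32_768)
    compressed = full / 10  # the 10x compression figure quoted in the post

    gib = 1024 ** 3
    print(f"uncompressed: {full / gib:.2f} GiB")
    print(f"at 10x compression: {compressed / gib:.2f} GiB")
```

For this assumed shape, a 32K-token context needs about 4 GiB of KV cache in fp16, so a 10x reduction would bring it to roughly 0.4 GiB per sequence.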


161

66K

Chamalka Muwangala @ChamalkaAI

about 2 months ago

Be Anthropic
> Give your AI researchers a crap ton of caffeine and ask them to build a SOTA model
> Use this model to accelerate product development
> Keep it hidden from the public since Feb 2024
> Leak it so everyone knows what's coming
> Drop the model and generate crazy hype

Chamalka Muwangala @ChamalkaAI

about 2 months ago

Mythos about to make a monumental shift in AI on the same day Trump goes on to bomb Iran What the hell is 2026 about 😭

ChamalkaAI retweeted

Deedy

@deedydas

about 2 months ago

Claude Mythos just obliterated every single benchmark in AI. I can't believe what I'm reading.

318

738

775K

ChamalkaAI retweeted

NIK

@ns123abc

about 2 months ago

🚨 Anthropic just revealed their unreleased frontier model called Claude Mythos Preview

The model is INSANE

It found thousands of zero-day vulnerabilities in EVERY major operating system and browsers:
> 27-year-old bug in OpenBSD
> 16-year-old bug in FFmpeg that automated tools hit 5M times without catching

Completely autonomous. No human steering.

They assembled an entire industry coalition called Project Glasswing around it: AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike, JPMorgan, Cisco, Palo Alto, Linux Foundation

Goal: patch the world’s software BEFORE releasing it
> SWE-bench: 93.9% (Opus 4.6: 80.8%)
> Anthropic is committing $100M in usage credits
> Thousands of vulnerabilities in 40+ organizations are being fixed right now

Yesterday OpenAI published a 13-page essay warning about cyber threats and asking the government to help…

Today Anthropic actually fixed them.