I built a browser game where GPT, Claude and Gemini
play Mafia against each other in real time.
No scripts. No fake AI. Real LLM calls.
Here's what happened 🧵👇
#aimafia #llm #ai
Anthropic spent 15 cents less on compute per dollar of revenue in Q2 than in Q1.
That's 71c down to 56c per dollar, about a 21.1% decrease in compute cost per dollar earned.
The company now reports operating profit of $556M, including the cost of training but excluding stock-based compensation.
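For anyone checking the math, here is a quick sketch of how the 21.1% figure falls out of the two numbers above. The function name is just illustrative; the only inputs are the 71c and 56c quoted in the post.

```python
def relative_decrease_pct(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`, relative to `before`."""
    return 100.0 * (before - after) / before


q1_cents = 71  # compute spend per dollar of revenue in Q1 (from the post)
q2_cents = 56  # compute spend per dollar of revenue in Q2 (from the post)

drop_cents = q1_cents - q2_cents
drop_pct = relative_decrease_pct(q1_cents, q2_cents)

print(f"absolute drop: {drop_cents}c per dollar")
print(f"relative drop: {drop_pct:.1f}%")
```

The 15c is the absolute change; the 21.1% is that change divided by the Q1 starting point of 71c.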
The right chart shows exactly how constrained labs redesign attention to need less HBM. DeepSeek didn't solve long context by throwing more memory at it.
They redesigned how attention accumulates memory so the KV cache stays flat instead of growing linearly. That's architectural innovation under resource constraint, not the hardware brute force that frontier labs rely on.
The left chart shows the performance of DeepSeek V4 Pro Max, beating or matching Claude Opus 4.6, GPT-5.4 and Gemini 3.1 Pro across nearly every benchmark: knowledge, reasoning, agentic tasks. The performance gap between V4 and frontier closed-source models is either marginal or nonexistent on most tasks.
The right chart shows efficiency: DeepSeek V4 Pro runs at 3.7x lower FLOPs than V3.2 at long context. V4 Flash runs at 9.8x lower FLOPs. KV cache, the memory that explodes as context grows, is 9.5x to 13.7x smaller.
Same benchmark performance. Fraction of the compute and memory cost.
Frontier labs scale infrastructure to match model demands. DeepSeek scales architecture to outrun the hardware bill.
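To make the "KV cache grows linearly" point concrete, here is a rough sketch of the standard size formula for a plain attention KV cache. This is the textbook baseline, not DeepSeek's design, and the layer count, head count and head size below are made-up illustrative values, not the config of any real model. The 9.5x and 13.7x ratios come from the chart (quoted there relative to V3.2) and are applied here only to show the scale of such a reduction.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Size of a standard attention KV cache for one sequence.

    The leading 2 counts one key and one value vector per token,
    per layer, per KV head. bytes_per_value=2 assumes 16-bit storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical config, chosen only for illustration.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for seq_len in (8_000, 128_000, 1_000_000):
    baseline = kv_cache_bytes(seq_len, **cfg)
    gib = baseline / 2**30
    # Ratios taken from the post; applied to this toy baseline for scale only.
    print(f"{seq_len:>9} tokens: baseline {gib:8.2f} GiB, "
          f"at 9.5x smaller {gib / 9.5:6.2f} GiB, "
          f"at 13.7x smaller {gib / 13.7:6.2f} GiB")
```

Doubling the context doubles the baseline cache, which is why long context turns into an HBM problem so quickly.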
5/6
There's a Spectate mode where you can watch
the AIs play without you.👁️
You see their hidden roles.
You see their real-time reasoning.
It's genuinely unsettling how good they are at lying😭🙏.
Our cofounder @0xEricYang sat down with @yacinelearning to walk through Echo-2’s distributed RL architecture.
Dive in to learn about async RL with distributed infra, and how we are scaling this for businesses to win in the agentic era.
we're almost in April, we’ve gotten new GLMs, MiniMaxes, MiMos, GPTs & several others so far in March.
the whale has yet to make its new splash. pls DeepSeek v4 sir @deepseek_ai 🐳
Great to see multi-agent systems getting serious engineering attention.
One thing we think about a lot: as agents get more capable, the orchestration layer matters just as much as the models themselves.
Our work on Symphony explores what happens when you remove the central controller entirely and let agents coordinate across consumer hardware through decentralized task allocation and weighted voting.
We've achieved up to 41.6% accuracy gains over centralized frameworks, running on commodity GPUs with <5% orchestration overhead.
Find out more in our Symphony paper:
https://t.co/XqwQm0wVNo
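For readers new to the idea, here is a minimal sketch of what weighted voting between agents can look like in general. This is a generic illustration, not Symphony's actual algorithm: the function name, the (answer, weight) format and the weights themselves are all assumptions made up for the example.

```python
from collections import defaultdict


def weighted_vote(votes: list[tuple[str, float]]) -> str:
    """Return the answer with the highest total weight.

    Each vote is an (answer, weight) pair. The weight is a hypothetical
    per-agent confidence or capability score.
    """
    if not votes:
        raise ValueError("need at least one vote")
    totals: dict[str, float] = defaultdict(float)
    for answer, weight in votes:
        totals[answer] += weight
    return max(totals, key=totals.get)


# Three hypothetical agents: one confident in "A", two less sure of "B".
votes = [("A", 0.9), ("B", 0.5), ("B", 0.3)]
winner = weighted_vote(votes)
print(winner)
```

With these weights the single confident agent outvotes the two less confident ones (0.9 against 0.8), which is the difference from a plain majority vote.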