Steven (Batman) Batchelor-Manning @S_BatMan - Twitter Profile

MichaelGannotti's tweet photo. Beyond the Leaderboard #4: Can a 9.6GB Local Model Outcode a 400B Cloud Titan?
Four days ago, I started a benchmark series with a simple question: what happens when you measure AI models by what they actually do, not by their benchmark-suite scores?
- Day 1: Kimi K2.6, 0.66 overall. Fast, decent, nothing special.
- Day 2: DeepSeek-V4-Pro, 0.72 overall. Slow but precise. A specialist.
- Day 3: MiniMax-M3, 0.80 overall. The surprise leader. Fast, balanced, but hallucinates on recent knowledge.
Today, Day 4: Google DeepMind's Gemma4:e4b, a 9.6GB model running locally on Ollama.
Result: 0.78 overall. The second-highest score of the series, just behind MiniMax-M3's 0.80.
Wait, what?…
🧵⬇️👇

12

5

0

2

639

0

1

0

40

S_BatMan retweeted

NVIDIA AI

@NVIDIAAI

2 days ago

Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

S_BatMan retweeted

MiniMax (official) @MiniMax_AI

3 days ago

M3 is back in the free tier on @opencode 🚀 Jump in and try it while it lasts!

Steven (Batman) Batchelor-Manning

@S_BatMan

3 days ago

@wong2__ There are so many good agentic memory systems out there, so picking the right one is critical https://t.co/SZi3M2UxTC

Steven (Batman) Batchelor-Manning

@S_BatMan

about 1 month ago

https://t.co/61clZ3kL76

Steven (Batman) Batchelor-Manning

@S_BatMan

4 days ago

Good to see further improvements to a powerhouse model

RyanLee

@RyanLeeMiniMax

4 days ago

🔥 MiniMax-M3 just got faster.
Thanks for all the excitement around MiniMax-M3: the response has been far beyond our expectations.
Last night, we rolled out a major inference upgrade:
🛠️ Fixed an issue that could occasionally produce abnormal tokens
💾 Increased memory and improved cache efficiency
🚀 ~50% higher throughput, with most users now seeing 50–70 TPS
You should notice a much smoother experience today. More optimizations are on the way. ❤️

Steven (Batman) Batchelor-Manning

@S_BatMan

4 days ago

@sakurayukiai If I go back to the very first article of the series, `rag is not enough`: the more approaches like this that stop hope-dumping into the context, the better.

Steven (Batman) Batchelor-Manning

@S_BatMan

4 days ago

Your memory system should not be deciding what the agent sees. The agent should. New Article on the quiet reversal in agent memory: stop injecting context, start giving the agent tools. Live now.

https://t.co/JTCwgvxfuS

Steven (Batman) Batchelor-Manning

@S_BatMan

4 days ago

@PenfieldLabs Graph is a powerhouse, but as most systems have found, maintaining the graph at scale is either slow, hard, costly, or all three. Looking forward to seeing what the future brings in that space.

Steven (Batman) Batchelor-Manning

@S_BatMan

4 days ago

https://t.co/EgSoEYJgth

Steven (Batman) Batchelor-Manning

@S_BatMan

4 days ago

@leetllm With most teams I work with, the first thing I do is get them using a tool-recalled system, and they all see the same.

Steven (Batman) Batchelor-Manning

@S_BatMan

4 days ago

The piece also covers the two-step rhythm (search returns previews, a separate call fetches the full record, ~200K tokens saved per session), and the oh-my-kiro observation pattern. https://t.co/p9YxwVeg7U
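A minimal sketch of the two-step rhythm described above, assuming a toy in-memory store. The names `MemoryRecord`, `MemoryStore`, `search`, `fetch` and the `preview_chars` parameter are hypothetical and are not taken from the article; the ~200K token figure is the tweet's own claim and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    record_id: str
    title: str
    body: str


class MemoryStore:
    """Hypothetical store exposing two agent-facing tools: search and fetch."""

    def __init__(self, records):
        self._records = {r.record_id: r for r in records}

    def search(self, query, preview_chars=60):
        """Step 1: return short previews only, never full bodies."""
        q = query.lower()
        hits = []
        for r in self._records.values():
            if q in r.title.lower() or q in r.body.lower():
                hits.append({
                    "id": r.record_id,
                    "title": r.title,
                    "preview": r.body[:preview_chars],
                })
        return hits

    def fetch(self, record_id):
        """Step 2: return the full record for an id the agent picked."""
        r = self._records[record_id]
        return {"id": r.record_id, "title": r.title, "body": r.body}
```

In this sketch the agent pays for a full body only when it explicitly calls `fetch`, so a broad `search` stays cheap regardless of how long the stored records are.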

Steven (Batman) Batchelor-Manning

@S_BatMan

4 days ago

https://t.co/EgSoEYJgth

Steven (Batman) Batchelor-Manning

@S_BatMan

4 days ago

The highest-leverage refinement in the whole piece: make every tool response end with one line about what to call next. No documentation. The agent learns the API through use. The trace becomes self-documenting.
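One way the "every response ends with what to call next" idea could look, as a rough sketch. The `Next:` prefix, the helper `with_next_hint`, the canned results, and the `fetch(id)` tool name are all assumptions made for illustration, not the article's actual format.

```python
def with_next_hint(payload: str, next_call: str) -> str:
    """Append a single trailing line telling the agent what to call next."""
    return f"{payload}\nNext: {next_call}"


def search_tool(query: str) -> str:
    """Hypothetical search tool; results are canned for illustration only."""
    results = ["rec-1: deploy checklist", "rec-2: rollback notes"]
    body = "\n".join(results)
    return with_next_hint(
        body, "call fetch(id) with one of the ids above for the full record"
    )
```

Because the hint travels inside the tool output, it also appears in the agent's trace, so the trace itself records how the tools are meant to chain.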

Steven (Batman) Batchelor-Manning

@S_BatMan

6 days ago

Here it is, the long-awaited MiniMax M3.

MiniMax (official) @MiniMax_AI

6 days ago

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities
- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero
API: https://t.co/fHRdSV7BwZ
Token Plan: https://t.co/BDCycxepZw
🚀New! MiniMax Code: https://t.co/GvB4YiB6Ul
Weights & Tech Report in ~10 Days

Steven (Batman) Batchelor-Manning

@S_BatMan

6 days ago

Getting the right agentic memory saves you tokens, limits, costs and a headache. Get it right!

Steven (Batman) Batchelor-Manning

@S_BatMan

7 days ago

https://t.co/wB6lkFjpvo

Steven (Batman) Batchelor-Manning

@S_BatMan

6 days ago

How's that context looking for you? Because there is a good chance you are wasting a lot of tokens https://t.co/ri7f95lPAV

Steven (Batman) Batchelor-Manning

@S_BatMan

7 days ago

https://t.co/wB6lkFjpvo

Steven (Batman) Batchelor-Manning

@S_BatMan

about 1 month ago

https://t.co/61clZ3kL76

Steven (Batman) Batchelor-Manning

@S_BatMan

6 days ago

@garrytan I reviewed 19 memory systems and I'm pretty sure each of you didn't pick the best one for you, but take a look and see if you nailed it https://t.co/OrYNbZuStF

Steven (Batman) Batchelor-Manning

@S_BatMan

about 1 month ago

https://t.co/61clZ3kL76

Steven (Batman) Batchelor-Manning

@S_BatMan

6 days ago

@AzFlin I'd go for a middle ground: a trusted, curated set of skills, community-managed with safety gates (looking at you, npm), and then harnesses that can acquire skills on demand if they don't already have something better. Matrix style, learn kung fu.

Steven (Batman) Batchelor-Manning

@S_BatMan

6 days ago

RAG is where most agentic memory systems began. Some layered on it, some added alongside it, others ripped it out entirely, but one thing is for certain: vector RAG is not enough https://t.co/05zbl7lwXm

Steven (Batman) Batchelor-Manning

@S_BatMan

about 1 month ago

https://t.co/61clZ3kL76
