Seva @Two_Above - Twitter Profile

@MiniMax_AI

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities

- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero

API: https://t.co/fHRdSV7BwZ
Token Plan: https://t.co/BDCycxepZw
🚀New! MiniMax Code: https://t.co/GvB4YiB6Ul

Weights & Tech Report in ~10 Days


Seva

@Two_Above

5 days ago

Does codex goal mode have a max run time? I hit 24 hours and it stopped :(


Seva

@Two_Above

5 days ago

This is scary impressive coming from GPT 5.5. Crunched through a very convoluted decompilation and RE task


Seva

@Two_Above

8 days ago

Getting good vibes from this model, both on work and personal projects!

Claude

@claudeai

8 days ago

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.

Available today at the same price. https://t.co/EufxL7T1kb


Seva

@Two_Above

8 days ago

/model claude-opus-4-8 if you want to use it early


Seva

@Two_Above

11 days ago

How I sleep knowing that world records are records for a reason


Seva

@Two_Above

11 days ago

@WallisDev Can confirm. I’ve had literally ZERO issues on us-west-2


Seva

@Two_Above

12 days ago

@berrytop9 Diet Coke is better for this btw


Seva

@Two_Above

12 days ago

@xyster Intel is goated until they stop playing catch up. Can’t wait for their C series cards


Seva

@Two_Above

12 days ago

@JustJake Yeah it was a dream of mine to get my hands on one of these. Intel would have made a killing post~2025 if they stuck with them for a little while longer. Perfect disk for dev work and dbs https://t.co/pfBBP7eW9v


Seva

@Two_Above

13 days ago

@lauriewired So this is in a way more usable than Intel Optane set up as swap?


Seva

@Two_Above

13 days ago

Look how much compute is being left on the table. We can 10x LLM speeds if needed

zR

@zRdianjiao

14 days ago

🚀 GLM-5.1-HighSpeed is live: 400 tokens/s — a new speed ceiling for flagship-tier LLM APIs. Not a smaller model traded for speed. A flagship from @Zai_org that's also the fastest. 📖 Full technical deep-dive 👇 https://t.co/nLEFdMf2Ea


Seva

@Two_Above

14 days ago

checked on onedollarstats and holy - noitool is doing really well. so proud of it


Seva

@Two_Above

16 days ago

@xyster Thanks! And totally valid use for heatsinks


Seva

@Two_Above

16 days ago

@JustJake Sane solution to an insane incident. Thanks for the transparency. Can you share any info on what the reason for the suspension was?


Seva

@Two_Above

16 days ago

Please go read this - really great paper with some fancy and clever training and architecture decisions

Here are some cool things in no particular order:

HRM-Text omits broad raw-text pretraining and trains exclusively on instruction-response pairs from scratch.

MagicNorm, which exploits the asymmetry between the forward and backward computational horizons induced by truncated backpropagation through time.

Combining small recurrent reasoning models with external or learned knowledge stores is a promising direction.

We hypothesize that the instabilities observed under deep BPTT in looped architectures are a consequence of the intrinsically multiplicative structure of gradient propagation through repeated iterations. Specifically, gradients backpropagate through products of Jacobian-like operators across loop steps, and theory for products of many random matrices predicts that the logarithm of the norm of such products is approximately Gaussian, implying lognormal-like variability in gradient magnitudes and increasing separation between typical and extreme values as backward depth grows.
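
A rough numerical sketch of the quoted hypothesis, not taken from the paper. It assumes i.i.d. Gaussian matrices as a stand-in for the "Jacobian-like operators", and the names `log_norm_after` and `spread` are made up for illustration. It pushes a vector through a product of such matrices and measures how widely the log of its norm varies across random draws. Under these assumptions the log-norm is a sum of independent terms, so its spread should grow with depth, which is the lognormal-like behaviour the excerpt describes.

```python
import math
import random
import statistics


def log_norm_after(depth, dim, rng):
    """Return log ||W_depth ... W_1 v|| for i.i.d. Gaussian matrices W_k.

    Assumption: entries are N(0, 1/dim), a toy stand-in for real Jacobians.
    """
    v = [1.0 / math.sqrt(dim)] * dim  # unit-norm starting vector
    scale = 1.0 / math.sqrt(dim)      # keeps E||Wv||^2 equal to ||v||^2
    log_norm = 0.0
    for _ in range(depth):
        # One fresh random matrix per step, applied row by row.
        v = [sum(rng.gauss(0.0, scale) * x for x in v) for _ in range(dim)]
        n = math.sqrt(sum(x * x for x in v))
        log_norm += math.log(n)       # accumulate in log space
        v = [x / n for x in v]        # renormalise to avoid under/overflow
    return log_norm


def spread(depth, dim=8, trials=300, seed=0):
    """Standard deviation of the log-norm across independent trials."""
    rng = random.Random(seed)
    samples = [log_norm_after(depth, dim, rng) for _ in range(trials)]
    return statistics.pstdev(samples)


if __name__ == "__main__":
    for depth in (4, 16, 64):
        print(depth, round(spread(depth), 3))
```

A wider spread in log space at larger depth means typical and extreme norms drift further apart, which matches the "increasing separation" wording above. Real training Jacobians are neither independent nor Gaussian, so this only illustrates the mechanism.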


Guan Wang

@makingAGI

16 days ago

The HRM-Text paper is now available 🎉

HRM-Text explores a different approach to language model pretraining: hierarchical recurrent computation, task-completion training, and latent-space reasoning.

At just 1B parameters, HRM-Text achieves competitive performance with dramatically lower training cost and data requirements.

1B parameters
40B unique tokens
~1 day of pretraining
~$1000 training cost


