Tensor-Slayer @TensorSlay - Twitter Profile

Pinned Tweet

11 months ago

Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement: https://t.co/lQOLDFVzf0

1

64

9

47

17K

TensorSlay retweeted

Sebastian Raschka

@rasbt

about 11 hours ago

It's been a while! 4 nice additions to the open-weight local-LLM-on-consumer-hardware ecosystem:

24

574

89

252

26K

TensorSlay retweeted

Lakshya A Agrawal

@LakshyAAAgrawal

1 day ago

Excited to see the use of GEPA-optimized LLM judges for data filtering in MAI-Thinking-1 model's pre-training pipeline!

LakshyAAAgrawal's tweet photo: https://t.co/wAtVx3KEUE

3

147

19

64

46K

Tensor-Slayer

@TensorSlay

about 16 hours ago

1. Not useful for serious coding projects.
2. Extremely useful for dull agentic workflows, even overpowered. Really, really, really good at eye-popping frontend, basic Python scripting that calls REST APIs, web/research, traditional ML model training and, more importantly, working properly in a harness: that sort of low-hanging-fruit stuff. (The low-hanging fruit in agentic AI terms still needs Sonnet 4. level intelligence to be dependable.)

0

1

0

57

TensorSlay retweeted

1 day ago

dspy.GEPA used in pretraining data curation in the new Microsoft AI effort :-)

8

238

24

81

17K

1 day ago

1 day ago

Models covered under the order will be provided to the Federal Government and agencies for a thirty day window before early access begins for select trusted partners, who are yet to be determined.

AndrewCurran_'s tweet photo: https://t.co/SffffeuntR

4

50

6

11

4K

0

1

0

67

Tensor-Slayer

@TensorSlay

2 days ago

Pretty accurate description of what i feel

Zhihu Frontier

@ZhihuFrontier

2 days ago

🚀 MiniMax M3: Aiming for the Stars
Zhihu contributor toyama nao shares an early evaluation of MiniMax's new M3 multimodal model.

🔮 TL;DR
Back in April, GLM-5.1 pulled decisively ahead of MiniMax M2.7 and took the domestic coding crown. Two months later, MiniMax responds with M3.
The upgrade is significant: stronger reasoning, better stability, and much improved coding ability. M3 has firmly entered the "usable" tier.
⚖️ The cost? Efficiency.
Token consumption is up 77% versus M2.7, the highest among major models tested. Many medium-complexity tasks now consume 60K–70K tokens, making M3 substantially more expensive in practice.

🧠 Logic & Reasoning
Compared with DeepSeek V4 Flash, M3's strengths and weaknesses are clear.

✅ Strength 1: Long-context understanding
M3 shows excellent long-context hallucination control, reliably retrieving information from deep inside large documents.
On difficult retrieval-heavy tasks, it performs similarly to Qwen3.7-Max and ranks among the strongest domestic models.

✅ Strength 2: Complex reasoning
Long-chain reasoning is a major improvement over M2.7.
M3 has entered the top tier of Chinese models, solving problems through careful step-by-step exploration rather than relying on sudden insights.

⚠️ Weakness 1: Instruction following
M3 handles short and clear prompts extremely well, but performance becomes less predictable with long instructions and extended contexts.
As conversations grow longer, the model can suddenly lose track of earlier requirements.

⚠️ Weakness 2: Reasoning efficiency
M3 often consumes more tokens than DS4 Flash on comparable tasks.
Even medium-difficulty problems regularly exceed 30K tokens, with reasoning traces filled with repeated self-checks and verbose intermediate steps.

💻 Coding Performance
M3 is a substantial leap over M2.7, especially in frontend development and software engineering workflows.
Its coding behavior is highly structured: planning first, implementing module by module, testing continuously, and validating before delivery.

✅ Strength 1: Better architecture design
M3 is much stronger at choosing practical architectures that fit project requirements without overengineering.

✅ Strength 2: Strong self-testing
A large portion of M3's coding process is dedicated to self-debugging and validation. For complex issues, it can often locate problems efficiently on its own.

⚠️ Weakness 1: Expensive development cycles
Self-testing is also costly. A single task may require dozens of debugging rounds, with testing consuming more tokens than coding itself.

⚠️ Weakness 2: Requirement drift
As context grows, M3 can gradually forget parts of the original specification. The final output may pass tests while still missing requested functionality.
For best results, changes should remain relatively small and manageable.
Overall, M3's coding ability has crossed the usability threshold and clearly outperforms M2.7, though it remains behind Opus in efficiency, detail control, and overall engineering quality.

🧭 Final Thoughts
There is no magic in the LLM race.
M3 arrived only two months after M2.7, yet the progress is substantial. If M2.7 concentrated heavily on Agent capabilities, M3 appears to rebalance toward broader general intelligence.
The key tradeoff is clear:
👉 Prioritize delivery quality first.
👉 Optimize efficiency later.
That choice helped M3 close much of the capability gap, but its soaring token consumption may be the next challenge MiniMax has to solve.

📖 Full article: https://t.co/OlHDJgwJ71
#MiniMax #MiniMaxM3 #AI #LLM #AICoding #Agent #MultimodalAI

3

74

7

11

7K

0

68

Tensor-Slayer

@TensorSlay

2 days ago

🤣🤣🤣🤣

TensorSlay's tweet photo: https://t.co/pgVVerDOWa

0

58

Tensor-Slayer

@TensorSlay

2 days ago

@kalomaze Were you …. ?

0

79

Tensor-Slayer

@TensorSlay

2 days ago

Is this going to be another one of those endless rare ai weeks

1

0

74

Tensor-Slayer

@TensorSlay

2 days ago

@teortaxesTex VERY SLOW

0

8

Tensor-Slayer

@TensorSlay

2 days ago

Thing I don’t agree*

0

87

Tensor-Slayer

@TensorSlay

2 days ago

It’s been a heavy day of using MiniMax M3 in a very complex Hermes Agent fork deployed in a 20,000-member Discord group.

1. I feel the observation below is spot on.
2. But it is also good enough for Python, frontend, web search, market calls and handling multi-modality; basically it covers 80% of what a trading-related server would require.
3. What surprises me is that the model tries to work the concept of recursive self-improvement into anything it touches in the harness.
4. Terminal tasks like Claude, but has Codex-like (early 5.x series autism).
5. Thing I agree about what most people are saying, “model is not lazy”: not true in my case.

Signal is messy. Will test more. Also, it’s too slow!

Leo Linsky

@leo_linsky

2 days ago

Minimax M3 results are now live on GBENCH: It's a solid model, but the other Chinese labs with April releases had slightly better models. The main thing to worry about is benchmaxxing -- their model card was NOT accurate. Our evaluations are designed to resist this kind of overfitting.

15

132

6

30

24K

1

6

0

8

2K

Tensor-Slayer

@TensorSlay

2 days ago

Not enough political motivation yet. But the desire is popping up in the discourse everywhere (e.g. Sanders’ recent take). Discourse should lead to the formation of polarising camps, leading in turn to political motivation.

will depue

@willdepue

2 days ago

do you think we see a large nationalized US government agi project in the next five years? or a chinese version, which probably forces our consolidation as well? why/why not?

40

117

2

22

15K

0

57

TensorSlay retweeted

Lisan al Gaib

@scaling01

3 days ago

the permanent underclass is already here, but you just haven't noticed it

here's what it looks like:
- frontier labs keep their best models to themselves for 1-3 months to make sure it's safe
- then they sell the tokens to the US government and trillion dollar companies
- after that allied countries get access
- and only then do the poors get access to it after half a year of waiting.

meanwhile they are already on Mythos 2 that is exponentially better

78

3K

114

583

341K

TensorSlay retweeted

JUMPERZ

@jumperz

3 days ago

Minimax m3 is wild and it broke the one rule every ai model has followed which is better costs = better capability...

if you put every model on a graph, price on one side, how good it is on the other.. they all fall along a straight line..

cheap / weaker models sit bottom left and expensive / stronger ones sit top right.. you pay more, you get more simple as that..

picture a diagonal from cheap and weak (bottom left) to expensive and strong (top right)..

that line is the going rate of how much capability your money buys... every model pays it... when m3 is the first to get more than it paid for, landing above the line where nothing has ever been..

it's as capable as the mid tier frontier models, but priced like the cheapest ones $1.20..

and the bigger part is that m3 is open weight so for the first time, the best value on the chart is also the one you fully own..

32

354

24

87

33K

TensorSlay retweeted

secemp

@secemp9

3 days ago

I love Pi agent, so much so I made a full 1:1 Python port and called it Harn (from Old Norse for "brain" and also to reference "harness") https://t.co/LJzuO8R5lp

5

59

6

17

4K

TensorSlay retweeted

stevibe

@stevibe

3 days ago

MiniMax M3 just dropped, their first natively multimodal model. So I ran it through my form-filling test. (The model has to place each element at the right pixel position on a blank form image, not type into a field.)

Verdict: it got everything on the paper.

> Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code, all there.
> Best character spacing I've seen yet: it actually calculates the gap between each character, clean across the DOB and number boxes
> A few fields slightly misaligned, but every piece of data made it onto the form

The reasoning chain is the interesting part: it does the easy fields first, then works into the tight one-char-per-box fields, reasoning through y-coordinates, baselines, and label clearance in obsessive detail.

The cost: 40:33 and 126.7k output tokens. That's a long think, but it's MiniMax's first multimodal model, and it nailed the content.