🔥Holy 💩! New King of the Open Source LLMs! Beats Leading OS Models. Time to start testing to verify benchmarks.
Nex-N2-Pro beats
✅ Kimi-K2.6: 86% (13/14)
✅ MiniMax: 100% (8/8)
✅ GLM-5.1: 100% (13/13)
✅ DeepSeek-V4-Pro: 92% (11/12)
» Opus 4.7: 45% (5/11)
» GPT-5.5: 30% (3/10)
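The percentages in the list above can be re-derived from the counts in parentheses. A minimal Python sketch, assuming each pair means wins out of total comparisons (the post does not say how the comparisons were scored); `results` and `win_rate` are illustrative names, not from the post. Note that 13/14 works out to about 92.9%, so the 86% listed for Kimi-K2.6 does not match its count; the post does not indicate which of the two is the typo.

```python
# (wins, total) pairs copied from the post; the "wins out of total"
# reading is an assumption, the post does not define the counts.
results = {
    "Kimi-K2.6": (13, 14),
    "MiniMax": (8, 8),
    "GLM-5.1": (13, 13),
    "DeepSeek-V4-Pro": (11, 12),
    "Opus 4.7": (5, 11),
    "GPT-5.5": (3, 10),
}


def win_rate(wins: int, total: int) -> float:
    """Return the win rate as a percentage of total comparisons."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * wins / total


rates = {name: win_rate(w, t) for name, (w, t) in results.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```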
@AnthropicAI The 3x to 52x on a fixed benchmark is staggering. But for open-ended research where the objective isn't well-defined, is the speedup even measurable the same way? That distinction feels like the crux of the whole RSI question.
@hwchase17 @Zai_org @baseten @mintlify The markdown/YAML config approach is interesting. Does switching from imperative code to declarative files change how you'd handle runtime errors or dynamic behavior? Curious whether the trade-off is simplicity up front vs. debugging overhead later.
Several sleepless nights later, M3 is finally here.
Coding frontier. 1M context. Native multimodal input.
The first open-weights model to bring all three together.
Hope you all like it
p.s. M3 has already submitted a lot of PRs into MiniMax Code.
@karpathy The verifiability + economics framework for jaggedness: do you see those as independent forces, or does one constrain the other? High-verifiability domains get the most RL investment yet produce the most visible failures. Is that a distribution issue or something deeper?
oMLX 0.3.9rc1 released.
Highlights:
- Low-memory Macs stay stable instead of getting killed by the OS
- DFlash bumped to v0.1.7 (thanks to @bstnxbt's dflash-mlx). Qwen thinking/GDN fix, etc.
- Chunked prefill. A long prompt no longer blocks decode for everyone else
- Multi-tasking in the admin chat. Run multiple chats in parallel
- Real-time memory bar in the admin dashboard
- Hermes Agent quick launch, "omlx launch hermes"
Plus a lot of bug fixes and new contributors in this cycle. Thanks everyone!
https://t.co/maWzDJUvsH
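For readers unfamiliar with the chunked prefill item in the highlights, here is a toy Python sketch of the general idea: a long prompt is processed in fixed-size chunks so that other requests get decode turns in between. This is not oMLX's scheduler; `Request`, `schedule`, and the chunk size of 256 are all hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    prompt_tokens: int   # prompt tokens still waiting to be prefilled
    decode_tokens: int   # output tokens still waiting to be generated


def schedule(requests, chunk_size):
    """Toy round-robin scheduler (illustrative, not oMLX's implementation).

    On each turn a request either prefills at most `chunk_size` prompt
    tokens or, once its prompt is fully prefilled, decodes one token.
    Returns the ordered list of (name, kind, tokens) steps.
    """
    queue = deque(requests)
    trace = []
    while queue:
        req = queue.popleft()
        if req.prompt_tokens > 0:
            n = min(chunk_size, req.prompt_tokens)
            req.prompt_tokens -= n
            trace.append((req.name, "prefill", n))
        elif req.decode_tokens > 0:
            req.decode_tokens -= 1
            trace.append((req.name, "decode", 1))
        # Re-queue the request while it still has work left.
        if req.prompt_tokens > 0 or req.decode_tokens > 0:
            queue.append(req)
    return trace


# A 1000-token prompt arrives alongside a chat that is already decoding.
trace = schedule(
    [Request("long", 1000, 1), Request("chat", 0, 3)],
    chunk_size=256,
)
for step in trace:
    print(step)
```

In this toy, the chat request waits behind at most 256 prefill tokens per turn; with `chunk_size=1000` it would wait behind the entire prompt before its first decode step.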
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
@vercel_dev @aisdk Interesting pattern. How does the web adapter handle auth when running behind a CDN or edge function? Does the cookie parsing work the same way, or does it need adjustments?