GLM-5.2 (Max) by @Zai_org ranks #10 on the new Agent Arena leaderboard, closely matching Claude-Opus-4.8 (non-thinking), and is the #1 open model by a wide margin!
In Agent Arena, we measure models on millions of real-world, long-horizon agentic tasks from a global community of users. Models can access web search, filesystem, and terminal tools to complete complex workflows. The leaderboard measures model performance on outcomes relative to the average model using a causal tracing methodology.
Compared to GLM-5.1, GLM-5.2 (Max) climbs from #13 to #10. Its clearest gains are in confirmed task success and in user praise vs. complaint. Bash capabilities and tool hallucination remain stable. There is a tradeoff in steerability compared to the previous model (-6.0% vs. +1.2%).
GLM-5.2 keeps the same price as GLM-5.1: $1.4/$4.4 per million input/output tokens, with a 1M context window.
Huge congrats to @Zai_org on the incredible release!
See thread for details on how GLM-5.2 (Max) performs across 5 different signals.
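To make the GLM-5.2 pricing quoted above concrete, here is a minimal sketch of the per-request arithmetic. Only the two prices come from the post; the function name and the example token counts are hypothetical.

```python
# Prices quoted in the post, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 1.4
OUTPUT_PRICE_PER_MTOK = 4.4


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    total = (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    )
    return total / 1_000_000


# Hypothetical agentic request: 200K tokens of context in, 8K tokens out.
cost = request_cost(200_000, 8_000)
print(f"${cost:.4f}")  # prints $0.3152
```

Long-horizon agent runs are usually dominated by input tokens, so the input price tends to matter more than the output price for this kind of workload.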
OpenCode x Ring 2.6 1T - free for a limited time
256K context • reasoning • text only
Thanks to @AntLingAGI and @novita_labs for making the model available
We collaborated with NVIDIA to teach you how we made LLM training ~25% faster! 🚀
Learn how 3 optimizations help your home GPU train models faster:
1. Packed-sequence metadata caching
2. Double-buffered checkpoint reloads
3. Faster MoE routing
Guide: https://t.co/nwvVfNC8XE
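The post does not say how the first optimization is implemented, so the following is only a sketch of the general idea, under the assumption that "packed-sequence metadata" means the cumulative sequence offsets and maximum length that variable-length attention needs for a packed batch. The function name `packed_metadata` is hypothetical. The metadata is computed once per batch and reused by every layer instead of being rebuilt each time.

```python
from functools import lru_cache
from itertools import accumulate


@lru_cache(maxsize=128)
def packed_metadata(seq_lengths: tuple[int, ...]) -> tuple[tuple[int, ...], int]:
    """Return (cumulative offsets, max length) for one packed batch.

    Assumption: this is the metadata a variable-length attention
    kernel needs to find sequence boundaries inside a packed row.
    """
    cu_seqlens = (0, *accumulate(seq_lengths))
    return cu_seqlens, max(seq_lengths)


# Three sequences packed into one row of 16 tokens.
lengths = (5, 3, 8)

# Every layer asks for the same metadata; only the first call computes it.
for _layer in range(4):
    cu_seqlens, max_len = packed_metadata(lengths)

print(cu_seqlens, max_len)                # (0, 5, 8, 16) 8
print(packed_metadata.cache_info().hits)  # 3
```

In a real GPU training loop the saving would come from avoiding repeated tensor construction and device synchronization, which this pure-Python version does not model.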
Qwen3.6 GGUF Evaluations
For the 27B:
Q2_K_XL is surprisingly recommendable.
IQ3_XXS performs very similarly, uses only 0.2 GB more, and generates significantly fewer tokens. If you are tight on memory, pick this one.
Otherwise, if you can spare 2.5 GB more, use Q3_K_XL: (almost) the same accuracy and token efficiency as the original.
All the results, also for the 35B, here:
https://t.co/zbBCZ0Ty7a
More results are coming, probably Monday, covering other GGUF providers and some abliterated models.
Matrix-Game 2.0 — The FIRST open-source, real-time, long-sequence interactive world model
Last week, DeepMind's Genie 3 shook the AI world with real-time interactive world models.
But... it wasn't open-sourced.
Today, Matrix-Game 2.0 changed the game. 🚀
25FPS. Minutes-long interaction. Fully open-source.
One line of code is all it takes to fine-tune the gpt-oss models from @OpenAI 🔥
> Support to target the MoE expert layers with PEFT
> Kernels for FlashAttention3 & MegaBlocks
> Fast inference with MXFP4 quantization format
In our testing, these models are extremely efficient to tune and can be adapted to new domains with just a few hundred samples 🤯
Download the models: https://t.co/3cOIB3tGVt
Training & inference recipes: https://t.co/aQaDzUGHXR
So many of you are loving turning your photos into short videos in the @Geminiapp and the Gemini API. Next up, we’ll be rolling this feature out to @YouTube Shorts and @GooglePhotos. And soon, you'll be able to Remix your Google Photos into comics, sketches + 3D animations.
New Lens on RAG Systems
RAG systems are more brittle than you think, even when provided sufficient context.
Great work from Google and collaborators.
Good tips for devs included.
Here are my notes:
we wrote a guide on how to work with documentation in @cursor_ai
includes some guidance on when to use which tool, a quick MCP server example for internal docs, and some prompting tips
Mixture of Experts (MoE) is a popular architecture that uses different "experts" to improve Transformer models.
The visual below explains how MoE models differ from standard (dense) Transformers.
Let's dive in to learn more about MoE!
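As a rough illustration of the "different experts" idea, here is a minimal sketch of top-k expert routing for a single token. It is a toy under stated assumptions, not any particular model's implementation: the experts are plain functions rather than feed-forward networks, and the router logits are passed in directly instead of being produced by a learned linear layer. All names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs for one token."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Four toy experts; only two of them run for this token.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]

print(route_top_k(logits))              # [(1, 0.5), (3, 0.5)]
print(moe_layer(3.0, experts, logits))  # 7.5
```

The point of the design is that only k of the experts run per token, so the parameter count can grow without a proportional increase in compute per token.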
OpenAI recently released a guide on building agents which contains some misguided takes
There's a lot of FUD, confusion, hype, and noise around agents
I wrote a blog on how to think about agent frameworks. Includes:
Background Info
- What is an agent?
- What is hard about building agents?
- What is LangGraph?
Flavors of agentic frameworks
- “Agents” vs “workflows”
- Declarative vs non-declarative
- Agent abstractions
- Multi agent
Common Questions
- What is the value of a framework?
- As the models get better, will everything become agents instead of workflows?
- What did OpenAI get wrong in their take?
- How do all the agent frameworks compare?
Today, we’re launching Orpheus, an open-source TTS model that exceeds the capabilities of both open and closed-source models such as ElevenLabs and OpenAI! (1/6)