Crazy model! It actually builds on the old Qwen2.5-Coder-3B stack and gets really great performance out of their post-training stack.
Need to use it over the next few days to see if the vibes of VibeCoder actually check out in practice. But impressive first impression!
Based on the tech report, some of the important pieces of their post-training stack:
1. High-signal synthetic data (math problems with credible solutions, code with tests)
2. Multiple reasoning paths for each answer
3. Filtering, filtering, filtering
4. 2-stage SFT (start with broad training, then train on hard long-reasoning samples)
5. Use target (pass@k) accuracy over validation loss for checkpoint selection
6. MGPO (MaxEnt-Guided Policy Optimization) for RLVR: basically a GRPO-style RL method with an extra weighting that favors examples that are neither too easy nor too hard for the current policy
7. Single 64k long-context RL (they found that the usual progressive context expansion hurt this model because early truncation damaged long-thinking behavior)
8. Training data order: they do Math RL, then Code RL, then STEM RL, in this particular order, which they found helped overall
9. After optimizing for accuracy, they add a stage that rewards shorter correct trajectories; basically making the model more efficient without accuracy degradation
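Points 5 and 6 above can be sketched in a few lines. This is a minimal illustration, not the tech report's implementation: `pass_at_k` is the standard unbiased pass@k estimator, while `select_checkpoint` and `entropy_weight` are hypothetical helpers. In particular, the binary-entropy weight is only an assumed stand-in for the MGPO weighting; the actual formula in the report may differ.

```python
import math
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def select_checkpoint(results: dict, k: int) -> str:
    """Pick the checkpoint with the highest mean pass@k.

    `results` maps a checkpoint name to a list of (n_samples, n_correct)
    pairs, one pair per evaluation problem (hypothetical layout).
    """
    def score(name: str) -> float:
        per_problem = results[name]
        return sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)

    return max(results, key=score)


def entropy_weight(pass_rate: float) -> float:
    """Binary entropy (in bits) of the policy's pass rate on a prompt.

    ASSUMPTION: used here as a stand-in for the MGPO weighting. It peaks
    at pass_rate = 0.5 and is 0 for prompts that are always solved or
    never solved, i.e. it favors prompts that are neither too easy nor
    too hard for the current policy.
    """
    p = pass_rate
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

In a GRPO-style loop, a weight like this would scale each prompt group's advantage, so rollouts from prompts the policy solves about half the time contribute the most.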
Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: https://t.co/LAsxUdN0JZ
Weights: https://t.co/g0A1C4UWx4
API: https://t.co/Kc3E22cbN7
Coding Plan: https://t.co/Nk8Y98HNhU
Chat: https://t.co/WCqWT0qCQb
THEY RELEASED THE PAPER! This is big. Mistral AI just released a full paper on exactly how they pulled off the creation of Le Chaton Fat.
Is this Mistral's Deepseek moment?
In light of what happened, I'm doubling down on skills like /improve.
A frontier model got pulled. If it happened once, it's gonna happen again. Fable today. 4.9 tomorrow or maybe gpt 6 one day.
So, treat intelligence as borrowed. Drain intelligence when it's available. Build a catalog of plans today. Then implement later with a cheaper model, an open-source model, or a model you control.
Build the backlog now.
https://t.co/rqHw0fPv4G
Prepare for takeoff. ✈️ Flight simulator is now available globally on web to all users. https://t.co/hQP0No142P
We've recently added many of our most powerful professional desktop features to web. Elevation profiles, new import types, but there's always been one other feature you've been asking us to add to the web version of Google Earth, just for fun...
Where will you fly? Share your best maneuvers, views, and flyovers with us!
Tencent just open-sourced Hy-Memory.
A memory plugin that gives AI agents real long-term memory using a 6-layer framework with dual reasoning.
→ System1: fast pattern matching for instant recall
→ System2: deep reasoning for complex memory retrieval
→ 35% reduction in token usage
→ 70% less memory bloat over time
Most agents forget everything between sessions. This fixes that. Works for long-running collaborative AI agents that need persistent context.
100% Open Source.
MiniMax just updated the M3 license!
Non-commercial use is now fully free.
For commercial use:
- Individuals or companies under $20M annual revenue only need to email them at [email protected] and add a “Build with MiniMax” label.
- Only larger companies need to contact them for a full commercial license.
This is actually pretty generous.
You have Claude Fable for only a few days. Here's how to make the most of it.
Introducing /improve: use your most capable model to audit your codebase and write plans for cheaper models to execute later.
It studies your code, finds bugs, perf issues, tech debt, missing tests, and what to build next, then writes plans any agent can run.
i hooked my whoop to my work calendar to find which coworker gives me the most stress 🚨
thanks to fable, I reverse engineered whoop to pull per-minute heart rate and matched spikes with cal events and attendees
I now have a leaderboard and I think about it daily.
some info masked for obvious reasons ;)
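For anyone curious how the matching step might work, here's a rough sketch. It assumes you already have per-minute heart rate as `(datetime, bpm)` pairs and calendar events as dicts with `start`, `end`, and `attendees`; those shapes, the `stress_leaderboard` name, and the `spike_margin` threshold are all made up for illustration and are not WHOOP's or any calendar's actual API.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def stress_leaderboard(heart_rate, events, spike_margin=15):
    """Rank attendees by the share of shared meeting minutes spent in a HR spike.

    heart_rate: list of (datetime, bpm) samples, one per minute (assumed shape).
    events: list of dicts with "start", "end", "attendees" (assumed shape).
    A "spike" is any minute at least `spike_margin` bpm above the overall mean.
    """
    baseline = mean(bpm for _, bpm in heart_rate)
    spikes = defaultdict(int)   # attendee -> minutes spent in a spike
    minutes = defaultdict(int)  # attendee -> total minutes in shared events

    for ts, bpm in heart_rate:
        for ev in events:
            if ev["start"] <= ts < ev["end"]:
                for person in ev["attendees"]:
                    minutes[person] += 1
                    if bpm >= baseline + spike_margin:
                        spikes[person] += 1

    ranked = sorted(minutes, key=lambda p: spikes[p] / minutes[p], reverse=True)
    return [(p, spikes[p] / minutes[p]) for p in ranked]


# Toy data: a calm half hour with "alice", then an elevated half hour with "bob".
t0 = datetime(2025, 1, 6, 9, 0)
hr = [(t0 + timedelta(minutes=i), 65 if i < 30 else 100) for i in range(60)]
cal = [
    {"start": t0, "end": t0 + timedelta(minutes=30), "attendees": ["alice"]},
    {"start": t0 + timedelta(minutes=30), "end": t0 + timedelta(minutes=60), "attendees": ["bob"]},
]
leaderboard = stress_leaderboard(hr, cal)
```

Normalizing by minutes spent together keeps people you meet often from topping the list just because of volume.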
Today, we’re releasing Continual Learning Bench 1.0: the first realistic benchmark for measuring how AI systems can improve in online settings.
Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened.
But deployed AI systems should learn from experience. We tested 10+ frontier systems against novel, expert-validated tasks and found there’s still plenty of headroom for learning. (1/n)
Stanford + Meta just dropped the paper that flips everything about AI agents.
It's called "Code as Agent Harness."
Right now, we treat large language models as text generators. When they need to solve a complex problem, they rely on a "chain of thought."
But natural language is slippery. It's vague. It loses context. When an agent hallucinates in English, it just keeps talking.
So they introduced a framework that changes the entire architecture of autonomy: "Code as Agent Harness."
They stopped asking the AI to reason in words, and forced it to reason in code.
Code isn't just the final output anymore. It is the memory. It is the environment. It is the boundary.
Instead of writing a paragraph about how to solve a problem, the agent writes a script, executes it, and reads the output.
Tests become its senses. Execution logs become its memory. Sandboxes become its physics.
If an agent makes a mistake in English, it apologizes and hallucinates again.
If an agent makes a mistake in code, the compiler throws an error. The trace tells it exactly what broke. The system forces it to fix it.
This is where prompt engineering dies, and systems engineering takes over.
The paper proves that reliability doesn't come from a smarter base model. It comes from the "harness" wrapped around it:
- The model proposes.
- The harness executes.
- The environment returns feedback.
- The verifier checks.
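The four-step loop above can be sketched as a tiny harness. This is only an illustration of the propose / execute / feedback / verify pattern, not the paper's framework: `toy_model` is a hard-coded stand-in for an LLM, every function name here is hypothetical, and running code in a plain subprocess is not a real sandbox.

```python
import os
import subprocess
import sys
import tempfile
from typing import Optional


def execute(code: str, timeout: float = 10.0):
    """Harness step: run the proposed code in a separate Python process."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
        return proc.returncode, proc.stdout, proc.stderr
    finally:
        os.unlink(path)


def toy_model(task: str, feedback: Optional[str]) -> str:
    """Stand-in for an LLM: the first proposal is deliberately broken."""
    if feedback is None:
        return "print(sum(range(1, 11)) / 0)"  # raises ZeroDivisionError
    return "print(sum(range(1, 11)))"          # "repaired" after seeing the trace


def verifier(stdout: str) -> bool:
    """Verifier step: check the output against the known answer for this task."""
    return stdout.strip() == "55"


def run_agent(task, model, verify, max_steps=3):
    feedback = None
    log = []  # execution log doubles as the agent's memory
    for _ in range(max_steps):
        code = model(task, feedback)                 # the model proposes
        returncode, stdout, stderr = execute(code)   # the harness executes
        log.append((code, returncode))
        if returncode == 0 and verify(stdout):       # the verifier checks
            return stdout.strip(), log
        # the environment returns feedback: the error trace or the bad output
        feedback = stderr or f"unexpected output: {stdout!r}"
    return None, log


answer, log = run_agent("sum the integers 1..10", toy_model, verifier)
```

The first attempt fails with a traceback, the traceback is fed back to the model, and the second attempt passes the verifier, so the loop exits after two steps.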
📢 Nex-N2 is here!
A family of agentic models that doesn't just think, it acts!
Coding, search, tool use. All fused into a single agentic reasoning loop.
- Adaptive Thinking, auto-scales reasoning depth per step. Saves ~20% tokens, zero performance loss.
- Coherent Thinking, one thinking paradigm across search, coding, and tool use. No more fragile mode-switching.
🏆 Result: Tier-1 open-source performance on SWE-bench, Terminal-Bench, GDPval, and more, tracking GPT-5.5 and Opus 4.7.
🎉 Open-weight. Try it now.
🔗 https://t.co/7oLSfyOCxB
📦 https://t.co/c2CGhXWaz6
https://t.co/KJYXZIpk8M
https://t.co/vcjdZ9cuB6
Most important software in my stack:
#1 Tailscale
- SSH into all your devices securely
- Host websites for use in your home network
- Easy high quality VPN
#2 VibeProxy
- Connect all your AI subscriptions
- Single API which will connect to any agent
#3 Codex App
- Fire
Here’s GLM-5.1 in Codex app
Do you want to use all your AI subs, local models, and Claude in Codex app?
- compaction working
- codex plugins working
- cache optimised
- simple setup
Thanks to onlyterp for making this possible
https://t.co/Qr9VgZajHK