GLM-5.2 Tops Artificial Analysis as the #1 Open-Source Model, Ranking Top 3 Globally
GLM-5.2 launched and went open-source today, delivering a solid scorecard across multiple authoritative third-party benchmarks and arenas.
📊 Artificial Analysis Intelligence Index
A comprehensive evaluation that integrates several authoritative leaderboards spanning coding, reasoning, long context, and more. GLM-5.2 scored 51, ranking among the top of all available models—on par with Claude Opus 4.8—and claiming the #1 spot among open-source models worldwide.
🎨 Code Arena
A real-world head-to-head arena focused on front-end code generation, with Elo rankings produced by blind user voting. GLM-5.2 ranked #2 globally with a score of 1,595.
🏆 DesignArena
A category arena centered on scenarios that combine design and code. GLM-5.2 took the top spot with a score of 1,360.
⚙️ FrontierSWE
A software-engineering benchmark built around the "frontier of human capability," assessing engineering ability across three dimensions: implementation, performance, and research. GLM-5.2 ranked #3 overall.
💪 From front-end development and design-to-code to engineering-grade software tasks, GLM-5.2 consistently lands in the top tier across multiple real-world evaluation scenarios, steadily closing in on the world's strongest models. We'll keep pushing forward in pursuit of an ever-higher ceiling of intelligence.
GLM 5.2 is the new open source king!
We tested it against GLM 5.1, Kimi K2.7, and MiniMax M3 across three visual reasoning prompts: smoke dynamics, burning paper, and curtain tearing.
GLM 5.2 got 3/3 correct, the best result in our test. Kimi K2.7 got 2/3.
Not the fastest model, but clearly the strongest on visual physics reasoning!
We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include:
Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
Improved Architecture: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2’s MTP layer for speculative decoding, increasing the acceptance length by up to 20%
Pure Open: An MIT open-source license — no regional limits, technical access without borders
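The IndexShare item in the list above says one indexer is reused across every four sparse attention layers. A minimal sketch of that sharing pattern follows, assuming that groups of four consecutive sparse layers share the indexer of the first layer in the group. The function names and the 8-layer example are hypothetical and not taken from the GLM-5.2 code; the sketch only counts indexer passes and does not reproduce the 2.9× FLOPs figure.

```python
# Hypothetical sketch of the sharing pattern only; names are illustrative,
# not taken from the GLM-5.2 codebase.


def indexer_owner(layer_idx: int, group_size: int = 4) -> int:
    """Return the first layer of the group whose indexer `layer_idx` reuses."""
    return (layer_idx // group_size) * group_size


def count_indexer_runs(num_sparse_layers: int, group_size: int = 4) -> int:
    """Number of indexer passes per token when each group shares one indexer."""
    owners = {indexer_owner(i, group_size) for i in range(num_sparse_layers)}
    return len(owners)


if __name__ == "__main__":
    # 8 sparse layers is an arbitrary example, not the real layer count.
    print([indexer_owner(i) for i in range(8)])  # [0, 0, 0, 0, 4, 4, 4, 4]
    print(count_indexer_runs(8))                 # 2
```

Under this assumption, layers 1 to 3 would reuse the token selection computed at layer 0, so the indexer would run once per group rather than once per layer.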
Supporting long-horizon tasks starts with making long context engineering-usable: the model must maintain quality across long, messy coding-agent trajectories, not just accept more tokens. A 1M context is easy to claim, but much harder to keep reliable under real engineering pressure. To this end, we substantially expanded 1M-context training for coding-agent scenarios, covering large-scale implementation, automated research, performance optimization, and complex debugging. The result is a long-context system that is not only wide in scope, but solid in execution: a practical substrate for sustained engineering work.
This capability is reflected in GLM-5.2's performance on three long-horizon coding benchmarks. FrontierSWE measures whether an agent can complete open-ended technical projects at the scale of hours to tens of hours, spanning systems optimization, large-scale code construction, and applied ML research. On this benchmark, GLM-5.2 trails Opus 4.8 by only 1%, while edging out GPT-5.5 by 1% and Opus 4.7 by 11%. On PostTrainBench, where each agent is given an H100 GPU and evaluated by how much it can improve small models through post-training, GLM-5.2 outperforms both Opus 4.7 and GPT-5.5, ranking second only to Opus 4.8. On SWE-Marathon, an ultra-long-horizon software engineering benchmark covering tasks such as building compilers, optimizing kernels, and developing production-grade services, GLM-5.2 still has room to grow, trailing Opus 4.8 by 13% while remaining second only to the Opus series. Across all three benchmarks, GLM-5.2 is the highest-ranked open-source model, showing that its 1M context has translated into practical long-horizon delivery capability.
Exciting news: GLM-5.2 (Max) ranks #2 in Code Arena: Frontend, +29pt over Claude Opus 4.7 (Thinking) and behind only Fable 5! GLM-5.2 is the best open model, ahead of Kimi-K2.6 and MiniMax-M3 by a large margin.
- #2 React and #4 HTML sub-leaderboards
- Ranks as the top model in nearly all subcategories: Brand & Marketing, Reference-Based Design, Data & Analytics, Consumer Product, Gaming, and Simulations.
Congrats @Zai_org for the incredible milestone!
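For a rough sense of what a +29 point gap means in head-to-head votes, here is a minimal sketch that assumes the leaderboard follows the standard logistic Elo model with a 400-point scale. The arena's actual rating method may differ (for example, a Bradley-Terry fit), so treat the number as illustrative only.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win share of the higher-rated model under standard Elo.

    `rating_gap` is (rating of model A) - (rating of model B).
    The 400-point scale is the conventional Elo default, assumed here.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    # A +29 point lead, as quoted in the post above.
    print(f"{elo_expected_score(29):.3f}")  # 0.542
```

Under that assumption, a 29-point lead corresponds to winning roughly 54% of decisive blind votes against the lower-rated model.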
Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: https://t.co/LAsxUdN0JZ
Weights: https://t.co/g0A1C4UWx4
API: https://t.co/Kc3E22cbN7
Coding Plan: https://t.co/Nk8Y98HNhU
Chat: https://t.co/WCqWT0qCQb
Z ai’s GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index, scoring 51, and it sits on the Pareto frontier of Intelligence vs Cost per Task
@Zai_org’s GLM-5.2 is the same size as GLM-5.1 (744B total / 40B active parameters) but scores 11 points higher on the Intelligence Index v4.1, placing ahead of MiniMax-M3 (44) and DeepSeek V4 Pro (max, 44). On the first-party API it is priced in line with GLM-5.1 at $1.4/$4.4/$0.26 per 1M input/output/cache hit tokens
Key results:
➤ GLM-5.2 is the leading open weights model on the Intelligence Index v4.1. At 51, it leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44) and Kimi K2.6 (43)
➤ Improvements across most evaluations, particularly scientific reasoning: GLM-5.2 gains over GLM-5.1 on most evaluations, led by scientific reasoning on CritPt (+16 points to 21%) and HLE (+12 points to 40%), alongside AA-LCR (+9 points to 71%), tau3 banking (+15 points to 27%) and SciCode (+7 points to 50%). TerminalBench v2.1 also improves (+16 points to 78%) and GPQA Diamond gains 3 points to 89%
➤ Leading open weights model on GDPval-AA v2 and competitive with proprietary models: GLM-5.2 scores 1524 on GDPval-AA v2, ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro (max, 1328). This impressive result places GLM-5.2 in line with proprietary models including GPT-5.5 (xhigh reasoning). GDPval-AA v2 builds on the original GDPval-AA by baselining Elo to human performance at 1000, introducing a rotating panel of frontier-model judges, and raising the turn limit from 100 to 250 for longer-horizon agent trajectories
➤ GLM-5.2 uses more output tokens per task than other leading open weights models: the model uses 43k output tokens per Intelligence Index task, up from GLM-5.1 (26k) and above MiniMax-M3 (24k), Kimi K2.6 (35k) and DeepSeek V4 Pro (max, 37k)
➤ On the Intelligence vs. Cost per Task Pareto Frontier: GLM-5.2 is on the Pareto frontier of the Intelligence vs Cost per Task chart, with the lowest cost per task among models at its intelligence level. GLM-5.2 costs ~$0.46 per task, compared to GLM-5.1 ($0.25), Kimi K2.6 ($0.31), MiniMax-M3 ($0.18) and DeepSeek V4 Pro (max, $0.05)
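The Pareto claim in the results above can be checked against the five models quoted in this post. The sketch below is a minimal dominance test over those numbers only, not the full Artificial Analysis chart. The GLM-5.1 score of 40 is inferred from "11 points higher" and is an assumption, as are the class and function names.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    intelligence: float   # Intelligence Index score (higher is better)
    cost_per_task: float  # USD per task (lower is better)


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is at least as good as `b` on both axes and better on one."""
    no_worse = a.intelligence >= b.intelligence and a.cost_per_task <= b.cost_per_task
    better = a.intelligence > b.intelligence or a.cost_per_task < b.cost_per_task
    return no_worse and better


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Keep every model that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


MODELS = [
    Model("GLM-5.2", 51, 0.46),
    Model("GLM-5.1", 40, 0.25),  # score inferred as 51 - 11 (assumption)
    Model("Kimi K2.6", 43, 0.31),
    Model("MiniMax-M3", 44, 0.18),
    Model("DeepSeek V4 Pro (max)", 44, 0.05),
]

if __name__ == "__main__":
    for m in pareto_frontier(MODELS):
        print(m.name)
```

Among these five, only GLM-5.2 and DeepSeek V4 Pro (max) are undominated: nothing listed scores higher than GLM-5.2, and nothing listed is cheaper than DeepSeek V4 Pro (max).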
Additional Model Details:
➤ License: MIT
➤ Size: 744B total parameters, 40B active parameters, equivalent to GLM-5.1
➤ Context window: 1M tokens, up from 200K on GLM-5.1
➤ Pricing: $1.4/$0.26/$4.4 per 1M input/cache hit/output tokens
➤ Availability: Alongside Z ai's first-party API, GLM-5.2 is available across third-party providers including @DeepInfra, @novita_labs, @nebiusai, @parasailnetwork, @SiliconFlowAI, @gmi_cloud, @Baseten and @FireworksAI_HQ
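To make the quoted first-party prices concrete, here is a small cost sketch. It assumes cache-hit tokens are billed at the cache-hit rate in place of the normal input rate, which the post does not state; the function name and the example token counts (other than the 43k output tokens per task quoted above) are hypothetical.

```python
# USD per 1M tokens, as quoted in the post above.
PRICE_PER_M = {"input": 1.4, "output": 4.4, "cache_hit": 0.26}


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: `cached_tokens` is the part of `input_tokens` served from
    cache, billed at the cache-hit rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cache_hit"]
        + output_tokens * PRICE_PER_M["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Output-only cost of 43k output tokens (the per-task average quoted above).
    print(f"{request_cost(0, 43_000):.4f}")  # 0.1892
    # Hypothetical agent turn: 200k input tokens, 150k of them cached, 10k output.
    print(f"{request_cost(200_000, 10_000, cached_tokens=150_000):.4f}")  # 0.1530
```

The first figure covers output tokens only, so it is lower than the ~$0.46 per-task cost reported above, which also includes input tokens.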
@arthurmensch Hi Arthur, thanks for the update! The new 'fat but sparse' model family sounds promising. Do you have any plans for video understanding capabilities in future releases or the upcoming models? Video multimodal feels like the next big frontier after images.
SpaceX has exercised the option to acquire @cursor_ai in an all-stock transaction with the goal of building the world’s most useful AI models.
For the past few months, SpaceXAI has been jointly training a model with Cursor, which will be released in Cursor and Grok Build soon.
We look forward to working closely with the Cursor team to advance our frontier AI capabilities.
Microsoft's CEO explains very well why Europe has no chance of becoming "sovereign" in Artificial Intelligence ...
He describes the real game: it is no longer a race for frontier models (even though that matters). It is a race for the proprietary learning loop: turning business data, workflows, human judgment, and internal traces into "token capital" that improves on its own.
@siliconcarnesf We have nothing sovereign ... For training? The GPU? The CPU? Memory? Even ASML is built on American technology. And above all, we don't have the capital to invest, because everything is tied up in real estate, the Livret A, and life insurance ...
Refinitiv, citing senior Iranian officials, reports that a draft memorandum of understanding provided for Iran to commit to not developing or acquiring nuclear weapons and to maintaining the nuclear status quo until a final agreement is concluded.
Iran would dilute its stockpile of highly enriched uranium; the terms of this dilution would be discussed within 60 days.
The United States would grant temporary waivers of oil sanctions for fixed periods, allowing Iranian oil sales and the resulting revenue inflows.
Iran would immediately reopen the Strait of Hormuz to all commercial shipping, and the United States would lift its naval blockade.
The United States would release $25 billion in Iranian assets through direct fund transfers, regional cooperation, and lines of credit.
Under this draft, the United States would not impose new sanctions on Iran before a final agreement is concluded.