Julien

@Blogdufutur

Space, Tech, and Finance. Former Tech guy now in FIRE mode

State-as-a-Service

Joined July 2024

376 Following

256 Followers

4.7K Posts

Julien

@Blogdufutur

2 days ago

Economist Alan Greenspan has died at the age of 100

Julien

@Blogdufutur

3 days ago

The nanny state in all its glory with the #fetedelamusique and the #canicule... No alcohol, no air conditioning, be in bed before 10 p.m.

Blogdufutur retweeted

Brett Adcock

@adcock_brett

5 days ago

For the first time, robots now outnumber humans at Figure

229

369

321

296K

Blogdufutur retweeted

Secretary of War Pete Hegseth

@SecWar

6 days ago

NATO 3.0: Europe must take the lead.

653

759

157

210K

Blogdufutur retweeted

jietang

@jietang

7 days ago

（Claude、GPT、GLM） GLM-5.2 Tops Artificial Analysis as the #1 Open-Source Model, Ranking Top 3 Globally
GLM-5.2 launched and went open-source today, delivering a solid scorecard across multiple authoritative third-party benchmarks and arenas.
📊 Artificial Analysis Intelligence Index
A comprehensive evaluation that integrates several authoritative leaderboards spanning coding, reasoning, long context, and more. GLM-5.2 scored 51, ranking among the top of all available models, on par with Claude Opus 4.8, and claiming the #1 spot among open-source models worldwide.
🎨 Code Arena
A real-world head-to-head arena focused on front-end code generation, with Elo rankings produced by blind user voting. GLM-5.2 ranked #2 globally with a score of 1,595.
🏆 DesignArena
A category arena centered on scenarios that combine design and code. GLM-5.2 took the top spot with a score of 1,360.
⚙️ FrontierSWE
A software-engineering benchmark built around the "frontier of human capability," assessing engineering ability across three dimensions: implementation, performance, and research. GLM-5.2 ranked #3 overall.
💪 From front-end development and design-to-code to engineering-grade software tasks, GLM-5.2 consistently lands in the top tier across multiple real-world evaluation scenarios, steadily closing in on the world's strongest models. We'll keep pushing forward in pursuit of an ever-higher ceiling of intelligence.

913

123

89K

Blogdufutur retweeted

GMI Cloud

@gmi_cloud

8 days ago

GLM 5.2 is the new open source king! We tested it against GLM 5.1, Kimi K2.7, and MiniMax M3 across three visual reasoning prompts: smoke dynamics, burning paper, and curtain tearing. GLM 5.2 got 3/3 correct, the best result in our test. Kimi K2.7 got 2/3. Not the fastest model, but clearly the strongest on visual physics reasoning!

785

131

39K

Blogdufutur retweeted

jietang

@jietang

8 days ago

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include:
- Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
- Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
- Improved Architecture: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2’s MTP layer for speculative decoding, increasing the acceptance length by up to 20%
- Pure Open: An MIT open-source license, no regional limits, technical access without borders
Supporting long-horizon tasks starts with making long context engineering-usable: the model must maintain quality across long, messy coding-agent trajectories, not just accept more tokens. A 1M context is easy to claim, but much harder to keep reliable under real engineering pressure. To this end, we substantially expanded 1M-context training for coding-agent scenarios, covering large-scale implementation, automated research, performance optimization, and complex debugging. The result is a long-context system that is not only wide in scope, but solid in execution: a practical substrate for sustained engineering work.
This capability is reflected in GLM-5.2's performance on three long-horizon coding benchmarks. FrontierSWE measures whether an agent can complete open-ended technical projects at the scale of hours to tens of hours, spanning systems optimization, large-scale code construction, and applied ML research. On this benchmark, GLM-5.2 trails Opus 4.8 by only 1%, while edging out GPT-5.5 by 1% and Opus 4.7 by 11%. On PostTrainBench, where each agent is given an H100 GPU and evaluated by how much it can improve small models through post-training, GLM-5.2 outperforms both Opus 4.7 and GPT-5.5, ranking second only to Opus 4.8. On SWE-Marathon, an ultra-long-horizon software engineering benchmark covering tasks such as building compilers, optimizing kernels, and developing production-grade services, GLM-5.2 still has room to grow, trailing Opus 4.8 by 13% while remaining second only to the Opus series. Across all three benchmarks, GLM-5.2 is the highest-ranked open-source model, showing that its 1M context has translated into practical long-horizon delivery capability.

181

303

526

358K

Blogdufutur retweeted

Arena.ai

@arena

8 days ago

Exciting news: GLM-5.2 (Max) ranks #2 in Code Arena: Frontend, with +29pt over Claude Opus 4.7 (Thinking) and only behind Fable 5! GLM-5.2 is the best open model vs Kimi-K2.6 and Minimax-M3 by a large margin.
- #2 React and #4 HTML sub-leaderboards
- Ranks as the top model in nearly all sub categories: Brand & Marketing, Reference-Based Design, Data & Analytics, Consumer Product, Gaming, and Simulations.
Congrats @Zai_org for the incredible milestone!
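To put the +29pt gap in perspective, here is a minimal sketch that converts a rating gap into an expected head-to-head win rate. It assumes the standard logistic Elo formula with base 10 and a 400-point scale; the feed only says Code Arena produces Elo rankings from blind votes, so the exact rating model is an assumption, and the function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for A over B.

    Assumption: standard logistic Elo (base 10, 400-point scale).
    The arena's actual rating model may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Only the gap matters, so a 29-point lead can be modeled as 29 vs 0.
win_rate_29pt = elo_expected_score(29.0, 0.0)
print(f"Expected win rate with a 29-point lead: {win_rate_29pt:.1%}")
```

Under these assumptions a 29-point lead corresponds to winning roughly 54% of blind comparisons, a measurable edge but far from a sweep.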

183

538

Blogdufutur retweeted

Z.ai

@Zai_org

8 days ago

Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: https://t.co/LAsxUdN0JZ
Weights: https://t.co/g0A1C4UWx4
API: https://t.co/Kc3E22cbN7
Coding Plan: https://t.co/Nk8Y98HNhU
Chat: https://t.co/WCqWT0qCQb

673

12K

Blogdufutur retweeted

Artificial Analysis

@ArtificialAnlys

7 days ago

Z ai’s GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index scoring 51 and it sits on the Pareto frontier of Intelligence vs Cost per Task
@Zai_org’s GLM-5.2 is the same size as GLM-5.1 (744B total / 40B active parameters) but scores 11 points higher on the Intelligence Index v4.1, placing ahead of MiniMax-M3 (44) and DeepSeek V4 Pro (max, 44). On the first-party API it is priced in line with GLM-5.1 at $1.4/$4.4/$0.26 per 1M input/output/cache hit tokens
Key results:
➤ GLM-5.2 is the leading open weights model on the Intelligence Index v4.1. At 51, it leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44) and Kimi K2.6 (43)
➤ Improvements across most evaluations, particularly scientific reasoning: GLM-5.2 gains over GLM-5.1 on most evaluations, led by scientific reasoning on CritPt (+16 points to 21%) and HLE (+12 points to 40%), alongside AA-LCR (+9 points to 71%), tau3 banking (+15 points to 27%) and SciCode (+7 points to 50%). TerminalBench v2.1 also improves (+16 points to 78%) and GPQA Diamond gains 3 points to 89%
➤ Leading open weights model on GDPval-AA v2 and competitive with proprietary models: GLM-5.2 scores 1524 on GDPval-AA v2, ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro (max, 1328). This impressive result places GLM-5.2 in-line with proprietary models including GPT-5.5 (xhigh reasoning). GDPval-AA v2 builds on the original GDPval-AA by baselining Elo to human performance at 1000, introducing a rotating panel of frontier-model judges, and raising the turn limit from 100 to 250 for longer-horizon agent trajectories
➤ GLM-5.2 uses more output tokens per task than other leading open weights models: the model uses 43k output tokens per Intelligence Index task, up from GLM-5.1 (26k) and above MiniMax-M3 (24k), Kimi K2.6 (35k) and DeepSeek V4 Pro (max, 37k)
➤ On the Intelligence vs. Cost per Task Pareto Frontier: GLM-5.2 is on the Pareto frontier of the Intelligence vs Cost per Task chart, with the lowest cost per task among models at its intelligence level. GLM-5.2 costs ~$0.46 per task, compared to GLM-5.1 ($0.25), Kimi K2.6 ($0.31), MiniMax-M3 ($0.18) and DeepSeek V4 Pro (max, $0.05)
Additional Model Details:
➤ License: MIT
➤ Size: 744B total parameters, 40B active parameters, equivalent to GLM-5.1
➤ Context window: 1M tokens, up from 200K on GLM-5.1
➤ Pricing: $1.4/$0.26/$4.4 per 1M input/cache hit/output tokens
➤ Availability: Alongside Z ai's first-party API, GLM-5.2 is available across third-party providers including @DeepInfra, @novita_labs, @nebiusai, @parasailnetwork, @SiliconFlowAI, @gmi_cloud, @Baseten and @FireworksAI_HQ
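The Pareto-frontier claim can be checked against the figures quoted in this post. The sketch below is illustrative only: it uses just the five open weights models named above (the real chart also includes proprietary models), derives GLM-5.1's score as 51 minus the stated 11-point gain, and the names `Model` and `pareto_frontier` are hypothetical.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    intelligence: float   # Intelligence Index score, higher is better
    cost_per_task: float  # USD per task, lower is better


def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.intelligence >= b.intelligence and a.cost_per_task <= b.cost_per_task
    strictly_better = a.intelligence > b.intelligence or a.cost_per_task < b.cost_per_task
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep every model that no other model dominates."""
    return [m for m in models if not any(dominates(other, m) for other in models)]


# Figures as quoted in the post; GLM-5.1's score is derived (51 - 11 = 40).
MODELS = [
    Model("GLM-5.2", 51, 0.46),
    Model("GLM-5.1", 40, 0.25),
    Model("Kimi K2.6", 43, 0.31),
    Model("MiniMax-M3", 44, 0.18),
    Model("DeepSeek V4 Pro (max)", 44, 0.05),
]

frontier_names = [m.name for m in pareto_frontier(MODELS)]
print(frontier_names)
```

On this subset, GLM-5.2 stays on the frontier because nothing else reaches its score, while DeepSeek V4 Pro (max) dominates the other three by matching or beating their scores at a lower cost per task.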

248

310

336K

Julien

@Blogdufutur

8 days ago

@arthurmensch Hi Arthur, thanks for the update! The new 'fat but sparse' model family sounds promising. Do you have any plans for video understanding capabilities in future releases or the upcoming models? Video multimodal feels like the next big frontier after images.

Blogdufutur retweeted

Cursor

@cursor_ai

8 days ago

We're excited to join forces with @SpaceX to advance the frontier of useful AI. Expect significant improvements to Cursor soon.

30K

Blogdufutur retweeted

SpaceX

@SpaceX

8 days ago

SpaceX has exercised the option to acquire @cursor_ai in an all-stock transaction with the goal of building the world’s most useful AI models. For the past few months, SpaceXAI has been jointly training a model with Cursor, which will be released in Cursor and Grok Build soon. We look forward to working closely with the Cursor team to advance our frontier AI capabilities

37K

26M

Julien

@Blogdufutur

8 days ago

Elon Musk is now richer than Saudi Arabia (measured by GDP)...

Julien

@Blogdufutur

10 days ago

Inside the whirlwind 24 hours that led the White House to slap export controls on Anthropic https://t.co/FRvUPj281V

Blogdufutur retweeted

Reuters

@Reuters

10 days ago

Anthropic staff to meet White House officials next week, Axios reports https://t.co/8Q6Itg7vaG

311

65K

Blogdufutur retweeted

Financial Times

@FT

10 days ago

Starmer to announce Australia-style social media ban for teenagers https://t.co/hhcpPqhMlC

24K

Julien

@Blogdufutur

10 days ago

Microsoft's CEO explains very well why Europe has no chance of becoming "sovereign" in artificial intelligence... He describes the real game: it is no longer a race for frontier models (even if that matters). It is a race for the proprietary learning loop: turning business data, workflows, human judgment, and internal traces into "token capital" that improves on its own.

Satya Nadella

@satyanadella

10 days ago

https://t.co/vLmiBKTtX3

41K

57K

66M

Julien

@Blogdufutur

10 days ago

@siliconcarnesf We have nothing sovereign... For training? The GPU? The CPU? Memory? Even ASML relies on American technology. And above all, we don't have the capital to invest, because everything is parked in real estate, the Livret A, and life insurance...

Julien

@Blogdufutur

10 days ago

Refinitiv, citing senior Iranian officials, reports that a draft memorandum of understanding provided for Iran to commit to not developing or acquiring nuclear weapons and to maintaining the nuclear status quo until a final agreement is concluded. Iran would dilute its stockpile of highly enriched uranium; the terms of that dilution would be discussed within 60 days. The United States would grant temporary waivers of oil sanctions for fixed periods, allowing Iranian oil sales and the resulting revenue. Iran would immediately reopen the Strait of Hormuz to all commercial shipping, and the United States would lift its naval blockade. The United States would release 25 billion dollars of Iranian assets through direct fund transfers, regional cooperation, and credit lines. Under the draft, the United States would not impose new sanctions on Iran before a final agreement is concluded.

