Xun Liu | KuAi @opsAi_nice9 - Twitter Profile

A bit of news: After nearly 9 years, I have decided to leave Google DeepMind and join Anthropic (after taking some time to recharge). I am incredibly grateful for my time at GDM. @demishassabis took a real chance letting me lead the AlphaFold team just six months after finishing my PhD, and the entire GDM team taught me so much about how to do great science. GDM is a special place, and I’ll still be excited to hear about what amazing things they discover next.

605

14K

960

2K

6M

0

18

Xun Liu | KuAi

@opsAi_nice9

3 days ago

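How controlling someone is really varies from person to person. Personally, I find northern women less controlling than women from Sichuan and Chongqing.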

Xiangyu 香鱼🐬

@xiangyuli

3 days ago

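My lecherous coworker once told me: northern women, good-looking or not, have fierce personalities and are very controlling, so you should definitely find a southerner. I looked back on it, and damn, it's true 🤪🤪🤪🤪 Every day when she cooks she has to have a short drama playing, but if you watch anything she scolds you nonstop. Everything has to revolve around her. Women like this should all find a little old man ten-plus years older than them; otherwise she just runs roughshod over the household every day and is always the one who is right.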

20

71

3

27

54K

1

3

0

3K

Xun Liu | KuAi

@opsAi_nice9

4 days ago

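Watching Lei Jun's clumsy imitations lately brings back memories of the "copy to China" era. Lei Jun's PR team feels like a teardrop left over from a bygone age.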

0

17

Xun Liu | KuAi

@opsAi_nice9

5 days ago

Mystery solved.

Tibo

@thsottiaux

5 days ago

Dearest gentle codexer. We did a sneaky double reset. Not only do you get a full reset on us. But you are also getting one into the reset bank to use at your own leisure. Enjoy

963

7K

333

353

546K

0

14

Xun Liu | KuAi

@opsAi_nice9

5 days ago

Can resets for Pro users really be handed out this casually? It feels like one gets given out every few days. See the usage.

0

12

Xun Liu | KuAi

@opsAi_nice9

5 days ago

wow

jietang

@jietang

5 days ago

(Claude, GPT, GLM)
GLM-5.2 Tops Artificial Analysis as the #1 Open-Source Model, Ranking Top 3 Globally
GLM-5.2 launched and went open-source today, delivering a solid scorecard across multiple authoritative third-party benchmarks and arenas.
📊 Artificial Analysis Intelligence Index
A comprehensive evaluation that integrates several authoritative leaderboards spanning coding, reasoning, long context, and more. GLM-5.2 scored 51, ranking among the top of all available models, on par with Claude Opus 4.8, and claiming the #1 spot among open-source models worldwide.
🎨 Code Arena
A real-world head-to-head arena focused on front-end code generation, with Elo rankings produced by blind user voting. GLM-5.2 ranked #2 globally with a score of 1,595.
🏆 DesignArena
A category arena centered on scenarios that combine design and code. GLM-5.2 took the top spot with a score of 1,360.
⚙️ FrontierSWE
A software-engineering benchmark built around the "frontier of human capability," assessing engineering ability across three dimensions: implementation, performance, and research. GLM-5.2 ranked #3 overall.
💪 From front-end development and design-to-code to engineering-grade software tasks, GLM-5.2 consistently lands in the top tier across multiple real-world evaluation scenarios, steadily closing in on the world's strongest models. We'll keep pushing forward in pursuit of an ever-higher ceiling of intelligence.


74

903

82

123

87K

0

30

Xun Liu | KuAi

@opsAi_nice9

6 days ago

!

jietang

@jietang

6 days ago

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include:
Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
Improved Architecture: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2's MTP layer for speculative decoding, increasing the acceptance length by up to 20%
Pure Open: An MIT open-source license: no regional limits, technical access without borders
Supporting long-horizon tasks starts with making long context engineering-usable: the model must maintain quality across long, messy coding-agent trajectories, not just accept more tokens. A 1M context is easy to claim, but much harder to keep reliable under real engineering pressure. To this end, we substantially expanded 1M-context training for coding-agent scenarios, covering large-scale implementation, automated research, performance optimization, and complex debugging. The result is a long-context system that is not only wide in scope, but solid in execution: a practical substrate for sustained engineering work.
This capability is reflected in GLM-5.2's performance on three long-horizon coding benchmarks. FrontierSWE measures whether an agent can complete open-ended technical projects at the scale of hours to tens of hours, spanning systems optimization, large-scale code construction, and applied ML research. On this benchmark, GLM-5.2 trails Opus 4.8 by only 1%, while edging out GPT-5.5 by 1% and Opus 4.7 by 11%. On PostTrainBench, where each agent is given an H100 GPU and evaluated by how much it can improve small models through post-training, GLM-5.2 outperforms both Opus 4.7 and GPT-5.5, ranking second only to Opus 4.8. On SWE-Marathon, an ultra-long-horizon software engineering benchmark covering tasks such as building compilers, optimizing kernels, and developing production-grade services, GLM-5.2 still has room to grow, trailing Opus 4.8 by 13% while remaining second only to the Opus series. Across all three benchmarks, GLM-5.2 is the highest-ranked open-source model, showing that its 1M context has translated into practical long-horizon delivery capability.


181

4K

303

524

355K

0

33

opsAi_nice9 retweeted

jietang

@jietang

6 days ago

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include:
Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
Improved Architecture: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2's MTP layer for speculative decoding, increasing the acceptance length by up to 20%
Pure Open: An MIT open-source license: no regional limits, technical access without borders
Supporting long-horizon tasks starts with making long context engineering-usable: the model must maintain quality across long, messy coding-agent trajectories, not just accept more tokens. A 1M context is easy to claim, but much harder to keep reliable under real engineering pressure. To this end, we substantially expanded 1M-context training for coding-agent scenarios, covering large-scale implementation, automated research, performance optimization, and complex debugging. The result is a long-context system that is not only wide in scope, but solid in execution: a practical substrate for sustained engineering work.
This capability is reflected in GLM-5.2's performance on three long-horizon coding benchmarks. FrontierSWE measures whether an agent can complete open-ended technical projects at the scale of hours to tens of hours, spanning systems optimization, large-scale code construction, and applied ML research. On this benchmark, GLM-5.2 trails Opus 4.8 by only 1%, while edging out GPT-5.5 by 1% and Opus 4.7 by 11%. On PostTrainBench, where each agent is given an H100 GPU and evaluated by how much it can improve small models through post-training, GLM-5.2 outperforms both Opus 4.7 and GPT-5.5, ranking second only to Opus 4.8. On SWE-Marathon, an ultra-long-horizon software engineering benchmark covering tasks such as building compilers, optimizing kernels, and developing production-grade services, GLM-5.2 still has room to grow, trailing Opus 4.8 by 13% while remaining second only to the Opus series. Across all three benchmarks, GLM-5.2 is the highest-ranked open-source model, showing that its 1M context has translated into practical long-horizon delivery capability.

181

4K

303

524

355K

Xun Liu | KuAi

@opsAi_nice9

7 days ago

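The AI world right now really is a case of "one day in heaven, ten years on earth." You think model capabilities have fallen short of your expectations, but only one month has actually passed.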

0

7

Xun Liu | KuAi

@opsAi_nice9

8 days ago

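@raycat2021 This study is fundamentally quite rigorous; it accounts for both pseudo-surnames and blood ties. It is only a statement at the societal level and does not mean that families from 600 years ago generally still hold high positions.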

0

1

0

303

Xun Liu | KuAi

@opsAi_nice9

8 days ago

Top companies today are growing at a pace never before seen in history. From that angle, mistakes like 阿迪王's become a little less unforgivable.

0

1

0

11

Xun Liu | KuAi

@opsAi_nice9

8 days ago

Interesting.

Satya Nadella

@satyanadella

8 days ago

https://t.co/vLmiBKTtX3

3K

41K

8K

56K

66M

0

17

opsAi_nice9 retweeted

郭宇 guoyu.eth

@turingou

8 days ago

AI-native companies are essentially redefining the work of every field.

9

139

18

109

63K

Xun Liu | KuAi

@opsAi_nice9

9 days ago

阿迪王 will pay a price for his extreme thinking. He has only overtaken OAI in the short term.

0

7

Xun Liu | KuAi

@opsAi_nice9

9 days ago

Haha

Yuchen Jin

@Yuchenj_UW

9 days ago

One hypothesis: If non-citizens at Anthropic can’t work on Mythos/Fable, and LLM jailbreaks remain unsolved, US frontier labs will be forced to slow down training and model releases. Could Chinese open-source AI surpass US closed models for the first time in ~6 months?

59

363

15

23

28K

0

1

0

33
