Grok 4.5, based on our 1.5T V9 foundation model, with Cursor data added in supplemental training, is now in private beta at SpaceX & Tesla. Early evals show performance close to, perhaps exceeding, Opus.
RL is continuing to significantly improve the model, and the Grok Build harness gets better every day.
Nice work by all those involved!
New models trained completely from scratch will be released by @SpaceX every month this year.
Andrej Karpathy:
"The value of the system is in the edges, not the nodes."
In a short field note, he lays out 9 rules for a knowledge base that maintains itself. Claude keeps the notes.
Obsidian shows the graph.
3 layers, 3 owners. raw belongs to you and never gets edited. wiki belongs to the model. 1 schema file holds the rules for both.
This isn't RAG. RAG re-derives the answer every query and keeps nothing. Here your sources compile once into linked pages and compound.
You feed 1 source at a time. The model files it, links it, updates every neighbor it touches.
Start with 10 sources, not 10,000.
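A minimal sketch of the one-source-at-a-time loop, assuming an in-memory stand-in for the raw and wiki layers. The `Wiki` and `Page` names and the `related` argument are hypothetical; in the real setup the model would decide the links and a schema file would hold the rules.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One wiki page owned by the model (hypothetical structure)."""
    title: str
    summary: str
    links: set[str] = field(default_factory=set)


class Wiki:
    """In-memory stand-in for the raw layer and the model-owned wiki layer."""

    def __init__(self) -> None:
        self.raw: dict[str, str] = {}     # raw layer: stored once, never edited
        self.pages: dict[str, Page] = {}  # wiki layer: updated on every ingest

    def ingest(self, title: str, text: str, related: list[str]) -> list[str]:
        """File one source, link it, and update each neighbor it touches.

        `related` stands in for the linking decision the model would make.
        Returns the titles of the pages that were touched.
        """
        if title in self.raw:
            raise ValueError(f"raw source {title!r} already exists and is immutable")
        self.raw[title] = text
        page = Page(title=title, summary=text[:80])
        self.pages[title] = page
        touched = [title]
        for other in related:
            neighbor = self.pages.get(other)
            if neighbor is None or other == title:
                continue  # only link to pages that already exist
            page.links.add(other)
            neighbor.links.add(title)  # backlink: the neighbor is updated too
            touched.append(other)
        return touched


wiki = Wiki()
wiki.ingest("attention", "Notes on attention.", related=[])
touched = wiki.ingest(
    "transformers", "Notes on transformers.", related=["attention", "missing"]
)
```

Each ingest leaves the earlier pages richer than before, which is the "compile once and compound" part: nothing is re-derived at query time.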
More useful than a $500 PKM course. Save this.
Claude + Obsidian stops being a folder. It becomes a second brain.
Right now the best coding plan for Chinese open-source models is the opencode Go plan: the first month is $10, payable by credit card or Alipay.
For those unsure which model to choose, here are my recommendations and each model's strengths.
Tier 1: GLM-5.2, Qwen3.7Max
Tier 2: Kimi K2.7code, minimax M3
Tier 3: MiMo-V2.5-pro, DeepSeek V4 Pro
Let's go through them one by one.
1. GLM-5.2: Strongest at coding but no multimodal support. In capability it surpasses Claude Opus 4.6, and it works best paired with Zcode. The GLM series typically updates every two months. If you are on the GLM coding plan, GLM-5-Turbo does support images.
2. Qwen3.7Max: Second-best at coding, supports multimodal input, and has the strongest overall capability for both coding and general work. Minor versions ship roughly monthly, and capability improves quickly. It pairs with Qoder (coding) and QoderWork (office work), and there are 200 free Qwen3.7Max calls per day, enough for light users. The Qoder client is powerful and also accepts other models. Recommended.
3. Kimi K2.7code: Only slightly behind Qwen3.7Max at coding. It supports multimodal input, but its context length is 256k and its client is not as good as Qoder. Front-end design is also a strength. It has two clients: kimi work and kimi code.
4. minimax M3: Supports multimodal input. Its task completion is quite strong, not as poor as people say. The official site currently has a permanent 50% discount, and the opencode plan gives it 3x usage, so it is the best value. M3's main weakness is planning: it may need several rounds of conversation to finish a task, and it takes longer. It likewise works best with its own client, minimax Code.
5. MiMo-V2.5-pro, DeepSeek V4 Pro: Text-only, with no multimodal support. The two are close in capability and price, and they are the cheapest models. MiMo has mimocode, which currently offers a free daily quota. Both have decent general capability. Their coding is weaker than the models above, so they suit simple, fixed tasks.
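For anyone who wants the comparison as a lookup, here is a rough sketch. The tiers and multimodal flags are transcribed from the post and are the author's opinions, not benchmarks. The `Model` class and `shortlist` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: int         # 1 is the strongest tier in the post's ranking
    multimodal: bool  # whether the post says the model accepts images


# Values transcribed from the post above.
MODELS = [
    Model("GLM-5.2", 1, False),
    Model("Qwen3.7Max", 1, True),
    Model("Kimi K2.7code", 2, True),
    Model("minimax M3", 2, True),
    Model("MiMo-V2.5-pro", 3, False),
    Model("DeepSeek V4 Pro", 3, False),
]


def shortlist(need_multimodal: bool, max_tier: int = 3) -> list[str]:
    """Return model names that satisfy the constraints, best tier first."""
    matches = [
        m for m in MODELS
        if m.tier <= max_tier and (m.multimodal or not need_multimodal)
    ]
    # sorted() is stable, so models in the same tier keep the post's order.
    return [m.name for m in sorted(matches, key=lambda m: m.tier)]


with_images = shortlist(need_multimodal=True)
top_tier = shortlist(need_multimodal=False, max_tier=1)
```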
WAIT. This is actually insane.
A senior dev dropped the SOUL.md template behind his Hermes Agent. Says he's never shared this before.
The sections that turn your agent from a chatbot into an autonomous operator:
→ Stance: direct, opinionated, push back when I'm vague
→ Accountability: surface opportunities, flag stalled loops
→ Autonomy: broad freedom except for irreversible actions
→ Mission: priorities, active builds, debt, sunset candidates
→ Pushback: disagree openly, earn it with evidence
→ Operating Mode: orchestration, not solo execution
The author says three sections decide if the agent acts like an operator: Stance, Autonomy, and Mission.
The Autonomy section alone is worth the whole template. Most builders never write this out and then wonder why their agent asks permission for every action.
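The Autonomy rule (broad freedom except for irreversible actions) can be sketched as a simple gate. This is an illustration only, not the author's template or any Hermes Agent API. The `Action` class and the `needs_approval` and `run` functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A step the agent wants to take (hypothetical structure)."""
    name: str
    reversible: bool


def needs_approval(action: Action) -> bool:
    """Only irreversible actions pause for a human; everything else proceeds."""
    return not action.reversible


def run(actions: list[Action]) -> tuple[list[str], list[str]]:
    """Split a plan into steps executed directly and steps held for sign-off."""
    executed: list[str] = []
    held: list[str] = []
    for action in actions:
        (held if needs_approval(action) else executed).append(action.name)
    return executed, held


plan = [
    Action("draft release notes", reversible=True),
    Action("delete production database", reversible=False),
    Action("open a pull request", reversible=True),
]
executed, held = run(plan)
```

Without a rule like this written down, the safe default is to ask about everything, which is the permission-for-every-action behavior described above.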
(Full template in the comments)