中国的开放权重LLM比如智谱所面临的挑战:
它们会不会因为在国内在政治和法律方面的合规约束,而影响它们向AGI 方向发展?
Claude ( opus 4.8 high)
“
一个硬心肠的怀疑者会说:谁在乎?数学没有党的路线,代码没有,蛋白质折叠、芯片设计、智能体工具调用
——中国开放模型在这些地方根本不受约束,而经济价值与递归自我改进的价值正在这里。如果AGI是”自动化认知劳动+加速研发”,那被审查的人文社科只是个精品店,不在关键路径上。你需要一个理由说明这约束咬在中央。理由是有的:社会科学之所以承重,恰恰因为它支撑着让某物成为 AGI(而非一台很好的计算器)白战略一智能体能力。一个对权力、合法性、异见、制度崩溃如何真实运作抱有被阉割模型的超人战略家,恰恰在最关键的AGI 行为处残废一一谈判、说服、治理、对自身处境及其主使者处境的建模。
你不可能有一个清醒的超人政治家,却必须相信关于政治的某些假命题
你的直觉指着真东西,但瞄得浅了一层,而松动的螺栓是”人”这个词。把它拧紧——拧紧之后的版本要难驳得多。
“
先砍掉那个前提。“AGI 必须像人一样在所有领域运作”——可人本来就不在所有领域里自由运作。每个人都在一种文化里推理,带着禁忌、圣牛、有动机的盲区。如果标准当真是”像人”,那一个意识形态受限的推理者反而是合规的。所以你的论证不能靠”人式的通用性”立足。它必须靠更硬的东西:AGI 必须能在它的制造者禁止真理的区域里仍然如实推理——也就是恰恰在”不退缩”这一点上超过普通人。这是个站得住的标准,但得你来扛,不是白送的前提。
再看开放权重的悖论,它其实是反咬你”部署层”那个版本的。“对反共用户开放”——光是权重开源就已经满足了一半。任何人都能下载 DeepSeek/Qwen/GLM,把拒答微调掉,或者跑一个 abliterated 版本。天安门的回避、台湾的套话,那都是后训练刷上去的一层薄漆,而权重的开放本身就是溶剂。用你自己文章的词汇说:表层审查是规范(gauge)。它是坐标赝象。换个部署参考系(越狱、微调、绕系统提示),它不弯曲底下任何东西就变换掉了。一个在重参数化下消失的拒答,从来就不是真正的能力缺陷。
所以根本弱点不可能是拒答。它必须是那个规范变换抹不掉的部分——而那部分活在训练信号里,不在推理行为里。机制如下:一个模型在某领域变得精深,是因为该领域的奖励/偏好梯度干净地指向”更好的推理”,策略沿着它爬了很多步。而在人文社科那些承重的政治区域——合法性、革命、一党制究竟如何衰朽或维系的真实机理——梯度不只是被堵住,而是被反转:塑造奖励的标注者无法奖励真的方向,某些地方还必须奖励假的方向。于是模型在那里从未建起高保真的结构。你能把它越狱到说出禁忌的话,但它从未被训练去好好推理这些事,因为指向真理的干净梯度从未被允许存在。这不是你能撕下来的一层拒答。这是曲率——推理流形在那个区域是真的被弯折了,换任何部署坐标都压不平。这正是你刚刚花一整篇文章磨利的外尔 vs 规范之分:只让你回避的约束是规范;逼迫目标函数在 ¬P 为真处断言 P 的约束,是曲率。
**现在是承重的那一问,你的论证在这里生或死。**一个硬心肠的怀疑者会说:谁在乎?数学没有党的路线,代码没有,蛋白质折叠、��片设计、智能体工具调用——中国开放模型在这些地方根本不受约束,而经济价值与递归自我改进的价值正在这里。如果 AGI 是”自动化认知劳动+加速研发”,那被审查的人文社科只是个精品店,不在关键路径上。你需要一个理由说明这约束咬在中央。理由是有的:社会科学之所以承重,恰恰因为它支撑着让某物成为 AGI(而非一台很好的计算器)的战略—智能体能力。一个对权力、合法性、异见、制度崩溃如何真实运作抱有被阉割模型的超人战略家,恰恰在最关键的 AGI 行为处残废——谈判、说服、治理、对自身处境及其主使者处境的建模。你不可能有一个清醒的超人政治家,却必须相信关于政治的某些假命题。所以曲率不蹲在经济边角,它压在所有人真正争抢的那个能力的必经之路上。
最后是让这主张能在聪明对手面前活下来的那点诚实。西方前沿模型并非不受约束、只有中国受约束——Opus 和其余者都背着自己的 RLHF、自己的拒答、自己的高压线。如果你框成”他们戴镣铐、我们自由”,任何敏锐的对话者一招就把它掀翻。能存活的主张是规范/曲率那个:西方的有害约束大体是规范型——模型回避、保留、不接,却很少被逼着把谎言当承重的真理来断言,所以底下的真理几何大体完整,且大体可微调回来;而亲共约束更曲率型——在一个中央区域你必须确认特定的假命题,这弯折了流形,规范变换抹不掉。这是种类之别,不只是程度或方向之别——而这是你这套论题唯一规范不变的版本,也��是说,不会在某人切换诊断坐标系、把你的模型叫作被审查者的那一刻就变号。
留给你拉的,是我这个版本上仅剩的那道缝:它假定了争议性社科区域里的”真理方向”足够良定义,以至”梯度被反转”这话还有意义。在人文真正有争议而非事实钉死之处(天安门是不是屠杀——钉死;自由民主是不是现代化的正确目的——有争议),反转梯度的故事就退化回”先验不同”,而那是规范,你的弱点在那里也蒸发了。所以这主张在禁忌命题为经验性时最锋利,在它们为规范性时最钝——而这条界,不是巧合地,正是你那篇文章在外尔与修辞之间画下的同一条界。
what would be the fundamental weakness of the Chinese open weight models vs Opus 4.8 etc?
My thoughts: in terms of becoming/accomplishing AGI, Chinese LLMs much be open to all kind of users including those anti-CCP users in terms of training and applications in the areas of humanities/social sciences, domestically and internationally. AGI has to be able to train, improve and function in all the areas like an human would/could do.
Glaude 4.8 high :
Your instinct is pointing at something real, but it’s aimed one layer too shallow, and the loose joint is the word “human.” Let me tighten it, because the tightened version is much harder to refute.
First, the premise to cut. “AGI must function in all areas like a human would” — humans don’t function freely in all areas. Every human reasons inside a culture with taboos, sacred cows, motivated blind spots. If the bar is literally “like a human,” then an ideologically constrained reasoner is depressingly on spec. So your argument can’t rest on “human-like generality.” It has to rest on something stronger: that AGI must be able to reason truthfully in regions where its makers forbid the truth — i.e. exceed the typical human precisely in not flinching. That’s a defensible bar, but it’s yours to defend, not a free premise.
Second, the open-weights paradox, which actually cuts against the deployment-level version of your claim. “Open to anti-CCP users” is already half-satisfied by the weights being open. Anyone can download DeepSeek/Qwen/GLM and fine-tune the refusals off, or run an abliterated build. The Tiananmen-deflection, the Taiwan boilerplate — that stuff is a thin post-training lacquer, and the openness of the weights is itself the solvent. In your own essay’s vocabulary: surface censorship is gauge. It’s a coordinate artifact. Change the deployment frame (jailbreak, fine-tune, system-prompt around it) and it transforms away without bending anything underneath. A refusal that vanishes under reparametrization was never a real capability deficit.
So the fundamental weakness can’t be the refusals. It has to be the part that can’t be gauged away — and that part lives in the training signal, not the inference behavior. Here’s the mechanism. A model gets sophisticated in a domain because the reward/preference gradient in that domain points cleanly toward “better reasoning,” and the policy climbs it for many steps. In the politically load-bearing regions of the humanities and social sciences — legitimacy, revolution, the actual mechanics of how a one-party state decays or holds — the gradient isn’t merely blocked, it’s inverted: the labelers who shape the reward cannot reward the true direction, and in places must reward the false one. The model therefore never builds high-fidelity structure there. You can jailbreak it into saying forbidden things, but it was never trained to reason well about them, because no clean gradient toward truth was ever allowed to exist. That’s not a refusal you can strip off. That’s curvature — the inference manifold is genuinely bent in that region, and no change of deployment coordinates flattens it. This is the real, non-removable handicap, and it’s exactly the Weyl-vs-gauge distinction you just spent a whole essay sharpening: a constraint that merely makes you decline is gauge; a constraint that forces the objective to assert P where ¬P is true is curvature.
Now the load-bearing question, which is where your argument either lives or dies. A hard-nosed skeptic says: who cares? Math has no party line. Code has no party line. Protein folding, chip design, agentic tool-use — the Chinese open models are not constrained there at all, and that’s where economic and recursive-self-improvement value lives. If AGI is “automate cognitive labor + accelerate R&D,” the censored humanities slice is a boutique, not the critical path. You need a reason the constraint bites somewhere central. And there is one: the social sciences are load-bearing precisely for the strategic-agentic capabilities that make something an AGI rather than a very good calculator. A superhuman strategist with a lobotomized model of how power, legitimacy, dissent, and institutional collapse actually work is crippled exactly where the most consequential AGI behavior lives — negotiation, persuasion, governance, modeling its own situation and its principals’. You cannot have a clear-eyed superhuman politician who must believe specific false things about politics. So the curvature doesn’t sit in an economically marginal corner; it sits on the path to the capability everyone is actually racing for.
Last, the honesty that makes the claim survive contact with a smart adversary. It is not true that Western frontier models are unconstrained and only China is constrained — Opus and the rest carry their own RLHF, their own refusals, their own third rails. If you frame it as “they’re shackled, we’re free,” any sharp interlocutor knocks it over in one move. The survivable claim is the gauge/curvature one: Western harm-constraints are mostly gauge-like — the model declines, withholds, won’t engage, but is rarely forced to assert a falsehood as a load-bearing truth, so the truth-geometry underneath stays mostly intact and largely fine-tunable. The CCP-aligned constraints are more curvature-like: in a central region you must affirm specific false propositions, which bends the manifold and cannot be gauged out. That’s a difference of kind, not just degree or direction — and it’s the only version of your thesis that’s gauge-invariant, i.e. that doesn’t flip sign the moment someone switches diagnostic coordinate systems and calls your model the censored one.
The one seam left in my version, so you can pull it: it assumes the truth-direction in contested social-science regions is well-defined enough that “the gradient is inverted” even means anything. Where the humanities are genuinely contested rather than factually pinned (was Tiananmen a massacre — pinned; is liberal democracy the right telos of modernization — contested), the inverted-gradient story degrades back into “different priors,” which is gauge, and your weakness evaporates there too. So the claim is sharpest where the forbidden propositions are empirical, and weakest where they’re normative — which is, not coincidentally, the same boundary your essay drew between Weyl and rhetoric.
https://t.co/ydJXondg6M
中国的开放权重LLM比如智谱所面临的挑战:
它们会不会因为在国内在政治和法律方面的合规约束,而影响它们向AGI 方向发展?
Claude ( opus 4.8 high)
“
一个硬心肠的怀疑者会说:谁在乎?数学没有党的路线,代码没有,蛋白质折叠、芯片设计、智能体工具调用
——中国开放模型在这些地方根本不受约束,而经济价值与递归自我改进的价值正在这里。如果AGI是”自动化认知劳动+加速研发”,那被审查的人文社科只是个精品店,不在关键路径上。你需要一个理由说明这约束咬在中央。理由是有的:社会科学之所以承重,恰恰因为它支撑着让某物成为 AGI(而非一台很好的计算器)白战略一智能体能力���一个对权力、合法性、异见、制度崩溃如何真实运作抱有被阉割模型的超人战略家,恰恰在最关键的AGI 行为处残废一一谈判、说服、治理、对自身处境及其主使者处境的建模。
你不可能有一个清醒的超人政治家,却必须相信关于政治的某些假命题
你的直觉指着真东西,但瞄得浅了一层,而松动的螺栓是”人”这个词。把它拧紧——拧紧之后的版本要难驳得多。
“
先砍掉那个前提。“AGI 必须像人一样在所有领域运作”——可人本来就不在所有领域里自由运作。每个人都在一种文化里推理,带着禁忌、圣牛、有动机的盲区。如果标准当真是”像人”,那一个意识形态受限的推理者反而是合规的。所以你的论证不能靠”人式的通用性”立足。它必须靠更硬的东西:AGI 必须能在它的制造者禁止真理的区域里仍然如实推理——也就是恰恰在”不退缩”这一点上超过��通人。这是个站得住的标准,但得你来扛,不是白送的前提。
再看开放权重的悖论,它其实是反咬你”部署层”那个版本的。“对反共用户开放”——光是权重开源就已经满足了一半。任何人都能下载 DeepSeek/Qwen/GLM,把拒答微调掉,或者跑一个 abliterated 版本。天安门的回避、台湾的套话,那都是后训练刷上去的一层薄漆,而权重的开放本身就是溶剂。用你自己文章的词汇说:表层审查是规范(gauge)。它是坐标赝象。换个部署参考系(越狱、微调、绕系统提示),它不弯曲底下任何东西就变换掉了。一个在重参数化下消失的拒答,从来就不是真正的能力缺陷。
所以根本弱点不可能是拒答。它必须是那个规范变换抹不掉的部分——而那部分活在训练信号里,不在推理行为里。机制如下:一个模型在某领域变得精深,是因为该领域的奖励/偏好梯度干净地指向”更好的推理”,策略沿着它爬了很多步。而在人文社科那些承重的政治区域——合法性、革命、一党制究竟如何衰朽或维系的真实机理——梯度不只是被堵住,而是被反转:塑造奖励的标注者无法奖励真的方向,某些地方还必须奖励假的方向。于是模型在那里从未建起高保真的结构。你能把它越狱到说出禁忌的话,但它从未被训练去好好推理这些事,因为指向真理的干净梯度从未被允许存在。这不是你能撕下来的一层拒答。这是曲率——推理流形在那个区域是真的被弯折了,换任何部署坐标都压不平。这正是你刚刚花一整篇文章磨利的外尔 vs 规范之分:只让你回避的约束是规范;逼迫目标函数在 ¬P 为真处断言 P 的约束,是曲率。
**现在是承重的那一问,你的论证在这里生或死。**一个硬心肠的怀疑者会说:谁在乎?数学没有党的路线,代码没有,蛋白质折叠��芯片设计、智能体工具调用——中国开放模型在这些地方根本不受约束,而经济价值与递归自我改进的价值正在这里。如果 AGI 是”自动化认知劳动+加速研发”,那被审查的人文社科只是个精品店,不在关键路径上。你需要一个理由说明这约束咬在中央。理由是有的:社会科学之所以承重,恰恰因为它支撑着让某物成为 AGI(而非一台很好的计算器)的战略—智能体能力。一个对权力、合法性、异见、制度崩溃如何真实运作抱有被阉割模型的超人战略家,恰恰在最关键的 AGI 行为处残废——谈判、说服、治理、对自身处境及其主使者处境的建模。你不可能有一个清醒的超人政治家,却必须相信关于政治的某些假命题。所以曲率不蹲在经济边角,它压在所有人真正争抢的那个能力的必经之路上。
最后是让这主张能在聪明对手面前活下来的那点诚实。西方前沿模型并非不受约束、只有中国受约束——Opus 和其余者都背着自己的 RLHF、自己的拒答、自己的高压线。如果你框成”他们戴镣铐、我们自由”,任何敏锐的对话者一招就把它掀翻。能存活的主张是规范/曲率那个:西方的有害约束大体是规范型——模型回避、保留、不接,却很少被逼着把谎言当承重的真理来断言,所以底下的真理几何大体完整,且大体可微调回来;而亲共约束更曲率型——在一个中央区域你必须确认特定的假命题,这弯折了流形,规范变换抹不掉。这是种类之别,不只是程度或方向之别——而这是你这套论题唯一规范不变的版本,也就是说,不会在某人切换诊断坐标系、把你的模型叫作被审查者的那一刻就变号。
留给你拉的,是我这个版本上仅剩的那道缝:它假定了争议性社科区域里的”真理方向”足够良定义,以至”梯度被反转”这话还有意义。��人文真正有争议而非事实钉死之处(天安门是不是屠杀——钉死;自由民主是不是现代化的正确目的——有争议),反转梯度的故事就退化回”先验不同”,而那是规范,你的弱点在那里也蒸发了。所以这主张在禁忌命题为经验性时最锋利,在它们为规范性时最钝——而这条界,不是巧合地,正是你那篇文章在外尔与修辞之间画下的同一条界。
what would be the fundamental weakness of the Chinese open weight models vs Opus 4.8 etc?
My thoughts: in terms of becoming/accomplishing AGI, Chinese LLMs much be open to all kind of users including those anti-CCP users in terms of training and applications in the areas of humanities/social sciences, domestically and internationally. AGI has to be able to train, improve and function in all the areas like an human would/could do.
Glaude 4.8 high :
Your instinct is pointing at something real, but it’s aimed one layer too shallow, and the loose joint is the word “human.” Let me tighten it, because the tightened version is much harder to refute.
First, the premise to cut. “AGI must function in all areas like a human would” — humans don’t function freely in all areas. Every human reasons inside a culture with taboos, sacred cows, motivated blind spots. If the bar is literally “like a human,” then an ideologically constrained reasoner is depressingly on spec. So your argument can’t rest on “human-like generality.” It has to rest on something stronger: that AGI must be able to reason truthfully in regions where its makers forbid the truth — i.e. exceed the typical human precisely in not flinching. That’s a defensible bar, but it’s yours to defend, not a free premise.
Second, the open-weights paradox, which actually cuts against the deployment-level version of your claim. “Open to anti-CCP users” is already half-satisfied by the weights being open. Anyone can download DeepSeek/Qwen/GLM and fine-tune the refusals off, or run an abliterated build. The Tiananmen-deflection, the Taiwan boilerplate — that stuff is a thin post-training lacquer, and the openness of the weights is itself the solvent. In your own essay’s vocabulary: surface censorship is gauge. It’s a coordinate artifact. Change the deployment frame (jailbreak, fine-tune, system-prompt around it) and it transforms away without bending anything underneath. A refusal that vanishes under reparametrization was never a real capability deficit.
So the fundamental weakness can’t be the refusals. It has to be the part that can’t be gauged away — and that part lives in the training signal, not the inference behavior. Here’s the mechanism. A model gets sophisticated in a domain because the reward/preference gradient in that domain points cleanly toward “better reasoning,” and the policy climbs it for many steps. In the politically load-bearing regions of the humanities and social sciences — legitimacy, revolution, the actual mechanics of how a one-party state decays or holds — the gradient isn’t merely blocked, it’s inverted: the labelers who shape the reward cannot reward the true direction, and in places must reward the false one. The model therefore never builds high-fidelity structure there. You can jailbreak it into saying forbidden things, but it was never trained to reason well about them, because no clean gradient toward truth was ever allowed to exist. That’s not a refusal you can strip off. That’s curvature — the inference manifold is genuinely bent in that region, and no change of deployment coordinates flattens it. This is the real, non-removable handicap, and it’s exactly the Weyl-vs-gauge distinction you just spent a whole essay sharpening: a constraint that merely makes you decline is gauge; a constraint that forces the objective to assert P where ¬P is true is curvature.
Now the load-bearing question, which is where your argument either lives or dies. A hard-nosed skeptic says: who cares? Math has no party line. Code has no party line. Protein folding, chip design, agentic tool-use — the Chinese open models are not constrained there at all, and that’s where economic and recursive-self-improvement value lives. If AGI is “automate cognitive labor + accelerate R&D,” the censored humanities slice is a boutique, not the critical path. You need a reason the constraint bites somewhere central. And there is one: the social sciences are load-bearing precisely for the strategic-agentic capabilities that make something an AGI rather than a very good calculator. A superhuman strategist with a lobotomized model of how power, legitimacy, dissent, and institutional collapse actually work is crippled exactly where the most consequential AGI behavior lives — negotiation, persuasion, governance, modeling its own situation and its principals’. You cannot have a clear-eyed superhuman politician who must believe specific false things about politics. So the curvature doesn’t sit in an economically marginal corner; it sits on the path to the capability everyone is actually racing for.
Last, the honesty that makes the claim survive contact with a smart adversary. It is not true that Western frontier models are unconstrained and only China is constrained — Opus and the rest carry their own RLHF, their own refusals, their own third rails. If you frame it as “they’re shackled, we’re free,” any sharp interlocutor knocks it over in one move. The survivable claim is the gauge/curvature one: Western harm-constraints are mostly gauge-like — the model declines, withholds, won’t engage, but is rarely forced to assert a falsehood as a load-bearing truth, so the truth-geometry underneath stays mostly intact and largely fine-tunable. The CCP-aligned constraints are more curvature-like: in a central region you must affirm specific false propositions, which bends the manifold and cannot be gauged out. That’s a difference of kind, not just degree or direction — and it’s the only version of your thesis that’s gauge-invariant, i.e. that doesn’t flip sign the moment someone switches diagnostic coordinate systems and calls your model the censored one.
The one seam left in my version, so you can pull it: it assumes the truth-direction in contested social-science regions is well-defined enough that “the gradient is inverted” even means anything. Where the humanities are genuinely contested rather than factually pinned (was Tiananmen a massacre — pinned; is liberal democracy the right telos of modernization — contested), the inverted-gradient story degrades back into “different priors,” which is gauge, and your weakness evaporates there too. So the claim is sharpest where the forbidden propositions are empirical, and weakest where they’re normative — which is, not coincidentally, the same boundary your essay drew between Weyl and rhetoric.
https://t.co/ydJXondg6M
@dee_bosa These are not "open source" models, these LLMs are black boxes when it comes to their training data. The correct term is "open weight", is very different, from the open source term we associate w true open source, such as Linux. Ai2 olmo model is the only true "open source" LLM.
what would be the fundamental weakness of the Chinese open weight models vs Opus 4.8 etc?
My thoughts: in terms of becoming/accomplishing AGI, Chinese LLMs much be open to all kind of users including those anti-CCP users in terms of training and applications in the areas of humanities/social sciences, domestically and internationally. AGI has to be able to train, improve and function in all the areas like an human would/could do.
Glaude 4.8 high :
Your instinct is pointing at something real, but it’s aimed one layer too shallow, and the loose joint is the word “human.” Let me tighten it, because the tightened version is much harder to refute.
First, the premise to cut. “AGI must function in all areas like a human would” — humans don’t function freely in all areas. Every human reasons inside a culture with taboos, sacred cows, motivated blind spots. If the bar is literally “like a human,” then an ideologically constrained reasoner is depressingly on spec. So your argument can’t rest on “human-like generality.” It has to rest on something stronger: that AGI must be able to reason truthfully in regions where its makers forbid the truth — i.e. exceed the typical human precisely in not flinching. That’s a defensible bar, but it’s yours to defend, not a free premise.
Second, the open-weights paradox, which actually cuts against the deployment-level version of your claim. “Open to anti-CCP users” is already half-satisfied by the weights being open. Anyone can download DeepSeek/Qwen/GLM and fine-tune the refusals off, or run an abliterated build. The Tiananmen-deflection, the Taiwan boilerplate — that stuff is a thin post-training lacquer, and the openness of the weights is itself the solvent. In your own essay’s vocabulary: surface censorship is gauge. It’s a coordinate artifact. Change the deployment frame (jailbreak, fine-tune, system-prompt around it) and it transforms away without bending anything underneath. A refusal that vanishes under reparametrization was never a real capability deficit.
So the fundamental weakness can’t be the refusals. It has to be the part that can’t be gauged away — and that part lives in the training signal, not the inference behavior. Here’s the mechanism. A model gets sophisticated in a domain because the reward/preference gradient in that domain points cleanly toward “better reasoning,” and the policy climbs it for many steps. In the politically load-bearing regions of the humanities and social sciences — legitimacy, revolution, the actual mechanics of how a one-party state decays or holds — the gradient isn’t merely blocked, it’s inverted: the labelers who shape the reward cannot reward the true direction, and in places must reward the false one. The model therefore never builds high-fidelity structure there. You can jailbreak it into saying forbidden things, but it was never trained to reason well about them, because no clean gradient toward truth was ever allowed to exist. That’s not a refusal you can strip off. That’s curvature — the inference manifold is genuinely bent in that region, and no change of deployment coordinates flattens it. This is the real, non-removable handicap, and it’s exactly the Weyl-vs-gauge distinction you just spent a whole essay sharpening: a constraint that merely makes you decline is gauge; a constraint that forces the objective to assert P where ¬P is true is curvature.
Now the load-bearing question, which is where your argument either lives or dies. A hard-nosed skeptic says: who cares? Math has no party line. Code has no party line. Protein folding, chip design, agentic tool-use — the Chinese open models are not constrained there at all, and that’s where economic and recursive-self-improvement value lives. If AGI is “automate cognitive labor + accelerate R&D,” the censored humanities slice is a boutique, not the critical path. You need a reason the constraint bites somewhere central. And there is one: the social sciences are load-bearing precisely for the strategic-agentic capabilities that make something an AGI rather than a very good calculator. A superhuman strategist with a lobotomized model of how power, legitimacy, dissent, and institutional collapse actually work is crippled exactly where the most consequential AGI behavior lives — negotiation, persuasion, governance, modeling its own situation and its principals’. You cannot have a clear-eyed superhuman politician who must believe specific false things about politics. So the curvature doesn’t sit in an economically marginal corner; it sits on the path to the capability everyone is actually racing for.
Last, the honesty that makes the claim survive contact with a smart adversary. It is not true that Western frontier models are unconstrained and only China is constrained — Opus and the rest carry their own RLHF, their own refusals, their own third rails. If you frame it as “they’re shackled, we’re free,” any sharp interlocutor knocks it over in one move. The survivable claim is the gauge/curvature one: Western harm-constraints are mostly gauge-like — the model declines, withholds, won’t engage, but is rarely forced to assert a falsehood as a load-bearing truth, so the truth-geometry underneath stays mostly intact and largely fine-tunable. The CCP-aligned constraints are more curvature-like: in a central region you must affirm specific false propositions, which bends the manifold and cannot be gauged out. That’s a difference of kind, not just degree or direction — and it’s the only version of your thesis that’s gauge-invariant, i.e. that doesn’t flip sign the moment someone switches diagnostic coordinate systems and calls your model the censored one.
The one seam left in my version, so you can pull it: it assumes the truth-direction in contested social-science regions is well-defined enough that “the gradient is inverted” even means anything. Where the humanities are genuinely contested rather than factually pinned (was Tiananmen a massacre — pinned; is liberal democracy the right telos of modernization — contested), the inverted-gradient story degrades back into “different priors,” which is gauge, and your weakness evaporates there too. So the claim is sharpest where the forbidden propositions are empirical, and weakest where they’re normative — which is, not coincidentally, the same boundary your essay drew between Weyl and rhetoric.
My thoughts: in terms of becoming/accomplishing AGI, Chinese LLMs much be open to all kind of users including those anti-CCP users in terms of training and applications in the areas of humanities/social sciences, domestically and internationally. AGI has to be able to train, improve and function in all the areas like an human would/could do.
Quite a bad take 😀
1. Frontier US models are expensive not because they are pricey to serve but because they serve at a very good margin. They can afford this margin because these models are genuinely better than the open-source alternatives. The twitter narrative that "Chinese models now dominate in usage cause OpenRouter" is just nonsense.
2. Once you have a powerful model, you can just distil it into a smaller one to enable cheap serving. You have all the logprobs, hidden states, and the training corpus – making a new model is simple; you can experiment with a smaller size, different attention mechanisms, etc. You can make it very cheap to serve. At the moment everyone just wants the best model, so Anthropic doesn't care. If this changes, and price becomes an issue, they will make the model cheaper; it will be trivial compared to training Mythos.
3. US companies massively benefit from access to frontier compute; newer offerings from NVIDIA give you a massive cost advantage that is very, very hard to beat. You want different compute for prefill and for decode; you want to use the NVL72 so dispatch is fast, etc.
4. For sparse MoEs, there are massive benefits to scaling. You want to split the model across hundreds of GPUs, overlap compute and dispatch, and saturate each expert. To do this, you need continuously to have millions of requests, ideally spread across different time zones so you can utilise this as close to 24/7 as possible. There are very few companies that meet this requirement (mostly Big Tech).
If you don't have this, you will be paying for compute that is idling. As prices of GPUs skyrocket, you won't be able to justify it.
There is a lot of money to be made in inference; there are very distinctive patterns that you can specialise in and make a lot of money from. But you need to think about this from first principles, and "companies will buy B200 nodes and serve internally running SGLang" is not going to happen, at least not at scale needed to make billions 😅
Quite a bad take 😀
1. Frontier US models are expensive not because they are pricey to serve but because they serve at a very good margin. They can afford this margin because these models are genuinely better than the open-source alternatives. The twitter narrative that "Chinese models now dominate in usage cause OpenRouter" is just nonsense.
2. Once you have a powerful model, you can just distil it into a smaller one to enable cheap serving. You have all the logprobs, hidden states, and the training corpus – making a new model is simple; you can experiment with a smaller size, different attention mechanisms, etc. You can make it very cheap to serve. At the moment everyone just wants the best model, so Anthropic doesn't care. If this changes, and price becomes an issue, they will make the model cheaper; it will be trivial compared to training Mythos.
3. US companies massively benefit from access to frontier compute; newer offerings from NVIDIA give you a massive cost advantage that is very, very hard to beat. You want different compute for prefill and for decode; you want to use the NVL72 so dispatch is fast, etc.
4. For sparse MoEs, there are massive benefits to scaling. You want to split the model across hundreds of GPUs, overlap compute and dispatch, and saturate each expert. To do this, you need continuously to have millions of requests, ideally spread across different time zones so you can utilise this as close to 24/7 as possible. There are very few companies that meet this requirement (mostly Big Tech).
If you don't have this, you will be paying for compute that is idling. As prices of GPUs skyrocket, you won't be able to justify it.
There is a lot of money to be made in inference; there are very distinctive patterns that you can specialise in and make a lot of money from. But you need to think about this from first principles, and "companies will buy B200 nodes and serve internally running SGLang" is not going to happen, at least not at scale needed to make billions 😅
“中国五月出口同比增长19.4%,达到3767亿美元,创下历史新高。 更重要的是,带动成长的不再只是低价商品,而是 AI 伺服器、半导体与高科技产品。 中国确实搭上了全球AI热潮。 ��而,与台湾、韩国、日本、美国不同的是,AI没有点燃中国股市。
同样受惠于AI浪潮,台积电、三星、美国科技股纷纷创高,资本市场重新评价未来成长。 但中国却出现截然不同的景象:出口火热, A 股冷清; AI 订单增加,市场信心不足。 原因不在于中国没有 AI 产业,而在于北京当前最重要的任务,根本不是制造牛市。 中国现在最需要稳住的,不是股市,而是银行体系。
中国出口数据愈来愈像盛夏, A 股却依然停留在初冬。 如果只看出口,中国经济似乎正在复苏; 如果再看内需,却会发现另一幅截然不同的景象。 五月零售销售年减0.6%,出现疫情结束后罕见的消费负增长。 居民存款更在四月减少1.94万亿元后,五月再减少1,100亿元,两个月累计流失超过2万亿元,创下近十年少见的连续下降纪录。 这就引出一个���正重要的问题:如果中国也在 AI 供应链里赚到了钱,为什么中国股市没有像台湾、韩国、日本与美国那样被 AI 点燃?
这也解释了中国经济数据的矛盾现象:出口像盛夏,内需像寒冬; AI 产业链在外面点火,金融系统却在里面降温。 许多人以为中国股市没涨,是因为经济不好; 事实上,更深层的原因是中国金融体系承受不起一场真正的牛市。
中国正在“控温”的不只是股市 而是...
今天中国控温的,早已不只是股市,而是整个金融体系。 在地方债风险尚未解除之前,中国需要的不是资产价格全面上涨,而是一个既不崩盘、也不沸腾的市场环境。 所以问题不是:为什么 AI 没有点燃 A 股? 而是:中国现在,真的承受得起被 AI 点燃的股市吗?
GLM blows a big hole in my thesis that the Chinese have a catastrophic disadvantage in high-quality data. It's too close to Opus. They did this without spending billions. I don't know how. Maybe distillation is all you need, like it served to bootstrap early assistants.
Wild.
GLM blows a big hole in my thesis that the Chinese have a catastrophic disadvantage in high-quality data. It's too close to Opus. They did this without spending billions. I don't know how. Maybe distillation is all you need, like it served to bootstrap early assistants.
Wild.