jupiter @jupiter186 - Twitter Profile

3 days ago

On-Policy Distillation is the most active new research direction being explored in RL for LLMs. Had the chance to discuss how it works with Dwarkesh and why it fits so nicely into large-scale pipelines.

21

1K

127

1K

132K

jupiter186 retweeted

Blaze

@browomo

about 19 hours ago

This Chinese mathematician earned $10,000 a month inventing the hardest problems to train Neural Networks through Scale AI. Today his income dropped to zero. All the solutions are now generated by the model itself.

He used to just hold the problem in his head and spell it out in plain text. His work is pure intellect. An expert in higher mathematics, he made his money hand-crafting the trickiest puzzles to test and train neural networks via RLHF. The bastion of "human" logic rested entirely on him, on people with PhDs who knew how to invent the problem.

The collapse is simple. The shift to RLAIF and synthetic data. The model plays against itself, builds trees of logical inference, and solves deeper than a human can even invent the problem. No PhD data engineers, no hand-written prompt-completion examples, no manual grading. Just the model, search algorithms, and Chain of Thought.

Ready-made "smart human-time" still sells on the market for many times more. His old rate was $50–100 per problem.

The internal "mini-app" was written by the model too. Inside there's no pretty shell, just bare logic with exact steps:

input: the problem statement
inference tree: thousands of branches per second
check: every step verifies itself
output: a proof a human never had time to invent

And here is what the whole setup looked like. He no longer needs to write an example by hand. He gave the model a direct instruction in human words, without a single formal term: "solve the problem yourself and grade yourself yourself"

That's it. After that the algorithm found the solution, checked it, and trained on its own result, with no human.

→ the contractor got $50–100 per problem written
→ from 5,000 to 10,000 a month
→ now that income is annulled
→ a query to a math LLM costs 1–5 cents
→ a quant or an actuary runs 150,000–250,000 a year
→ the margin for whoever packages this into an agent is nearly 100%

In the author's own words: "I'm no longer able to invent a problem the machine can't solve. The examiner became dumber than the one he's examining."

But honestly, he admits the crude mistake himself, and it's not in the math, it's in the positioning. He tied his income to selling "smart human-time", to crafting formulas by hand. As long as he sells formulas, he's left behind. The machine computes faster than he can invent the problem.

He names the right move himself: the role shifts from "intellectual craftsman" to "systems architect." Then he doesn't sell his time, he manages compute, packaging that same LLM into an autonomous agent that runs 24/7.

Out of everything I've seen this year about the disappearance of intellectual professions, this is the most honest example: $50 per problem zeroed out to 1 cent per query, a doctor of science losing to a search algorithm, one problem stated in human words instead of a hand-written dataset, and right away an out-loud admission of the wrong business model.

The barrier to entry in higher mathematics just dropped to the level of "describe the task in words." The only question is who'll be the first to stop selling their time and start managing the machine's compute.
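
The input / search / check / output steps named in the post above can be illustrated with a toy generate-and-verify loop. This is only a sketch under stated assumptions: the post gives no implementation details, so the names `Example` and `solve_and_self_check`, the brute-force candidate search, and the quadratic toy problem are all hypothetical stand-ins for a model proposing answers and a checker accepting only the ones that verify.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Example:
    """A verified (problem, solution) pair that could be kept as training data."""
    problem: str
    solution: int


def solve_and_self_check(
    problem: str,
    candidates: Iterable[int],
    verify: Callable[[int], bool],
) -> Optional[Example]:
    """Return the first candidate that passes the checker, or None if none does."""
    for candidate in candidates:      # stand-in for the search over branches
        if verify(candidate):         # stand-in for the automatic check
            return Example(problem, candidate)
    return None


# Hypothetical toy problem: find an integer root of x^2 - 5x + 6.
problem = "find an integer x with x^2 - 5x + 6 = 0"
verified = solve_and_self_check(
    problem,
    candidates=range(-10, 11),
    verify=lambda x: x * x - 5 * x + 6 == 0,
)

# Only verified results are kept; unverified attempts are discarded.
training_set = [verified] if verified is not None else []
print(training_set)
```

The point of the sketch is the filtering step: nothing enters `training_set` unless the checker accepts it, which is the role the post assigns to the "check" stage.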

5

185

27

230

46K

jupiter @jupiter186

1 day ago

ScienceNet: I cannot recommend Probability Theory: The Logic of Science (《概率论沉思录》) highly enough - Ji Yang's blog post https://t.co/bdRy90nS9F

0

3

jupiter186 retweeted

Jianshuo Wang

@jianshuo

2 days ago

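Finally moved completely off the old VPS + PHP + WordPress setup and onto Cloudflare + GitHub Pages + Hugo. It feels like stepping straight into the modern era. I should have done this ten years ago.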

31

44

2

42

10K

in math we trust. Hour. What is past cannot be corrected, but what is to come can still be pursued (往者不可谏，来者犹可追).

jupiter186 retweeted

歸藏(guizang.ai)

@op7418

8 days ago

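With tools like Claude Code and Cursor, the benefit really goes beyond writing code. After I got the Doubao phone, I wanted to install the Google framework on it, but something kept going wrong at the Google Play step and it simply would not install. Today it suddenly occurred to me to let Claude Code do it. Once I turned on USB debugging mode, it took care of everything: it downloaded the packages, installed them, and debugged the setup automatically. This feels like it will be very useful in the future.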

154

192

9

93

52K

jupiter @jupiter186

9 days ago

A bit of a blow to the ego, but so worth it! Karpathy: When AI took over 80% of the code, I saw the AGI magic clearly https://t.co/tf47tBxWFw

0

4

jupiter @jupiter186

9 days ago

Keep Claude Code running while you sleep: a complete hands-on guide https://t.co/lraFQYPj0G

0

16

jupiter @jupiter186

9 days ago

An introduction to the physicist E. T. Jaynes and his Probability Theory: The Logic of Science (《概率论沉思录》) https://t.co/vgPym3bcwp

0

16

jupiter @jupiter186

9 days ago

Li Jianzhong in conversation with Fields Medalist Timothy Gowers: AI will change the entire paradigm of mathematical research https://t.co/w9QVz9U4JQ

0

45

jupiter @jupiter186

9 days ago

Tutorials on Tinygrad | tinygrad-notes https://t.co/2dvfoJ3DP4

0

11

jupiter @jupiter186

9 days ago

https://t.co/MiLl9LHs72 - Next-generation LLM Inference Network: How ZCube Alleviates Network Bottlenecks? https://t.co/8d1hkzpWr5

0

19

jupiter @jupiter186

9 days ago

Speed as the Next Scaling Law — TileRT https://t.co/wGcqyIPUG4

0

4

jupiter186 retweeted

huangserva

@servasyy_ai

16 days ago

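Karpathy's CLAUDE.md hit number one on GitHub Trending. It has 220,000 stars, yet most developers still haven't read it. It is only 65 lines long, yet it raised AI coding accuracy from 65% to 94%. Its 4 rules:

→ Think before coding: state your assumptions explicitly. If you are unsure, ask. Never guess.
→ Simplicity first: write only the minimum code that solves the problem. Don't build abstractions nobody asked for.
→ Surgical changes: don't touch code unrelated to the requirement. Every changed line must trace back to the requirement.
→ Goal-driven execution: before writing any code, turn vague instructions into verifiable success criteria.

That's all. 65 lines. 4 rules. 94% accuracy. Bookmark it before everyone else sees it.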

62

1K

306

2K

243K

jupiter186 retweeted

karminski-牙医

@karminski3

16 days ago

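400 TPS! Hands-on test: Zhipu GLM-5.1 races ahead at 10x speed

Zhipu just released glm-5.1-highspeed! I quickly ran my script against it: output speed reaches 300+ tps, and time to first token holds steady at 1 s.

How strong is that number? Running the same script against the glm-5.1 endpoint, output speed was only 35 tps and time to first token reached 9 s. That is roughly a 10x speedup.

If you use glm-5.1 for coding or for "raising lobsters"/"Hermès" (养龙虾/爱马仕), you can go ahead and get a plan with this new model. It emits text immediately, with no waiting.

GLM-5.1 activates 40B parameters per pass. At bf16 precision that takes 80 GB of GPU memory even without counting the KV cache, so reaching 35 tps implies 80 x 35 = 2.8 TB/s of memory bandwidth, and pushing to 300 tps implies 80 x 300 = 24 TB/s.

Taking the H100 SXM at 3.35 TB/s, a single card's bandwidth used to be enough, whereas now it takes 8-way tensor parallelism (tensor parallelism also increases request parallelism, of course).

The official technical document is even more striking: they worked with the TileRT team to rebuild the inference path from the ground up and squeeze every bit of performance out of the GPUs!

Put simply, traditional inference works like an assembly-line factory: the CPU acts as the scheduler and issues instructions to the GPU layer by layer; after each layer the result is written back to GPU memory, then read out again for the next layer, with constant synchronization in between. Much of the time actually goes to this "scheduling + data movement" rather than to pure computation.

TileRT takes the opposite approach: the whole inference flow is orchestrated at compile time into one large kernel that stays resident on the GPU. Once inference starts there is essentially a single launch, and the GPU runs on its own from there.

Within a single card, compute, IO, and communication are all broken into smaller tile-level tasks; intermediate results avoid main GPU memory where possible and are passed directly through registers, shared memory, and L2 cache whenever they can be.

Across multiple cards the work is divided up: for example, GPU 0 is dedicated to the Sparse Indexer while GPUs 1–7 run the MLA attention backbone. (There are many more optimization details; see the official technical document.)

None of this needs the CPU deeply involved any more, which is why performance improves so much.

So, if you are using GLM-5.1, switch models now!

#glm51 #glm51highspeed #智谱 #GLM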
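
The bandwidth estimate in the post above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch using only the figures quoted in the post (40B active parameters, 2 bytes per bf16 parameter, 35 and 300 tps, 3.35 TB/s for an H100 SXM); the function name `required_bandwidth_tb_s` is made up for illustration, and the model assumes every active weight is read from memory once per generated token, ignoring KV cache traffic and batching.

```python
import math

H100_SXM_TB_S = 3.35  # memory bandwidth figure quoted in the post


def required_bandwidth_tb_s(active_params_billion: float,
                            bytes_per_param: float,
                            tokens_per_s: float) -> float:
    """Bandwidth needed if all active weights are read once per token."""
    gb_per_token = active_params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return gb_per_token * tokens_per_s / 1000.0              # GB/s -> TB/s


for tps in (35, 300):
    bandwidth = required_bandwidth_tb_s(40, 2, tps)
    # Lower bound on GPU count from bandwidth alone, assuming perfect scaling.
    gpus = math.ceil(bandwidth / H100_SXM_TB_S)
    print(f"{tps} tps needs {bandwidth:.1f} TB/s, at least {gpus} H100 SXM card(s)")
```

Under these assumptions 35 tps fits within one card's bandwidth while 300 tps needs roughly eight, which matches the post's conclusion about 8-way tensor parallelism.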

40

224

14

106

43K

jupiter186 retweeted

Ted Alcorn @TedAlcorn

17 days ago

How the @nytimes looks at Asia & Oceania. 👀 My analysis of every international article tagged to the region since 2000 (n≈63,000), displaying the topic with most outsize coverage. By country, alphabetical 👇 https://t.co/SaIHBq8GYg

9

181

69

90

61K

jupiter186 retweeted

OpenAI

@OpenAI

17 days ago

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

1K

27K

4K

9K

14M

jupiter @jupiter186

18 days ago

Keep Claude Code running while you sleep: a complete hands-on guide https://t.co/lraFQYPj0G

0

20

jupiter @jupiter186

18 days ago

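After thirty years of teaching Chinese, my modest understanding of the subject is all on this one slide | Guo Chuyang, the 41st speaker at Yixi Junior (一席少年) https://t.co/f2oBW2umVQ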

0

26

jupiter @jupiter186

18 days ago

An introduction to the Agent Loop https://t.co/EcfZ6JLh48

0

5

jupiter @jupiter186

18 days ago

Ascend: the same operator in three versions (AscendC / TileLang / Triton), a side-by-side example - Zhihu https://t.co/pqWcLR6h5D

0

1

35
