On-Policy Distillation is the most active new research direction being explored in RL for LLMs. Had the chance to discuss with Dwarkesh how it works and why it fits so nicely into large-scale pipelines.
This Chinese mathematician earned up to $10,000 a month through Scale AI, inventing the hardest problems for training neural networks. Today his income has dropped to zero.
All the solutions are now generated by the model itself. He used to just hold the problem in his head and spell it out in plain text.
His work is pure intellect. An expert in higher mathematics, he made his money hand-crafting the trickiest puzzles to test and train neural networks via RLHF. The bastion of "human" logic rested entirely on him and on people like him: PhDs who knew how to invent such problems.
The collapse is simple.
The shift to RLAIF and synthetic data. The model plays against itself, builds trees of logical inference, and solves problems deeper than a human can even pose them.
No PhD data engineers, no hand-written prompt-completion examples, no manual grading. Just the model, search algorithms, and Chain of Thought.
Ready-made "smart human-time" still sells on the market at many times the price: his old rate was $50–100 per problem.
The internal "mini-app" was written by the model too. Inside there's no polished interface, just bare logic in exact steps:
input: the problem statement
inference tree: thousands of branches per second
check: every step verifies itself
output: a proof a human never had time to invent
And here is what the whole setup looked like. He no longer needs to write examples by hand. He gave the model a direct instruction in plain language, without a single formal term:
"solve the problem yourself and grade it yourself"
That's it. After that, the algorithm found the solution, checked it, and trained on its own result, with no human involved.
→ the contractor got $50–100 per problem written
→ $5,000 to $10,000 a month
→ now that income is gone
→ a query to a math LLM costs 1–5 cents
→ a quant or an actuary costs $150,000–250,000 a year
→ the margin for whoever packages this into an agent is nearly 100%
In the author's own words: "I'm no longer able to invent a problem the machine can't solve. The examiner became dumber than the one he's examining."
To his credit, he admits the glaring mistake himself, and it isn't in the math but in the positioning.
He tied his income to selling "smart human-time", to crafting formulas by hand. As long as he sells formulas, he falls behind: the machine computes faster than he can invent a problem.
He names the right move himself: the role shifts from "intellectual craftsman" to "systems architect." Instead of selling his time, he manages compute, packaging that same LLM into an autonomous agent that runs 24/7.
Of everything I've seen this year about the disappearance of intellectual professions, this is the most honest example: $50 per problem collapsing to 1 cent per query, a doctor of science losing to a search algorithm, a single problem stated in plain words replacing a hand-written dataset, and an immediate, out-loud admission that the business model was wrong.
The barrier to entry in higher mathematics just dropped to the level of "describe the task in words." The only question is who'll be the first to stop selling their time and start managing the machine's compute.
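With tools like Claude Code and Cursor around, it's really not just coding they're good at.
After I got my Doubao phone, I wanted to install the Google framework on it, but something kept going wrong on the Google Play side and I just couldn't get it installed no matter what.
Today it suddenly occurred to me to open Claude Code and have it do the install for me.
Once I turned on USB debugging, it handled everything directly: it downloaded the installation packages, installed them, and got everything configured on its own.
This feels like it's going to be really useful going forward.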
How the @nytimes looks at Asia & Oceania. 👀 My analysis of every international article tagged to the region since 2000 (n≈63,000), showing the topic with the most outsized coverage. By country, alphabetical 👇
Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946.
For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids.
An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better.
This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.