Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor.
It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx
This is the right pattern to deal with systemic AI risks going forward. Have the frontier models play societal defense exclusively until systemic risks are sufficiently minimized or eliminated, depending on the severity of the risks.
Last month we launched Project Glasswing, our collaborative AI cybersecurity initiative. Since then, we and our partners have found more than ten thousand high- or critical-severity vulnerabilities in essential software.
An internal OpenAI model disproved a famous math conjecture that experts had believed to be true for decades. This is the "move 37" for math that many have been waiting for. https://t.co/KkssFMEAFo
Intelligence, in essence, is good pathfinding, so in hindsight it's clear that next-character prediction was going to yield some form of artificial intelligence.
Disclaimer: I had been given early access to an internal beta version of Grok 4.20.
It found a new Bellman function for one of the problems I’d been working on with my student N. Alpay.
The problem reduces to identifying the pointwise maximal function U(p,q) under two constraints and understanding the behavior of U(p,0).
In our paper https://t.co/pgJw9MaEA1 we proved U(p,0)\geq I(p), where I(p) is the Gaussian isoperimetric profile, I(p) ~ p\sqrt{log(1/p)} as p ~ 0.
After ~5 minutes, Grok 4.20 produced an explicit formula U(p,q) = E \sqrt{q^2+\tau}, where \tau is the exit time of Brownian motion from (0,1) starting at p. This yields U(p,0)=E\sqrt{\tau} ~ p log(1/p) at p ~ 0, a square root improvement in the logarithmic factor.
Any significance of this result? It will not tell you how to change the world tomorrow. Rather, it gives a small step toward understanding what is going on with averages of stochastic analogs of derivatives (quadratic variation) of Boolean functions: how small can they be?
More precisely, this gives a sharp lower bound on the L1 norm of the dyadic square function applied to indicator functions 1_A of sets A \subset [0,1].
In my previous tweet about the Takagi function, we saw that the sharp lower bound on ||S_1(1_A)||_1 miraculously coincides with the Takagi function of |A|, which (surprisingly to me) is related to the Riemann hypothesis. Here, we obtain a sharp lower bound on ||S_2(1_A)||_1 given by E \sqrt{\tau}, where Brownian motion starts at |A|. This function belongs to the family of isoperimetric-type profiles, but unlike the fractal Takagi function, it is smooth and does not coincide with the Gaussian isoperimetric profile.
Finally, in harmonic analysis it is known that the square function is not bounded in L^1. The question here was more about curiosity: how exactly does it blow up when tested on Boolean functions 1_A? Previously, the best known lower bound was |A|(1-|A|) (Burkholder-Davis-Gundy). In our paper, we obtained |A|(1-|A|)\sqrt{\log(1/(|A|(1-|A|)))}. Grok's new Bellman function gives |A|(1-|A|)\log(1/(|A|(1-|A|))), and this bound is actually sharp.
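For readers who want to see the quantity E \sqrt{\tau} numerically, here is a minimal Monte Carlo sketch. It is not from the paper or from Grok's output. It assumes Brownian motion on (0,1) can be approximated by a simple random walk on a grid of spacing 1/n, with each step taking 1/n^2 units of time. The names `exit_time` and `estimate`, and the grid size and trial count, are illustrative choices.

```python
import math
import random

def exit_time(p, n, rng):
    """Exit time of a scaled simple random walk from (0, 1), started at p.

    The walk lives on the grid {0, 1/n, ..., 1}. Each step moves by 1/n and
    takes 1/n**2 units of time, which approximates Brownian motion.
    """
    k = round(p * n)  # starting grid index (assumes p is close to a grid point)
    steps = 0
    while 0 < k < n:
        k += 1 if rng.random() < 0.5 else -1
        steps += 1
    return steps / n**2

def estimate(p, n=20, trials=2000, seed=0):
    """Monte Carlo estimates of E[tau] and E[sqrt(tau)] for a start at p."""
    rng = random.Random(seed)
    taus = [exit_time(p, n, rng) for _ in range(trials)]
    mean_tau = sum(taus) / trials
    mean_sqrt_tau = sum(math.sqrt(t) for t in taus) / trials
    return mean_tau, mean_sqrt_tau

mean_tau, mean_sqrt_tau = estimate(0.5)
print("E[tau] estimate:      ", mean_tau)
print("E[sqrt(tau)] estimate:", mean_sqrt_tau)
```

As a sanity check, E[\tau] = p(1-p) exactly for this walk, so the first estimate should land near 0.25 at p = 1/2. The second estimate can never exceed the square root of the first (Jensen's inequality). The small-p asymptotics are not something this crude simulation is meant to confirm.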
Reinforcement learning as a method may be inherently truth-seeking.
From the DeepSeek models, it seems reinforcement learning with reasoning traces may purge biases from the models as the full R1 model is more objective and truthful than the distillations and the V3 model.
So while it is conceivable for a crash in the prices of existing goods and services to occur, because demand for such goods and services is limited by the human population, a corresponding long-term crash in wages is not a likely outcome.
3/
In a post-AI economy:
A crash in prices of existing goods and services without the concurrent crash in wages is the more likely scenario for several reasons:
1. There is a near-infinite amount of human-specific activity humans can engage in.
1/
A world in which human wages crash from AI -- logically, necessarily -- is a world in which productivity growth goes through the roof, and prices for goods and services crash to near zero. Consumer cornucopia. Everything you need and want for pennies.
2. In an abundance scenario, if there were not enough private sector jobs (a big if), governments can step in to provide public sector jobs, financed by the additional revenue potential that such an abundance scenario implies.
2/
Extracting "intelligence" from AI models:
From the DeepSeek V3 paper (https://t.co/xwjoXxPNKe) we also see that "intelligence" could be extracted/distilled from models quite efficiently, using 800k samples (very roughly tens to hundreds of billions of tokens).
1/2
Regarding "AGI", the key insight that o1, o3, and R1 proved is that valuable synthetic data that future models can be trained on can be produced by expending computational power alone (by means of reinforcement learning). That was what Ilya saw.
The latest scores from Chatbot Arena LLM Leaderboard were just released, and the open-source model DeepSeek R1 is on par with the frontier models.
DeepSeek R1 is 25 Elo points behind the top-ranking model Gemini 2.0 Flash-Thinking-Exp-01-02.
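To put a 25-point gap in perspective, here is a quick sketch using the standard Elo expected-score formula (base 10, scale 400). It assumes the leaderboard scores can be read on that conventional scale. The function name `elo_expected_score` is illustrative.

```python
def elo_expected_score(rating_gap, scale=400.0):
    """Expected score of the higher-rated side under the standard Elo model.

    rating_gap is (higher rating - lower rating). The base-10, scale-400
    logistic form is the conventional Elo formula, assumed here.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))

p_top_beats_r1 = elo_expected_score(25)
print(f"{p_top_beats_r1:.3f}")  # prints 0.536
```

Under that assumption, the top model would be preferred in roughly 54% of head-to-head comparisons, which is close to a coin flip.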
@fchollet Humans solve Arc-AGI-1 problems through physical/visual intuitions gained from interacting with the physical world, a domain the models have not had access to.
A human who has been blind from birth may not do well on Arc-AGI-1 problems.
In what may be the biggest breakthrough since ChatGPT, OpenAI has released its o1 model. It uses reinforcement learning techniques to achieve a quantum leap over previous models in math, coding, and other reasoning capabilities/accuracy.
In the current debate around "synthetic data", part of the confusion is the result of experts using the same term to describe different things.
Synthetic data should just be defined as all novel data obtained by means of compute.
Does AlphaZero count as training on synthetic data? There’s no human grandmaster data at all. AlphaZero expands its strategies & wisdom indefinitely with self-driven exploration and compute. The input is just a simple Go/Chess simulator that implements the game rules.
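A toy sketch of the "simulator plus compute" idea: labeled training positions produced with no human game data at all. This is not AlphaZero. There is no search and no neural network, the game is tic-tac-toe rather than Go or chess, and moves are uniformly random. All names here (`winner`, `self_play_game`, `generate_dataset`) are illustrative.

```python
import random

# All eight three-in-a-row lines on a 3x3 board indexed 0..8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play_game(rng):
    """Play one game with uniformly random legal moves.

    Returns a list of (position, player_to_move, outcome) records, where
    outcome is +1 if the player to move went on to win, -1 if that player
    lost, and 0 for a draw.
    """
    board = ["."] * 9
    player = "X"
    history = []
    while True:
        result = winner(board)
        legal = [i for i, cell in enumerate(board) if cell == "."]
        if result is not None or not legal:
            break
        history.append(("".join(board), player))
        board[rng.choice(legal)] = player
        player = "O" if player == "X" else "X"
    return [(pos, mover, 0 if result is None else (1 if result == mover else -1))
            for pos, mover in history]

def generate_dataset(num_games, seed=0):
    """Generate labeled positions from the rules simulator alone."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_games):
        data.extend(self_play_game(rng))
    return data

dataset = generate_dataset(100)
print(len(dataset), "labeled positions from 100 self-play games")
```

The only inputs are the game rules and a random number generator. More compute simply yields more labeled positions. A real system would replace the random move choice with a learned policy and search.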