Eric Zhao

@ericzhao28

llms and reasoning @openai previously: @googleai, @berkeley_ai, @caltech

San Francisco

Joined July 2020

241 Following

1.5K Followers

69 Posts

ericzhao28 retweeted

Noam Brown

@polynoamial

about 1 month ago

Today, we’re sharing that a general-purpose internal @openai model achieved a breakthrough on one of the best-known combinatorial geometry problems. Less than 1 year ago frontier AI models were at IMO gold-level performance. I expect this pace of progress to continue.

197

363

453K

ericzhao28 retweeted

OpenAI

@OpenAI

about 1 month ago

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

27K

14M

ericzhao28 retweeted

Agustin Lebron @AgustinLebron3

5 months ago

Turns out Paul Erdos's biggest contribution to humanity was as an eval creator.

554

50K

ericzhao28 retweeted

Kimon Fountoulakis

@kfountou

6 months ago

GPT-5.2 solves our COLT 2022 open problem: “Running Time Complexity of Accelerated L1-Regularized PageRank” using a standard accelerated gradient algorithm and a complementarity margin assumption.

Link to the open problem: https://t.co/A3ZbJshudE

All proofs were generated by GPT-5.2 Pro. The key bounds on the algorithm’s total work (in the COLT’22 open-problem setting) have been auto-formalized using a combination of GPT-5.2 Pro, @HarmonicMath's Aristotle, and Gemini 3 Pro (High) on Antigravity.

Link to the proof: https://t.co/hgJ0iBcWJe
Link to the Lean code: https://t.co/DeMFDlwSC9
Link to the informalization of the Lean code: https://t.co/V5BwYoIycN
Link to my GPT-5.2 prompts: https://t.co/xwh5c6S81B

In addition to the formalization of the main result, I checked the proof myself twice. I hope I didn’t miss anything, but if I did, please let me know and I will try to fix it.

Story behind the paper and relevant work

In 2016, I worked on the convergence rate of the Iterative Soft-Thresholding Algorithm (ISTA) for l1-regularized PageRank. Link to the corresponding paper: https://t.co/pDMN9QKkGh

Surprisingly, the running time of the algorithm depends only on the number of non-zero nodes at optimality. It was only natural to ask the same question for accelerated methods, such as FISTA. However, we quickly realized that FISTA activates more nodes than the number of non-zeros at optimality, even though it eventually converges to the same active set. In practice, we would still observe that FISTA is fast. Link to empirical work: https://t.co/VQFJugQk0m

I tried for about three months to bound the total work of FISTA and other accelerated algorithms, and from time to time I would come back to the problem while I was a postdoctoral fellow. Eventually, I gave up. I gave it another try around 2021, and I failed again. I asked my excellent former student, Shenghao Yang, and he also failed, unfortunately.

I asked a couple of prominent researchers if they think the problem is solvable, they quickly mentioned that it seemed hard. We ended up publishing it as an open problem at COLT 2022.

In 2023, David Martínez-Rubio et al. provided the first successful solution. Their solution is “orthogonal” to what was proved by GPT-5.2. Link to their paper: https://t.co/YPUrfGhG2T

I loved their work btw, I also met David in person at ICML 2024, one of the few ML conferences I ever attended. Their proposed accelerated algorithm is not necessarily faster than ISTA; however, it does offer a new trade-off between the teleportation parameter of PageRank and the total work per iteration. More importantly, the proposed method isn’t necessarily practical, since it involves solving an expensive subproblem. To be fair, in the COLT 2022 problem, we didn’t impose the additional hard constraint of using standard accelerated methods. The problem was posed as a theoretical problem.

The solution proved by GPT-5.2 establishes acceleration for the standard FISTA algorithm, which performs only one gradient computation per iteration. It also offers a clean parameterization of the total work with respect to a complementarity margin, which, for certain graph structures, shows a clear speed-up compared to ISTA.

In 2024, Zhou et al. (https://t.co/Agq5ANfhuS) gave it another go. However, in my view, their work has important drawbacks. In particular, their guarantees for accelerated localized methods (e.g., localized Chebyshev / Heavy-Ball) assume a condition on the geometric mean of certain active-ratio factors (described as Θ(\sqrt{α})) in order to obtain an accelerated bound. Two distinctions matter for our setting:

First, their accelerated runtime bounds are parameterized by evolving-set quantities and a residual-ratio assumption, which can be evaluated during a run but is not typically interpretable or verifiable a priori from graph structure alone. The solution by GPT-5.2 instead provides an explicit transient-phase bound in terms of a standard optimization-structure condition, and converts this directly into a total work bound.

Second, they explicitly note that FISTA-style acceleration violates the monotonicity property needed to bound the per-iteration accessed volume, and emphasize that guaranteeing intermediate sparsity in accelerated frameworks is challenging. The margin-based analysis by GPT-5.2 directly targets this gap: even without any monotonicity of intermediate supports, GPT-5.2 bounded how much spurious activation can occur before the iterates enter a neighborhood of the unique minimizer, thereby yielding a concrete locality certificate for the accelerated proximal-gradient trajectory.

Since 2024, every time OpenAI or Google released a new major model, I would give it a go. This time, with GPT-5.2, it seems to have worked.

675

119

333

172K
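For readers unfamiliar with the two algorithms named in the thread above, here is a minimal sketch of ISTA and FISTA on a generic l1-regularized quadratic, min 0.5·xᵀQx − bᵀx + lam·‖x‖₁. This is an illustration only: the objective, the function names, and the parameters `Q`, `b`, `lam`, and `step` are assumptions chosen for the example, not the PageRank objective from the open problem and not the setting of the GPT-5.2 proof.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(Q, b, lam, step, iters=500):
    """Plain proximal gradient (ISTA) on 0.5 x^T Q x - b^T x + lam * ||x||_1."""
    x = np.zeros(b.shape, dtype=float)
    for _ in range(iters):
        grad = Q @ x - b  # gradient of the smooth quadratic part
        x = soft_threshold(x - step * grad, step * lam)
    return x


def fista(Q, b, lam, step, iters=500):
    """Accelerated variant (FISTA): same prox step, taken at an extrapolated point."""
    x = np.zeros(b.shape, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(iters):
        grad = Q @ y - b  # one gradient evaluation per iteration
        x_new = soft_threshold(y - step * grad, step * lam)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum step: this extrapolation is what can temporarily
        # activate coordinates that are zero at the optimum.
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

With `step` set to at most 1/L, where L is the largest eigenvalue of `Q`, both routines converge to the same minimizer; the difference the thread cares about is which coordinates become non-zero along the way.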

ericzhao28 retweeted

ChatGPT

@ChatGPTapp

7 months ago

322

273

299

ericzhao28 retweeted

San Francisco Chronicle @sfchronicle

7 months ago

JUST IN: Claude, the California Academy of Sciences’ rare albino alligator and one of San Francisco’s most recognizable residents, has died at age 30. https://t.co/Z3iD49p7vZ

289

316K

ericzhao28 retweeted

Jason Lee @jasondeanlee

7 months ago

I never got why there's a big group that seem to split on value based vs policy based. Somehow the policy based folks think they don't need to learn any formalism/math/theory and can just guess a zeroth-order gradient estimator? But every policy opt that works uses some variance reduction technique that comes from thinking about MDPs.

68K

ericzhao28 retweeted

Daniel Litt

@littmath

7 months ago

>read proof by X >makes no sense >I’ll figure it out myself >work hard, finally get it >write it down >it’s exactly the same as the proof by X

35K

ericzhao28 retweeted

Haozhe Jiang @erichzjiang

7 months ago

Can Transformers Do Everything, and Undo It Too? Check out my blog on whether language models are surjective, injective, or invertible! https://t.co/9v0gd2962J

692

485

90K

ericzhao28 retweeted

Nived Rajaraman @Nived_Rajaraman

about 1 year ago

Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025!

📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models!

🗓️ Deadline: May 19, 2025 https://t.co/U91XDmtrQ6

34K

ericzhao28 retweeted

Swaroop Mishra

@Swarooprm7

over 1 year ago

PhD students: Remember to apply for the Google PhD fellowship. It will make your PhD super smooth. Application opens on 10th April 2025 Deadline: 15th May 2025

803

602

115K

Eric Zhao @ericzhao28

over 1 year ago

@raw_works @willccbb @Google_AI @haizelabs ya... it's annoying; if you email me I'll send it to you directly tho

Eric Zhao @ericzhao28

over 1 year ago

Thinking for longer (e.g. o1) is only one of many axes of test-time compute. In a new @Google_AI paper, we instead focus on scaling the search axis. By just randomly sampling 200x & self-verifying, Gemini 1.5 ➡️ o1 performance. The secret: self-verification is easier at scale!

257

355K

Eric Zhao @ericzhao28

over 1 year ago

Hi! Fitting all answers into a single context window doesn't seem to work great for problem solving... I usually just have models do a pass on each attempt individually to weed out the clearly dumb ones, and then run pairwise (k-wise for k>2 doesn't help much imo) comparisons to tie-break between the plausible candidates; this is what we did in the paper. A combination of that + applying search to the verification problem suffices, at least to the extent that the main bottleneck becomes generation, not verification.

For info-retrieval problems, putting them all into the same window usually works well enough. When it doesn't (like when I'm trying to merge multiple arxiv papers), I either "merge" them into the aggregation one-by-one or have a model group them semantically, merge within groups, and then merge between groups.
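The workflow described in this reply (sample many attempts, screen each one individually, then tie-break the survivors with pairwise comparisons) can be sketched roughly as follows. This is not the paper's code: `generate`, `verify`, and `compare` are hypothetical callables standing in for model calls, and the round-robin win count is one assumed way to run the pairwise tie-break.

```python
from itertools import combinations
from typing import Callable, List


def search_with_self_verification(
    generate: Callable[[], str],        # hypothetical: draws one candidate answer
    verify: Callable[[str], bool],      # hypothetical: per-candidate sanity check
    compare: Callable[[str, str], str],  # hypothetical: returns the better of two
    n_samples: int = 200,
) -> str:
    # 1. Sample candidates independently (the "search" axis).
    candidates: List[str] = [generate() for _ in range(n_samples)]

    # 2. Check each attempt on its own to drop clearly bad ones.
    plausible = [c for c in candidates if verify(c)]
    if not plausible:
        plausible = candidates  # nothing passed; fall back to all samples

    # Collapse duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(plausible))
    if len(unique) == 1:
        return unique[0]

    # 3. Pairwise comparisons to tie-break between the plausible candidates.
    wins = {c: 0 for c in unique}
    for a, b in combinations(unique, 2):
        winner = compare(a, b)
        if winner in wins:
            wins[winner] += 1
    return max(unique, key=lambda c: wins[c])
```

The number of comparisons grows quadratically in the number of distinct survivors, which is one reason the individual screening pass in step 2 matters before tie-breaking.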

Eric Zhao @ericzhao28

over 1 year ago

@littmath @Google_AI Of course, thanks for reading : )

132

Eric Zhao @ericzhao28

over 1 year ago

@ddkang @Google_AI Sorry it's been stuck in the google open sourcing process for a while... the new arxiv version should include all prompts + parameters necessary for duplication, but also if you reach me over email I can give you code directly so you can get set up. Thanks for the interest :)

604

Eric Zhao @ericzhao28

over 1 year ago

Hm I guess you're asking what percentage of that pass@k - pass@1 we can actually capture? I think it depends on the problem. On multiple choice exams pass@k might go to 100% but the model might not actually reach it correctly ever, so pass@k far exceeds perf of search x k. On problems where you're unlikely to run into the correct answer (or if you let pass@k only count correct proofs + answers), it depends on how easy verification is. On AIME, there's basically no gap at scale; on livebench reasoning puzzles you can get like 80%; for my personal theory research usage probably close to 80% too.

311

Eric Zhao @ericzhao28

over 1 year ago

@littmath @Google_AI *compared to non-reasoning models

271

Eric Zhao @ericzhao28

over 1 year ago

Thanks! I guess it's hard to draw a clean boundary, bc RL-trained models do learn to perform search serially in their thinking traces. But I'd say most of their gains are actually attributable to backtracking, going in more detail, and self-prompting, which search scaling doesn't overlap with and should stack on top of. On orthogonality, even reasoning models benefit significantly from parallel search; you can get a good sense of this by just comparing their pass@1 and pass@k (can they get something right in k tries). If anything, I feel like the reasoning models I've used are *more* ergodic on hard proofs.

653
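For the pass@1 versus pass@k comparison mentioned above, one common way to estimate pass@k from n sampled attempts with c correct ones is the combinatorial estimator 1 − C(n−c, k)/C(n, k). The sketch below assumes that estimator; the thread does not say which estimator was actually used, and the function name is made up for the example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k attempts is correct,
    given n sampled attempts of which c were correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few incorrect samples to fill k draws
    # Probability that k draws without replacement are all incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A large gap between `pass_at_k(n, c, k)` and `pass_at_k(n, c, 1)` indicates headroom that search could capture, provided verification is good enough to pick out the correct attempt.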

Eric Zhao @ericzhao28

over 1 year ago

@ElsheikhTech That's what you should be doing; we focused on search in this paper bc we wanted to understand it better, but in our workflows we're applying these to reasoning models

Eric Zhao @ericzhao28

over 1 year ago

@Nived_Rajaraman @Google_AI I can only aspire to your greatness

225
