AI can now write long proofs, but autoformalizing a research paper in Lean is still hard.
🚀 Check out our LeanMarathon on 4 Erdős problems, run fully autonomously! https://t.co/7zBO50oHnB
Led by my student @yuanhezhang6, in collaboration with Yuekai, @btreetaiji, and @jasondeanlee.
The future of Math is mathematicians and AI agents working together.
Very pleased to introduce @GoogleDeepMind's AI co-mathematician: a multi-agent system designed to actively collaborate with human experts on open-ended research mathematics.
Mathematicians testing the agent across areas as diverse as group theory, Hamiltonian systems, and algebraic combinatorics have reported impressive results.
In autonomous mode evaluation on the rigorous FrontierMath Tier 4 problems, AI co-mathematician scored an unprecedented 48% — a new high score among all AI systems evaluated.
Introducing Goedel-Code-Prover 🌲
LLMs write code, but can they prove it correct? Not just pass tests, but construct machine-checkable proofs that a program works for ALL possible inputs.
We built a system that does exactly this. Given a program and its specification in Lean 4, Goedel-Code-Prover automatically synthesizes formal proofs of correctness.
Our 8B model achieves 62% overall success rate across three benchmarks (Verina, Clever & AlgoVeri), a 2.6x improvement over the strongest baseline, surpassing both frontier LLMs (GPT/Gemini/Claude) and open-source theorem provers up to 84x larger (DeepSeek-Prover/Goedel-Prover/Kimina-Prover/BFS-Prover).
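To make "a program and its specification in Lean 4" concrete, here is a minimal toy sketch. The function `myMax` and both theorem names are hypothetical illustrations, not taken from Goedel-Code-Prover or its benchmarks; real benchmark tasks are far harder, and the proofs here are short enough to write by hand.

```lean
-- Hypothetical toy program: the maximum of two natural numbers.
def myMax (a b : Nat) : Nat :=
  if a ≤ b then b else a

-- Specification, part 1: the result is at least the first input,
-- for ALL inputs, not just the ones a test suite happens to try.
theorem le_myMax_left (a b : Nat) : a ≤ myMax a b := by
  unfold myMax
  split <;> omega

-- Specification, part 2: the result is at least the second input.
theorem le_myMax_right (a b : Nat) : b ≤ myMax a b := by
  unfold myMax
  split <;> omega
```

Once Lean accepts these theorems, the property is machine-checked for every pair of naturals, which is the kind of guarantee the post contrasts with passing tests.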
We open-sourced Axplorer.
Axplorer builds on PatternBoost; it discovers outlier math constructions to attack open problems.
On Turán 4-Cycles, No 5 Points on Sphere, and Isosceles-Free Sets, Axplorer matched SOTA w/ a fraction of compute cost and time.
It's now in your hands.
I'm releasing OpenProver v1.0.0!
It's 1) an open-source automated theorem prover inspired by DeepMind's Aletheia (@tonylfeng @gjb_ai @lmthang), and 2) a "Claude Code for mathematicians", allowing interactive proof search in English and formalization in Lean.
Today, at the @DARPA expMath kickoff, we launched 𝗢𝗽𝗲𝗻𝗚𝗮𝘂𝘀𝘀, an open-source, state-of-the-art autoformalization agent harness for developers and practitioners to accelerate progress at the frontier.
It is stronger, faster, and more cost-efficient than off-the-shelf alternatives. On FormalQualBench, running with a 4-hour timeout, it beats @HarmonicMath's Aristotle agent with no time limit.
Users of OpenGauss can interact with it as much or as little as they want, can easily manage many subagents working in parallel, and can extend / modify / introspect OpenGauss because it is permissively open-source. OpenGauss was developed in close collaboration with maintainers of leading open-source AI tooling for Lean.
Read the report and try it out:
Prover correctness is becoming a central question as AI enters mathematics and software verification. New essay on why Lean's architecture is designed to survive AI pressure.
https://t.co/CXaDTSEWum
Happy to share new progress in AI for Maths @GoogleDeepMind .
In extremal combinatorics, AlphaEvolve has helped establish new lower bounds for FIVE classical Ramsey numbers - a problem so challenging that even Erdős commented on its difficulty.
Historically, computationally deriving these bounds required bespoke, human-designed search algorithms. For many of these bounds, the best previous results are at least a decade old. AlphaEvolve changes this by acting as a single meta-algorithm that automatically discovers the search procedures needed to find these new bounds.
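For readers unfamiliar with what a Ramsey lower bound is: showing R(s, t) > n amounts to exhibiting a red/blue coloring of the edges of the complete graph K_n with no red K_s and no blue K_t. The sketch below only checks such a witness by brute force; it is not AlphaEvolve's search procedure, and the function names are hypothetical. The example witness is the classical pentagon coloring, which shows R(3, 3) > 5.

```python
from itertools import combinations


def has_mono_clique(n, red_edges, size, want_red):
    """Return True if some `size`-subset of range(n) has all edges in one color."""
    for verts in combinations(range(n), size):
        if all((frozenset(e) in red_edges) == want_red
               for e in combinations(verts, 2)):
            return True
    return False


def certifies_lower_bound(n, red_edges, s, t):
    """True if the coloring of K_n has no red K_s and no blue K_t.

    Edges in `red_edges` are red, all other edges are blue.
    A True result certifies R(s, t) > n.
    """
    return (not has_mono_clique(n, red_edges, s, True)
            and not has_mono_clique(n, red_edges, t, False))


# Classical witness: color the edges of a 5-cycle red and the rest blue.
pentagon_red = {frozenset((i, (i + 1) % 5)) for i in range(5)}
print(certifies_lower_bound(5, pentagon_red, 3, 3))
```

Checking a witness is cheap; the hard part, and the part the post says AlphaEvolve automates, is finding colorings like this for much larger n.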
Great read by Allyn Jackson on how AI is reshaping mathematics. Also, thanks for the nod, Allyn. Highly recommend checking this one out: https://t.co/O5maECW3fz
1/ RELEASING AXLE: the Axiom Lean Engine ⚙️
We are serving our core infrastructure for formal proving at scale.
These are the same Lean metaprogramming tools that are behind AxiomProver, powering it to win Putnam and crack open research conjectures.
Available to anyone today!
AI is writing a growing share of the world's software. No one is formally verifying any of it.
New essay: "When AI Writes the World's Software, Who Verifies It?"
https://t.co/8zjS9FkdA8
We’re excited to release TorchLean, the first fully verified neural network framework in Lean. The Lean community has largely focused on pure mathematics. TorchLean expands this frontier toward verified neural network software and scientific computing. With the recent release of CSlib, we see this as another step toward a fully verified ML stack.
We support the following features:
1. Executable IEEE-754 floating-point semantics (and extensible alternative FP models)
2. Verified tensor abstractions with precise shape/indexing semantics
3. Formally verified autograd system for differentiation of NN programs
4. Proof-checked certification / verification algorithms like CROWN (robustness, bounds, etc.)
5. PyTorch-inspired modeling API with eager-style development + export/lowering to a shared IR for execution and verification
Project page: https://t.co/YHpqhRbMQe
Paper: [2602.22631] TorchLean: Formalizing Neural Networks in Lean
Work done with @Robertljg, Jennifer Cruden, Xiangru Zhong, @huan_zhang12 and @AnimaAnandkumar.
#MachineLearning #ScientificComputing #Lean
Thrilled to share: #Aletheia, our math research agent, just solved 6/10 notoriously hard FirstProof problems autonomously, the best result in the inaugural challenge! To me, this is even bigger than our historic IMO-gold achievement last year; these problems challenge even top mathematicians. We share our results transparently, see paper and full thoughts in the thread. 👇
“Learning Without Training”
The current problem is that most learning-on-manifolds pipelines still rely on a brittle two-step recipe: first estimate the manifold, then learn a predictor. Errors and hyperparameters can easily stack up across the two steps.
This paper introduces a paradigm for machine learning that constructs models directly from data using mathematically derived kernels and functional analysis instead of iterative optimization.
This means you can often skip training and manifold learning entirely. You can just take your examples, and predict new ones by doing a smart weighted average of nearby points using a carefully designed kernel, kinda like local smoothing with math guarantees.
This creates a blueprint for fast and stable learning without backprop, closer to plug-and-play geometry + linear algebra than to "train a huge model and pray it converges."
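The "smart weighted average of nearby points" idea can be sketched in a few lines. This is a minimal illustration only: it uses a plain Gaussian kernel (a Nadaraya-Watson estimator) as a stand-in, not the specially derived kernels from the paper, and the function names, toy data, and bandwidth value are all assumptions made for the example.

```python
import math
import random


def gaussian_kernel(x, z, bandwidth):
    """Gaussian weight based on the Euclidean distance between x and z."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_predict(train_x, train_y, query, bandwidth=0.1):
    """Predict at `query` as a kernel-weighted average of the training labels."""
    weights = [gaussian_kernel(query, x, bandwidth) for x in train_x]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Toy data: points on a circle (a 1-D manifold in the plane) with labels
# sin(2 * angle). No manifold estimation and no training loop is used.
random.seed(0)
thetas = [random.uniform(0.0, 2.0 * math.pi) for _ in range(400)]
train_x = [(math.cos(t), math.sin(t)) for t in thetas]
train_y = [math.sin(2.0 * t) for t in thetas]

t_query = 1.0
query_point = (math.cos(t_query), math.sin(t_query))
estimate = kernel_predict(train_x, train_y, query_point)
print(round(estimate, 3), round(math.sin(2.0 * t_query), 3))
```

The only knob here is the bandwidth, which controls how local the average is; the paper's contribution, per the post, is designing kernels for which this kind of direct construction comes with mathematical guarantees.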
Introducing Aletheia, a math research agent powered by an advanced version of Gemini Deep Think that produces publishable math research (two papers, one completely automatic and another with human-AI collaboration) and solved multiple open Erdős problems. 😀🔥
Paper link below! 👇
"First Proof"
A team of researchers proposes a way to test if AI can actually do NEW math by releasing 10 freshly solved, never-before-published research questions, with answers temporarily encrypted.
This lets the community measure the genuine performance of LLMs on proof generation before the solutions drop.
Questions include:
- stochastic analysis
- p-adic representation theory
- algebraic combinatorics
- spectral graph theory
- equivariant algebraic topology
- lattices in Lie groups/topology
- symplectic geometry
- tensor algebraic relations
- numerical linear algebra
[1/n]
Super excited to introduce PaperBanana 🍌! (PKU x Google Cloud AI)
As AI researchers, we often spend way too much time crafting diagrams and plots instead of focusing on the ideas 🤯. To rescue us from this burden, we built an Agentic Framework to auto-generate NeurIPS-quality paper illustrations!
📄 Paper: https://t.co/2NbQeEhzMv
🌐 Page: https://t.co/05dKkjVs7f
Key Features:
🌟 Human-like Workflow: Retrieve 🔍 -> Plan 📝 -> Style 🎨 -> Render 🖼️ -> Critique 🔄. This ensures both academic fidelity and aesthetics.
🌟 Versatile: Supports both illustrative diagrams and statistical plots.
🌟 Polishing: Also effective for polishing existing human-drawn diagrams.
Here are some example diagrams and plots generated by our PaperBanana:
Excited to share our latest work: "Semi-Autonomous Mathematics Discovery with Gemini." We used Gemini to systematically evaluate 700 "open" conjectures in the Erdős Problems database.
The result? We addressed 13 problems marked as open—finding 5 novel autonomous solutions and identifying 8 existing solutions missed by previous literature.
Read the full case study here: https://t.co/y4WhkP4ETO