ChatGPT found a clever construction improving the best known exponent for average vs. max sensitivity in monotone Boolean functions, a bound that has stood since O'Donnell-Servedio (2007). I formalized the full proof in Lean 4 (~3.7k lines, zero sorry's).
There are so many levels.
First, doing mathematics is different from solving hard sudoku puzzles. For example, an important part of mathematics is choosing definitions and building beautiful theories, which is arguably a subjective activity, and it’s not clear what it would even mean to solve this in finite time.
Second, outside of technicalities, all reasonable math problems could be solved in “finite time”. A charitable interpretation would be “more efficient than humans”, but that’s true of a calculator multiplying large numbers as well, and yet it’s we humans who choose which numbers to multiply.
@spicey_lemonade Replace higher category theory with the Collatz conjecture if you like. Most research mathematicians are not paid for their economic value. I doubt anyone will ever find any economic value in my PhD thesis.
https://t.co/7I7n5R7t4c
Most mathematicians aren’t paid to prove theorems; they’re paid to teach calculus. If there is anything to worry about, it is that AI disrupts college education and no one takes freshman classes anymore.
@spicey_lemonade For me, mathematics is aesthetics. Humans find beautiful math gems and share them with each other. You make it sound like you’d stop reading poetry just because AI can write poetry “better” than any human
Richard Feynman famously advised keeping a dozen of your favorite problems constantly present in your mind, and every time a new AI model drops, testing it against each of your twelve problems to see whether it helps. Every once in a while there will be a hit, and people will say: 'How did he do it? He must be a genius!'
@burkov It’s natural to acknowledge the use of a novel technology before the practice of citing it has standardized and settled down. For example, this was done when computers started finding counterexamples, and many people still acknowledge the use of supercomputers etc.
It's remarkable that "the proof came from a new general-purpose reasoning model, rather than from a system trained specifically for mathematics, scaffolded to search through proof strategies, or targeted at the unit distance problem in particular."
Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946.
For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids.
An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better.
This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.
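A toy sketch of the square-grid baseline mentioned above, not of the new construction: it brute-force counts pairs of points in a k x k integer grid at a chosen squared distance. The function name and the choice k = 10 are illustrative assumptions. Rescaling the grid by 1/5 turns distance 5 into a unit distance, which is why squared distance 25 is a fair comparison against squared distance 1.

```python
from itertools import combinations


def count_pairs_at_squared_distance(k: int, d2: int) -> int:
    """Count unordered pairs of points in the k x k integer grid
    whose squared Euclidean distance equals d2 (brute force)."""
    points = [(x, y) for x in range(k) for y in range(k)]
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )


if __name__ == "__main__":
    k = 10  # illustrative grid side length, 100 points in total
    for d2 in (1, 25):
        print(f"squared distance {d2}: {count_pairs_at_squared_distance(k, d2)} pairs")
```

In the 10 x 10 grid, squared distance 25 is hit by more pairs than squared distance 1, because 25 = 0² + 5² = 3² + 4² can be written as a sum of two squares in more ways.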
Examples of incontrovertible evidence: hallucinated references, meta-comments from the LLM ("here is a 200 word summary; would you like me to make any changes?"; "the data in this table is illustrative, fill it in with the real numbers from your experiments") end/
@danrobinson I've been playing around with the idea of automated conjecture generation based on numerical experiments in analytic number theory and adjacent fields. Some interesting results emerge!
Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/
To be clear: this was a lucky find of low-hanging fruit rather than a typical occurrence. For some more context: we one-shot prompted 46 Kirby problems (out of almost 400 total problems), selected by a handful of mathematicians we work with based on their interests. Out of the 46, Aletheia returned solution candidates for 8 problems, admitting failure on the other 38. Of these 8 candidates, 7 were technically correct, but only 2 addressed the intended meaning of the problem, while 5 exploited flaws in the problem statements (technical hypotheses that had been accidentally omitted).
Aside from 5.16, the only other meaningful solution was to 3.39b, but the result was shallow: it was deduced almost immediately from a paper of Chen--Lodha https://t.co/QqGlxU9pdf that appeared after the K3 list was assembled, but before it was published. Thus 3.39b should really be credited to them.
We have come a long way on accuracy since our study on the Erdős problems! We hope that a public version of Aletheia will be available before long, and that when it comes out it will be really trustworthy.
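For readers who want the numbers from this thread in one place, here is a minimal sketch that checks the reported tallies add up and prints the resulting rates. The variable and function names are illustrative; the figures are exactly those quoted above.

```python
# Figures as reported in the thread above; names are illustrative.
total_prompted = 46
candidates_returned = 8
admitted_failure = 38
technically_correct = 7
addressed_intended_meaning = 2
exploited_statement_flaws = 5


def tally_is_consistent() -> bool:
    """Check that the reported subtotals add up to their stated totals."""
    return (
        candidates_returned + admitted_failure == total_prompted
        and addressed_intended_meaning + exploited_statement_flaws == technically_correct
    )


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print("consistent:", tally_is_consistent())
    print("correct among returned candidates (%):", percent(technically_correct, candidates_returned))
    print("meaningful among all prompted (%):", percent(addressed_intended_meaning, total_prompted))
```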