For the record I think that progress in AI-for-math will completely change the way math research is done over the next few years. I am not at all a skeptic. I just want a) honesty b) for people to understand what math research *really is*.
@SebastienBubeck@AlexKontorovich The important question scientifically is whether, once we have this unit-distance-solving ChatGPT, it also solves some huge fraction of all open problems mathematicians care about!
🥇 We aren't saying Alysa Liu "Stole the Look" from the White-crowned Sparrow, but we aren't not saying it either. Congrats on an epic win!
Learn how to help birds on and off the ice on our blog:
https://t.co/Mf2aH0C7jO
@littmath@jasondeanlee Then again, the unofficial (but widely distributed) pronouncements of folks at various labs were also too early. (I have the god-given right to complain about *everyone*!)
What is the status of verified secret computation? Specifically, suppose company A wants to claim that they have a fixed secret function f (some massive tool-using LLM ensemble) which at t_0 will be evaluated on value x submitted by unrelated org B. How can A prove this to B? (1/2)
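One partial ingredient for the setup above is a hash commitment: A publishes a digest of a serialized description of f before t_0, and can later reveal it for checking. The sketch below is only an illustration under assumptions: the names `commit` and `verify` are hypothetical, f is assumed to be serializable to bytes, and the scheme only shows that f was fixed in advance. It does not show that the output on x was actually computed by f, and checking it requires eventually revealing f.

```python
import hashlib
import secrets


def commit(payload: bytes) -> tuple[str, bytes]:
    """Return (digest, nonce) for a salted SHA-256 commitment to payload.

    Hypothetical helper: A publishes the digest before t_0 and keeps
    the nonce and payload private until the reveal.
    """
    nonce = secrets.token_bytes(32)  # random salt so the digest hides the payload
    digest = hashlib.sha256(nonce + payload).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: bytes, payload: bytes) -> bool:
    """Check that (nonce, payload) opens the previously published digest."""
    recomputed = hashlib.sha256(nonce + payload).hexdigest()
    return secrets.compare_digest(recomputed, digest)


# Illustrative use: the bytes stand in for a serialized description of f.
f_description = b"serialized description of f (assumption)"
published_digest, private_nonce = commit(f_description)
opens_correctly = verify(published_digest, private_nonce, f_description)
```

Proving that a hidden f was really the thing evaluated on x, without ever revealing f, needs stronger tools than this sketch provides.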
Given that there is no formal evaluation of the 1st proof challenge for round 1, it does not seem fair to interpret initial results in any way whatsoever. AI hype notwithstanding.
Let's do it properly for round 2.
"The verdict, it seems, is in: artificial intelligence is not about to replace mathematicians. That is the immediate takeaway from the “First Proof” challenge—perhaps the most robust test yet of the ability of LLMs to perform mathematical research." https://t.co/fiq6HXyYj1
@ben_golub As I understand it, the organizers are not treating the "first release" as a formal benchmark that they will be writing official evaluations for -- this will happen with subsequent releases. You should probably wait for the First Proof org to formally comment.
@MysteryHacker1 I'm not at all a relevant expert, and obviously I can't make any sense of your links. I'm sure that you can find relevant experts and explain your dramatically improved argument in a conventional manner and they would be appreciative!
Actually, benchmark idea: is it easy to redo the computer part of the 4 color theorem argument with AI assistance now? :D And write up a nice streamlined summary of the strategy that undergrads can understand?
I’ll take some time to think through the implications of this work, but my first impression is that it’s a Quantum Field Theory equivalent of the four color theorem - computer assisted proof, rather than computer generated. And definitely not profound new physics from scratch.
@boazbaraktcs I totally agree that users will be constantly using models to prove research level lemmas by Feb 2027. (Would be idiotic if not.) Your earlier claim reads like “in 6 months math centaurs are ~ over”, very different!
@AcerFur Including the chat instances is important! I really do think with a mathy human thinking about them while talking to a model quite a lot of them are solvable straightforwardly — these are lemmas, after all.
Cheating here means claiming 2 when really something like 1 happened. It's not that they're not both meaningful, but they show pretty different behaviors and that's important. Oneshotting these problems totally autonomously (use many agents sure) is meaningfully different.
Re: #1stproof I just want to point out I think that 1) a math human with moderately relevant background can solve many of these problems by interactively talking to a model and then giving it hints, 2) this is extremely different from 1-shot performance, 3) it's tempting to cheat