Announcing our fully open source code agent to support development in @leanprover. This has been a labor of love by our team at @MistralAI and we look forward to seeing what the #LeanProver community does with it!
@prz_chojecki They said they won’t announce it until July. You could reach out to them with your findings (there is a thread on the Lean Zulip). Unfortunately, for obvious reasons, it will be an uphill battle to convince them to read the model output (especially if you don’t understand it).
@gro_tsen I think you also need a big-O or little-o fudge term. For example, Sawin’s result is n^1.014114/C for some constant C, and Erdős’s original was n^{1 + o(1)}. (Or maybe this is implicit in your post.)
@aaswaminathan01 @mathandcobb While I think your take has some truth (we will soon be able to autoformalize a nontrivial number of math papers into, say, Lean), I think it is missing a large degree of technical, practical, and sociological nuance.
@danrobinson I think it might be unethical to raise Erdős from the dead, although the idea has been considered for other purposes: https://t.co/5lNH6bQBOa
@LatinumAI Did you rewrite the interpreter too or just the compiler? I guess now @leanprover needs an external compiler bench (alongside their external kernel bench).
@giffmana Math is usually fairly robust to errors, much more so than code. There are lots of articles about why. This particular benchmark, however, is designed adversarially to be very fiddly, calculation-based, and non-intuitive (otherwise the model would just guess the solution).
@lacker @julianboolean_ I think we already have plenty of harder problems? But if it is a new conjecture, it would be interesting (at least the first time) precisely because the AIs are trying to convince us it is important.
@lacker @julianboolean_ There are a number of good videos aimed at general audiences that explain advanced math concepts, including Fields Medal-winning papers. In this hypothetical scenario where AI is good at everything, it would also be good at making this kind of content.