@johnennis Hello, I was interested in this problem and made some progress. Overall, it seems the geodesic ball fails; the lens is likely the correct approach instead. Do you have an academic email? I have a paper to show you.
@Ognifedefingo We know, understand, and accept that we're not geniuses, but we can still write and think about normal science. Our effort will do something for future researchers.
For anyone who is using AI to write math papers, please have another AI review it at least once.
Assess whether the paper will still be readable 30 years from now. Is the reader treated as a peer, or as a verifier? Do negative-form sentences and preemptive defenses stand?
@davidbessis I think this is Claude's cycle, version 2.
It's a proof for a very small case, and the note also admits it is far from anything we could call 'strong', but it is good enough to show that models can now test ideas quickly.
@__alpoge__ It is indeed impressive to see solid mathematical performance, but for mathematicians, let it think over the integers. That is genuinely hard, it is where we can see how models and people actually work, and it is closer to what we expect from breakthrough models. I really want to see that.
@vasuman Codex / Claude Code's goal 'sometimes doesn't work', and this is simply due to how the model was trained with RL. Long-term goals and direction still have to be set by a "human" or by "know-how / papers".
@1584414305_fact @prz_chojecki Yes, mostly "Continue", "Let's think with induction", "let's make beautiful", and "Think intuitively" are everything. One thing, however: when it was stuck in an "infinite modulo breaking" pattern, I explicitly told it to step back and try another approach. It also helps to make it write an md note.
@1584414305_fact @prz_chojecki Not all 400 turns were in one session. Instead, I had to make it write "all as tar bundle" and continue in a new chat every 7-80 turns. Even a GPT Boost Chrome extension couldn't fully handle it after eating 3GB of RAM per tab (OpenAI, pls fix this).
@prz_chojecki I think there are some problems for which that is not the case. It solved some after 300-400 turns: https://t.co/Pe1UNHLc9G https://t.co/hQcu7vd6au
There will be a "full solution" very soon. (This is a generalized version of Knuth's "Claude's cycle" problem.)
@NoahChrein No, I'd say this formalization should have been done the right way and reviewed by a human. At the very least, ask the AI agent to find every sorry, admit, and axiom, and to understand how those differ from native_decide.
I respect the effort, but not the laziness.
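A minimal sketch of that kind of audit in Lean 4, assuming nothing about the actual formalization: the three toy theorems below are hypothetical and exist only to show what `#print axioms` reports for a kernel-checked `decide` proof, a `native_decide` proof, and a proof left as `sorry`.

```lean
-- Hypothetical toy theorems, used only to illustrate the audit.

-- Checked by the kernel evaluating the decision procedure.
theorem toy_decide : 2 + 2 = 4 := by decide

-- Checked by running compiled code, which the kernel does not re-verify.
theorem toy_native : 2 + 2 = 4 := by native_decide

-- Not proved at all; Lean accepts it with a warning.
theorem toy_sorry : 2 + 2 = 4 := by sorry

-- Should report that no axioms are used.
#print axioms toy_decide

-- Should list Lean.ofReduceBool (newer versions may list further compiler-trust axioms).
#print axioms toy_native

-- Should list sorryAx, which marks the statement as unproved.
#print axioms toy_sorry
```

Running `#print axioms` on the final theorem of a development is a quick way to see whether any `sorry`, custom `axiom`, or `native_decide` step sits anywhere underneath it.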
GPT 5.5 Pro proved a non-trivial, very difficult math problem 'without search', whereas when it's allowed to search, it says "this is an open problem".
So this is what they need to solve within 4 months
https://t.co/hQcu7vd6au (formalized in Lean 4)
They took the newly upgraded 5.4 Pro away from me, and now it's slower and duller (and more half-about-right, if you want). So it wasn't a 5.4 Pro upgrade; it was a test of the real GPT 5.5 or something.
@ericmitchellai I actually got far better math responses; its base knowledge is a few stages higher, and it's faster. However, it does not feel like 'pro', despite being overwhelming. More like "5.5 thinking heavy" or something.