@__alpoge__ It is indeed impressive to see solid mathematical performance, but for mathematicians, let it think over the integers. This is genuinely hard, it is where we can see how models and people actually work, and it is closer to what we expect from breakthrough models. I really want to see that.
@vasuman Codex/Claude Code goals 'sometimes don't work', and simply put, this is due to how the model was trained with RL. Long-term goals and direction have to be set by a "human" or by "know-how / papers".
@1584414305_fact@prz_chojecki Yes, mostly "Continue" / "Let's think with induction" / "Let's make it beautiful" / "Think intuitively" is everything. One thing, however: when it was stuck in an "infinite modulo breaking" pattern, I explicitly told it to step back and try another approach, and it is better to make it write an md note.
@1584414305_fact@prz_chojecki Not all 400 turns were one session. Instead, I had to make it write "all as tar bundle" and continue in a new chat every 70-80 turns. Even some GPT Boost Chrome extension wasn't able to fully handle it after eating 3GB RAM per tab (OpenAI, pls fix this)
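For anyone reproducing the "all as tar bundle" handoff locally, here is a minimal sketch, assuming the research notes live as files in one folder. The names `bundle_notes` and `list_bundle` are hypothetical helpers made up for illustration, not part of any ChatGPT feature.

```python
import tarfile
from pathlib import Path

def bundle_notes(notes_dir, out_path):
    """Pack every file under notes_dir into a gzip-compressed tar archive."""
    notes_dir = Path(notes_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        for path in sorted(notes_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to notes_dir so the bundle unpacks cleanly.
                tar.add(path, arcname=path.relative_to(notes_dir).as_posix())
    return out_path

def list_bundle(out_path):
    """Return the sorted member names of a bundle, as a quick sanity check."""
    with tarfile.open(out_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Upload the resulting archive at the start of the new chat, and keep `out_path` outside `notes_dir` so the bundle does not try to include itself.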
@prz_chojecki I think there are some problems for which that is not the case. It solved some after 300-400 turns: https://t.co/Pe1UNHLc9G https://t.co/hQcu7vd6au
There will be a "full solution" very soon. (This is a generalization of Knuth's "Claude's cycle" problem.)
@NoahChrein No. I can say this formalization should have been done the right way and reviewed by a human, or at least you should ask the AI agent to find every sorry, admit, and axiom, and understand how they differ from native_decide.
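As a rough illustration of that check, here is a minimal sketch that flags the tokens named above in Lean 4 source text. The token list, the function names, and the crude handling of `--` comments are all assumptions for illustration. Lean's own `#print axioms` command is the more reliable check, since a text scan cannot see what imported files depend on.

```python
import re
from pathlib import Path

# Tokens worth a human look in a Lean 4 development (this list is an assumption).
SUSPECT = re.compile(r"\b(sorry|admit|axiom|native_decide)\b")

def scan_lean_source(text):
    """Return (line_number, token) pairs for each suspect token in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Crude: drops `--` line comments only, not /- block comments -/.
        code = line.split("--", 1)[0]
        for match in SUSPECT.finditer(code):
            hits.append((lineno, match.group(1)))
    return hits

def scan_tree(root):
    """Scan every .lean file under root; return {path: hits} for files with hits."""
    report = {}
    for path in sorted(Path(root).rglob("*.lean")):
        hits = scan_lean_source(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report
```

A hit is not a verdict. `sorry` and `admit` leave a gap in the proof, `axiom` adds an unproved assumption, and `native_decide` trusts compiled evaluation instead of kernel reduction, so each one needs a separate judgment from the reviewer.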
I respect the effort, but not the laziness.
GPT 5.5 Pro proved a non-trivial, very difficult math problem 'without search', while when it's allowed to search, it says "this is an open problem".
So this is what they need to solve within 4 months
https://t.co/hQcu7vd6au (formalized in Lean 4)
They took the new, upgraded 5.4 Pro away from me, and now it's slower and duller (and more half-about-right, if you want). So it wasn't a 5.4 Pro upgrade; it was a test of the real GPT 5.5-something.
@ericmitchellai I actually got far better math responses; its base knowledge is a few stages higher, and it is faster. However, it does not feel like 'Pro', despite being overwhelming. More like "5.5 thinking heavy" or something.
Everyone is misreading the point. The whole point was "cybersecurity warfare has now begun". It is not about a single model: all models can exploit vulnerabilities and have been doing so. "Don't feel relieved because some model has restricted access; get prepared, now."
Pandora's box has been opened.
@thomasfbloom To be honest, I still believe they will continue to release newer Pro models, "just not fully prepared yet". 5.4 Pro is also something we wouldn't have seen if that were true. However, yes, 5.4 Pro was "inside the knowledge boundary", while this internal model feels outside it.
Lean, or verifiable proofs in general, are critical. This is not only for AI models. Humans make the same mistakes, and fundamentally we distrust ourselves. A proof itself desires to be beautiful; validation does not necessarily.
Some problems are solvable with just 5 turns of GPT 5.4 Pro, while some problems need 350 turns of GPT 5.4 Pro with formal research note handling. Humans are the bottleneck for this model.
I hope ChatGPT sessions can 'talk' to each other to retrieve information, context, and useful memories. Also, if this is applied to chats shared by link, or to a "set of links", it will naturally become a knowledge DB.
@BoWang87 GPT 5.4 Pro is astonishingly good; with a proper workflow, it may solve all of math. However, it is not yet close to understanding "what is beautiful?", which will be the next objective. https://t.co/sguqoFQuit