@petergostev But also consider that coding is basically the one field that labs have gone all in on, and that outside of coding there are many verticals where progress is actually much slower
It doesn’t seem like hill climbing is as general as next token prediction
@lukaspet @dwarkesh_sp Unclear that an arbitrarily intelligent being can get arbitrarily high amounts of power. This obviously depends on the environment. Put ChatGPT-5.5 into the Neolithic and see how it does.
I’m telling you right now someone could build a $1B company that’s a full video editor that connects to Codex, Claude Code, Cursor, etc. A super app native video editor. Don’t even build the standalone app with an AI agent side panel. Waste of time. Just make it a plugin that works inside Codex.
@thsottiaux (1) most important add /loop like claude.
(2) second most important, codex is still much worse than claude at clear explanations which makes debugging painful with it
@thsottiaux im sad about the day that the $200 plan won't give you (effectively) all you can eat frontier model 😢
pls make that day as far in the future as you can 🙏
Frontier LLMs be like:
Yeah, we can find 5-year-old zero-days in the most hardened codebases in the world.
Frontier LLMs also be like:
Doesn't add requires_auth decorator to backend endpoint that should obviously require auth... 🤦‍♂️
The most recent update to @METR_Evals makes it reasonable to start projecting higher reliability rates on their AI exponentials.
My big takeaway is that I think we’ll have a somewhat longer centaur period, but regardless, reliability will come.