A year ago, I asked all the top LLMs at the time (Claude 3.7 + Grok 3 + DeepSeek V3 + Gemini 2.5 + Deep Research) to predict the next 5 years of progress and, in part, to assess the plausibility of the AI 2027 report.
One thing the report got right is that all the labs are focusing on recursive improvement, e.g. with Codex 5.3 helping create ChatGPT 5.4 and so on, i.e. "closing the loop".
Anyway, this year: new prompt and new models. Getting more quantitative, and then I will ask ChatGPT 5.4 to summarise all the answers and compare. It's getting a bit meta to ask multiple AI agents to predict the future progress of AI and compare it against previous forecasts. Grok is supposed to be optimised for forecasting accuracy!
Summary from ChatGPT of the answers below from 5.4, Gemini, Grok and Claude 4.6 Sonnet:
Right, starting now, for shits and giggles, I will ask the top ~5 models every year on April 6th to:
"predict the next 5 years of AI and AGI progress"
Then we can compare over the years:
1. How right/wrong this forecast report got it
@karpathy Why not just make all the agents click through websites? I thought computer use was "almost there", as shown by the Claude Chrome extension and ChatGPT agent? Of course, the first step is to give them permissions. Redesigning everything for text in and text out isn't the dream of digital AGI.
omg, we live in the future, Claude is taking control of my browser to list my dishwasher and 15 other items as ads on eBay (Kleinanzeigen) and Facebook Marketplace.
The first 100% autonomous coast-to-coast drive on Tesla FSD V14.2! 2 days 20 hours, 2732 miles, zero interventions.
This one is special because the coast-to-coast drive was a major goal for the autopilot team from the start. A lot of hours were spent in marathon clip review sessions late into the night looking over interventions as we attempted legs of the drive over time - triaging, categorizing, planning out all the projects to close the gap and bring the number of interventions to zero.
Amazing to see the system actually get there and huge congrats to the team!
I'd like to look back at the two mega-papers on Minecraft RL that just came out, from @OpenAI and @nvidia.
They both rely on diabolically clever ideas... but in completely different directions.
Somehow missed this. Always love Minecraft/open-ended papers!
The Voyager paper blew my mind, but it used code-gen on the Mineflayer API! The 2022 pixel-to-action papers below are similar but used fine-tuning. But this is with only offline data! I think it's very relevant for robotics.