Ben Duffy @benduffyMMM - Twitter Profile

Pinned Tweet

Ben Duffy @benduffyMMM

about 1 year ago · Schollene

Releasing: Johnny the humanoid vs. the socks https://t.co/OoYWnLSNQy

0

3

0

455

Ben Duffy @benduffyMMM

about 1 month ago

Claude Opus 4.7 likes: Phases, tasks, gates, stages, rungs

0

1

0

35

Ben Duffy @benduffyMMM

2 months ago

https://t.co/ghLgKRPZlV

0

7

Ben Duffy @benduffyMMM

2 months ago

A year ago, I asked all LLMs back then (claude 3.7 + grok 3 + deepseek v3 + gemini 2.5 + deep research) to predict the next 5 years of progress and partially to assess the plausibility of the AI 2027 report. One thing the report got right is that all the labs are focusing on recursive improvement, e.g. with Codex 5.3 helping create ChatGPT 5.4 and so on, i.e. "closing the loop". Anyway, this year, new prompt and new models. Getting more quantitative and then will ask ChatGPT 5.4 to summarise all answers and compare. Getting a bit meta to ask multiple AI agents to predict future progress of AI and compare previous forecasts. Grok is supposed to be optimised on forecasting accuracy! Summary from ChatGPT of below answers from 5.4, gemini, grok and claude 4.6 sonnet:

Ben Duffy @benduffyMMM

about 1 year ago · Berlin

Right, starting now, for shits and giggles, I will ask the top ~5 models every year on April 6th to: "predict the next 5 years of AI and AGI progress" Then we can compare over the years: 1. How right/wrong this forecast report got it /th

1

4

0

1

263

1

0

74

Ben Duffy @benduffyMMM

2 months ago

https://t.co/rmSsBTtmPk

1

0

10

Ben Duffy @benduffyMMM

2 months ago

I love talking to mini AGIs about what a true AGI will be like

0

24

Ben Duffy @benduffyMMM

2 months ago

@karpathy Why not just make all the agents click through websites? I thought computer use is "almost there" shown by claude chrome extension and ChatGPT agent? Of course first step is to give them permissions. Redesigning everything for text in and text out isn't the dream of digital AGI.

0

190

Ben Duffy @benduffyMMM

4 months ago

I love robots. But sometimes they don't love me... 🥲

0

54

Ben Duffy @benduffyMMM

4 months ago

Humanoids. Built in our own image... THE HUBRIS!!! I love it!

1

0

44

Ben Duffy @benduffyMMM

5 months ago

Software development has changed so rapidly, even over last month... Gonna listen to their book. https://t.co/9COonDEPhL

0

32

Ben Duffy @benduffyMMM

5 months ago

omg, we live in the future, claude is taking control of my browser to add my dishwasher and 15 other items as ads in ebay (kleinanzeigen) and facebook marketplace.

https://t.co/Bm35mBRji6

0

199

Ben Duffy @benduffyMMM

5 months ago

Missed this one. 2026/2027 is gonna be the year of AI and science combined.

0

34

Ben Duffy @benduffyMMM

5 months ago

Cursor's composer-1 frontier LLM is super fast and accurate. Highly underrated!

0

29

benduffyMMM retweeted

Andrej Karpathy

@karpathy

5 months ago

The first 100% autonomous coast-to-coast drive on Tesla FSD V14.2! 2 days 20 hours, 2732 miles, zero interventions. This one is special because the coast-to-coast drive was a major goal for the autopilot team from the start. A lot of hours were spent in marathon clip review sessions late into the night looking over interventions as we attempted legs of the drive over time - triaging, categorizing, planning out all the projects to close the gap and bring the number of interventions to zero. Amazing to see the system actually get there and huge congrats to the team!

310

14K

980

1K

1M

Ben Duffy @benduffyMMM

6 months ago

@abhitwt Ok, on it

0

1

0

22

benduffyMMM retweeted

Chris Offner @chrisoffner3d

7 months ago

Paper naming conventions are reaching a climax.

4

106

14

34

15K

Ben Duffy @benduffyMMM

7 months ago

@yacineMTB Research is ongoing. Cool.

0

1

0

1

112

Ben Duffy @benduffyMMM

7 months ago

@yacineMTB For RL research full mujoco suite for musculoskeletal motor control: https://t.co/D4uG3CKiXi

4

48

3

32

4K

Ben Duffy @benduffyMMM

7 months ago

https://t.co/frOVYVrbg5

Thomas Miconi @ThomasMiconi

almost 4 years ago

I'd like to look back at the two mega-papers on Minecraft RL that just came out, from @OpenAI and @nvidia. They both rely on diabolically clever ideas... but in completely different directions.

https://t.co/Wy4RknTtMH

5

364

59

125

0

38

Ben Duffy @benduffyMMM

7 months ago

Somehow missed this. Always love Minecraft/open-ended papers! Voyager paper blew my mind, but used code-gen on the Mineflayer API! 2022 pixel-to-action papers below are similar but used fine tuning. But this is with only offline data! I think very relevant for robotics.

https://t.co/vVZt6LaAVn

1

0

60
