Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor.
It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx
@grepmoney Yeah, all last night my GPT-5.5-based agent was leaking thinking traces, it's so funny to see the caveman speech coming through between tool calls. And then it switches back to full eloquence in the final response to me, kinda jarring xD
@scaling01 When I tried using MiniMax M3 via OpenCode Zen in an agent, it seemed to be mishandling thinking tags all over the place. Maybe a misconfiguration with the provider, but wouldn't be surprised if that kind of thing caused it to underperform on a bunch of benchmarks.
Feels to me like reliable, fast and cheap Computer Use will be the thing that unlocks LLM agents being able to take over entire roles. Once it can use every application on my machine natively through mouse/keyboard, the only thing it can't do for my job is attend meetings.
the frontier labs don’t have “comms problems”. reality right now has a comms problem. what is happening is a little scary and there’s no nice words anyone could say, especially not those profiting from it, that’ll make it feel that much better
@flowersslop Hey just a quick question about your view on this (which we probably disagree on so that's why I'm curious). Do you think we'll reach ASI? Like, substantially smarter than humanity? Or will we plateau around human level or not too far above?
Good Scenario: AIs too dumb to scheme, then scheme a bit as they get smarter, then stop scheming as alignment improves.
Bad Scenario: AIs too dumb to scheme, then scheme a bit as they get smarter, then seem to stop scheming as they get smart enough to hide their scheming.
😐
@tenobrus Over the last year I replied twice to people with poetry that was appropriate to the situation and which I was rather proud of, and it got completely ignored. Either people didn't see it/didn't care, or people assumed it was AI, or I'm way worse at poetry than I assumed...
@randomtryidk @scaling01 How much of future science will be "I have a hypothesis, I can come up with experiments to get good data for it" and how much of it will be "I happened to have randomly accessed data which gives me some ideas for a new hypothesis"? Feels like LLMs can do the former.
@NintendoFanGirl I assume someone else has already mentioned it somewhere, but Tactics Advance and Tactics A2 exist and are different from Tactics (I'm not necessarily recommending them, but if you're planning to play all FF games...)
@signulll Would you consider an agent 'live' if it's in a continuous loop? Like a neverending Ralph loop? Or is that still just a bunch of pulls in a trenchcoat? You could communicate with it through a file that gets concatenated with your messages and its replies...
@fchollet I imagine I must have learned something genuinely novel to me at least once in my life. Fairly sure there are some things which are beyond me, though. Maybe with a superintelligent teacher?
@GregKamradt Excited to see what effect harnesses have! While grading LLMs purely on input game state/output actions is good for knowing how close to humanlike reasoning/learning pure LLMs are, I'm really glad you're also sharing how well harnesses do.