Many developers have suspected for months that GPT-5.5 outperforms Claude Sonnet for coding. But SWE-Bench reported near-parity, which made people question what they’d been seeing in practice.
DeepSWE aligns more closely with that day-to-day experience: GPT-5.5 scores 70% versus Claude Sonnet at 32%. That difference is substantial.
DeepSWE focuses on what tends to matter in real workflows: whether an agent can take a short behavioral prompt, locate the correct area of the codebase, and implement the change cleanly, without needing you to enumerate files, modules, and functions. SWE-Bench often fails to capture that, due to dataset contamination and weaker verification.
https://t.co/C3s80xfDkk
@meabed Bro, what Anthropic did is bullshit. I’ve been using claude -p to automate my harness tests, and now there’s suddenly a $200 cap on tool usage before they start billing separately through the API. That’s insane.
They are the new APPLE but for AI.
Introducing Daybreak: frontier AI for cyber defenders.
Daybreak brings together the most capable OpenAI models, Codex, and our security partners to accelerate cyber defense and continuously secure software.
A step toward a future where security teams can move at the speed defense demands.
this is cursor team kit: a plugin for some skills we use to build cursor at cursor
skills for verifying changes, driving local tools, and shipping reviewable PRs
https://t.co/8R4XNCUOfe
@trashh_dev I have been demoted from 'Father' to 'Automated Mining Unit in Minecraft.' My 6-year-old little dictator makes me find ores all playtime long, while she decorates the house and adopts puppies.
StarCraft would be heaven in comparison!
@theo @BoganBits I’ve stopped caring because people say it’s over every minute. They said that about GPT-2 and we’re still fine. I don't really trust what a CEO says, for obvious reasons. Even if it is true and causes problems, I think we will still be okay.
I love your content btw =)
james has achieved distributed opencode
agents can run on your laptop, on a remote server, in a cloud sandbox provider
shut your laptop and things keep running
open it back up and all the data syncs
delete the sandbox nothing is lost