Traditional coding benchmarks do not reflect how software is actually built and maintained.
That's why we built a new benchmark, APEX-SWE, in partnership with @cognition. It measures whether AI models can perform complex, real-world software engineering work to ship systems that work and debug them when they don't.
@OpenAI GPT 5.3 Codex (High) tops the leaderboard at 41.5% on Pass@1.
Is Twitter down? Can't log in to a new session, even in an incognito window. Just seeing "Nothing to see here. Looks like this page doesn’t exist. Here’s a picture of a poodle sitting in a chair for your trouble."
@TMobile - can't get into my account; the email verification code isn't being sent. Confirmed that the email address works and checked the logs. Possibly related to the ongoing AWS outage in US-East-1?
The @MLS pass experience on @AppleTV leaves a lot to be desired. In particular, the TV app on my Mac consistently shows old games as "Live", making it really hard to find the current games.
@paramountplus why is the LAFC game buffering every 15 seconds on my Google TV? I've missed every goal so far, and when it starts playing again the audio is delayed by about the same amount of time.
It would be really nice if @AppleTV supported the normal keyboard shortcuts we're used to on streaming services... F for fullscreen, and M for toggling mute. Just an idea 🙏
1/17. 🔪Let’s slay some protein myths once and for all!
I’ve spent 10 years studying cancer and protein structure & function. The US says I have “exceptional ability” in my field & I’ve been (am!) a protein and process development scientist in biotech firms & startups.
A 🧵👇🏼
Why does Apple insist on making everything huge on their screens? They use these beautiful high-resolution displays but give me no screen real estate, because I can't scale the resolution down to a useful size without 3rd-party tools on macOS, and it's even worse on iPad.
DoorDash order is taking longer than expected... I don't know if I'm more mad at DoorDash for underestimating the delivery time, the restaurant for being slow, or myself for not picking it up on my own... Mostly the latter 🤦
Just bought a new iPad Pro 11” with Magic Keyboard. Some apps don’t respect the system-wide Font Size setting, so everything in Slack and Chrome is infuriatingly large. Also, the @SlackHQ app does not work well with the Magic Keyboard: can’t navigate with arrow keys, can’t search!