Greg P

@pellyg

Engineering

Oakland, CA

Joined January 2008

1.2K Following

800 Followers

135 Posts

pellyg retweeted

adarsh

@adarsh_exe

3 months ago

Traditional coding benchmarks do not reflect how software is actually built and maintained. That's why we built a new benchmark, APEX-SWE, in partnership with @cognition. It measures whether AI models can perform complex, real-world software engineering work to ship systems that work and debug them when they don't. @OpenAI GPT 5.3 Codex (High) tops the leaderboard at 41.5% on Pass@1.