Today we're releasing DeepSWE, a new standard for agentic coding benchmarks.
On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
I presented today at Demo Day, Day 2, and @TechCrunch featured us @datacurve! Just been reading TC and listening to TC Daily Crunch since high school mornings... a surreal feeling to see us on it. Also, post-demo sadness cuz now YC is coming to an end.