@thsottiaux It would be really great to have a time-estimate feature for goals. If you ask the model, it always gets it wrong, but you probably have enough data now to build it without relying on the model completely.
The problem with current benchmarks is the same problem we have with enterprise software today: We do not have the infrastructure or environments to leverage greater intelligence.
Datacurve is probably one of the best data providers for coding and DeepSWE is goated. Yes.
But sorry, that 3% difference is just not catching the qualitatively different workflows Fable fables; your favorite pros have all experienced it.
So alas, we need even better benchmarks.
Cursor could well make an improbable comeback by... offering the best bang-for-buck for coding models!
Charging 1/20th the price (Composer 2.5) vs Opus 4.7, with similar coding characteristics.
I expect Cursor to win back a lot of market share thanks to this.
This post doesn’t mention the words “enterprise”, “business”, or “company” even a single time and yet it is the most poignant piece on enterprise adoption of AI that I have read this year.