BREAKING: Alibaba tested 18 AI coding agents on 100 real codebases, spanning 233 days each. they failed spectacularly.
turns out passing tests once is easy. maintaining code for 8 months without breaking everything is where AI completely collapses.
SWE-CI is the first benchmark that measures long-term code maintenance instead of one-shot bug fixes. each task tracks 71 consecutive commits of real evolution.
75% of models break previously working code during maintenance. only Claude Opus 4.5 and 4.6 stay above 50% zero-regression rate. every other model accumulates technical debt that compounds with every single iteration.
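rough sketch of what a "zero-regression rate" could look like in code. the post doesn't give the benchmark's exact definition, so this assumes a regression means a test that passed in one iteration fails in the next. `has_regression` and `zero_regression_rate` are made-up names for illustration, not SWE-CI's API.

```python
from typing import List, Set


def has_regression(history: List[Set[str]]) -> bool:
    """history holds the set of passing test names after each iteration.

    Assumption: a regression is any test that passed in one iteration
    and no longer passes in the very next one.
    """
    for prev, curr in zip(history, history[1:]):
        if prev - curr:  # something that used to pass is now missing
            return True
    return False


def zero_regression_rate(tasks: List[List[Set[str]]]) -> float:
    """Fraction of tasks whose whole history contains no regression."""
    if not tasks:
        return 0.0
    clean = sum(1 for history in tasks if not has_regression(history))
    return clean / len(tasks)


if __name__ == "__main__":
    tasks = [
        [{"test_a"}, {"test_a", "test_b"}],  # only gains tests: clean
        [{"test_a", "test_b"}, {"test_b"}],  # loses test_a: regression
    ]
    print(zero_regression_rate(tasks))
```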
here's the brutal part:
- HumanEval and SWE-bench measure "does it work right now"
- SWE-CI measures "does it still work after 8 months of changes"
agents optimized for snapshot testing write brittle code that passes tests today but becomes completely unmaintainable tomorrow.
they built EvoScore to weight later iterations heavier than early ones. agents that sacrifice code quality for quick wins get punished when the consequences compound.
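to make "weight later iterations heavier" concrete, here's a toy version. the actual EvoScore formula isn't given in this post, so the linear weights (iteration i gets weight i) and the name `weighted_evolution_score` are assumptions for illustration only.

```python
from typing import Sequence


def weighted_evolution_score(pass_rates: Sequence[float]) -> float:
    """Weighted mean of per-iteration pass rates.

    Assumption: iteration i (1-indexed) gets weight i, so later
    iterations count more. The real EvoScore weighting may differ.
    """
    if not pass_rates:
        return 0.0
    weights = range(1, len(pass_rates) + 1)
    total = sum(w * r for w, r in zip(weights, pass_rates))
    return total / sum(weights)


if __name__ == "__main__":
    quick_win = [1.0, 0.5, 0.0]   # starts strong, then decays
    slow_burn = [0.0, 0.5, 1.0]   # starts weak, then holds up
    print(weighted_evolution_score(quick_win))
    print(weighted_evolution_score(slow_burn))
```

both runs have the same plain average (0.5), but under this weighting the one that decays scores lower than the one that holds up. that's the "punished when the consequences compound" idea.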
the AI coding narrative just got more honest.
most models can write code. almost none can maintain it.
@Decathlon Decathlon India has built one of the shittiest online experiences of all. Their payment flow is a mess. The app team might be on holiday. Overall, it's a 3rd class app with no customer empathy. Get your shit together, for God's sake!!!
There is a lesson to be learned from the ITR portal on how not to build a public-facing site.
Rather than building a site that handles peak traffic, support has chosen cheap tricks like "clear browser cache" and "delete temp files". The site has too many issues. God bless taxpayers! @IncomeTaxIndia
@Kuvera_In Please test properly before pushing changes to production. It seems all the orders placed today are blocked with a pathetic nominee update error. Way to go! Customers had to discover and tell you this.
@AirIndiaX you should be ashamed of yourself. IX1983 was supposed to depart from Bengaluru at 3.40pm. It was later delayed by 30 min, with the departure time changed to 4.10pm. It's 5.18pm now and the flight is yet to depart. This definitely resonates with the shitty Tata brand.
@Chaayos @BlueDart_ @ShiprocketIndia these are the 3 worst, most pathetic, 3rd class companies, not qualified to be in online business. No business ethics. No customer-first thinking. Useless customer support. Do not use their services at any cost.
@Chaayos you provide one of the most unreliable and pathetic delivery services. Supposed to deliver by the 4th. It is already delayed by 3 days and there is still no sign of delivery any time soon. Why are you even in online business?
@IncomeTaxIndia your website https://t.co/UNtwspY1xH is quite unreliable.
How many times do I have to enter the same values and save? It keeps losing the saved data. It is pathetic and very annoying. Please make sure to verify that the website is reliable and durable.
@Lifestyle_Store the scoundrels at Lifestyle are neither delivering the product nor refunding the money. No.1 a**h****s.
It's been 10 days. No update from this 3rd class company and its incompetent customer care.
****DO NOT ORDER ANYTHING FROM THIS CRAP WEBSITE****
#BoycottLifestyle
@Lifestyle_Store lifestyle online order and delivery service is 3rd grade service with zero accountability. Pathetic customer service. ******DO NOT BUY FROM THIS SHIT STORE*****
#BoycottLifestyleStores
In order to apply most of your energy in one direction, you have to say no to things that a lot of other people say yes to.
Most successful people are masters at eliminating the unnecessary from their lives.
Profits for corporates, taxes for individuals.
The income tax system, which needs to be fair before anything else, now favours corporations over individuals, as collections of income tax paid by corporations have fallen over the years, whereas personal income tax collections have gone up. This despite corporates making significantly higher profits than in the past.
Higher personal income tax collections have made up for a fall in income tax paid by corporations and a fall in disinvestment receipts.
Or why the government is not in a position to help revive private consumption, which in 2023-24 is expected to be at a 21-year low, except for the pandemic year of 2020-21.
So, measures like a cut in surcharges and personal income tax rates haven't happened. But more than that, a cut in the goods and services tax rate on items from two-wheelers to washing machines hasn't happened either. Nor has a cut in the high taxes on petrol and diesel, for that matter.
My Outside the Eco-Chamber column in the Deccan Herald.
https://t.co/ic2GFIb3u5… https://t.co/uTsmvWuw9L via @deccanherald
@kaul_vivek I wish our country's decision makers had time to read articles like this and realise how the budget announcements have been negatively impacting individuals' prosperity in recent years.