You drop an AI agent into your real codebase and it produces garbage. Wrong style, subtle bugs, side effects everywhere. The problem isn't the model. Your codebase isn't ready for AI. Think of it like handing off a biathlon relay with no map of the course 🥇🧵👇
@antirez @anemll I wonder how feasible it would be to use it as a base to stream from RAM for non-uniform memory systems (to always do inference on GPU)
A few days ago I published Burla. This follow-up is about the part I care more about: the design bets behind it. Toy examples are where mocking libraries look tidy. Real migrations are where they tell the truth. Thread below. 🧵👇
I do not think the AI angle is marketing fluff.
Inconsistent APIs tax humans and LLMs in the same way: more context spent on framework quirks, less energy left for the code under test.
@antirez are your tweaks all Metal-specific, or could this also be a good starting point for optimising a 128 GB system with hybrid CPU/CUDA inference via layer offloading?
I finally published Burla, a .NET mocking library I have been rewriting for months while pressure-testing models, local setups, and agentic coding workflows. I needed a real project, not one more opinion about AI. Built with LLMs in the loop, for the LLM era. Thread below. 🧵👇
The future winner will not just be the model that is slightly better.
It will be the model that gets the job done reliably, quickly, and cheaply enough to become the default.
Full post: https://t.co/wXRq0L6JSF
If you're still defaulting to Claude Opus in GitHub Copilot, you're probably paying a hype tax. 🧵👇
Opus 4.6 = 3x requests.
Opus 4.7 = 7.5x.
GPT-5.4 = 1x, even with xhigh reasoning in VS Code.
I do not see a quality gap remotely close to that.
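To make those multipliers concrete, here is a rough Python sketch of how far the same request budget stretches per model. The multipliers are the ones quoted above; the 300-request allowance is a made-up number for illustration, not a quoted plan limit, and `prompts_per_allowance` is a hypothetical helper.

```python
# Request multipliers as quoted in the thread.
MULTIPLIERS = {"Opus 4.6": 3.0, "Opus 4.7": 7.5, "GPT-5.4": 1.0}

# Assumption: a hypothetical allowance, chosen only to make the arithmetic easy.
HYPOTHETICAL_ALLOWANCE = 300


def prompts_per_allowance(allowance: float, multiplier: float) -> int:
    """Whole prompts that fit in the allowance when each prompt costs `multiplier` requests."""
    return int(allowance // multiplier)


# Number of prompts each model gets out of the same allowance.
budget = {
    model: prompts_per_allowance(HYPOTHETICAL_ALLOWANCE, multiplier)
    for model, multiplier in MULTIPLIERS.items()
}

for model, prompts in budget.items():
    print(f"{model}: {prompts} prompts")
```

Under that assumed allowance, the 7.5x model gets 40 prompts where the 1x model gets 300, so the quality gap would have to be very large to justify the difference.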
GPT-5.4 also tends to be faster and less rate-limited.
People underestimate what slow loops do to the workflow: fewer iterations, worse focus, and less willingness to explore alternatives.