AI models are getting more expensive and worse at the same time
Claude Opus 4.6 - hits limits in 60 mins
Claude Opus 4.7 - hits limits in 15 mins
Claude Opus 4.8 - hits limits in 5 mins
Same story with Codex
Prices keep rising, and they remove old models like GPT-5.3 and force you onto expensive new ones that burn through limits instantly
at this rate AI will soon cost more than hiring an employee
or will people just switch to local models instead?
🦀 The Rust frontend is officially merged into vLLM!
As GPUs get faster, the frontend has become a real share of CPU time. The new Rust frontend is a drop-in alternative to the Python API server — same engine, same ZMQ boundary. Opt in with VLLM_USE_RUST_FRONTEND=1.
Early numbers: on a preprocess-heavy workload, ~837 req/s vs ~162 req/s for default Python — ~5x in a single process.
A few design choices we're excited about:
• Layered crates with clear boundaries
• Stream-native pipeline — non-streaming for free
• Builds on stable Rust
Huge thanks to @BugenZhao from @inferact for introducing the work at @PyTorch Meetup Singapore.
https://t.co/Tw8PoIjbH9
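As a quick sanity check on the "~5x" figure, here is a minimal sketch that recomputes the ratio from the two throughput numbers quoted above. The variable names are illustrative only, and the per-request figure assumes a single process handling requests back to back, which is a simplification and not something the announcement states.

```python
# Throughput figures as quoted in the announcement (requests per second).
rust_rps = 837.0    # Rust frontend, preprocess-heavy workload
python_rps = 162.0  # default Python API server, same workload

# Ratio of the two throughputs.
speedup = rust_rps / python_rps

# Assumption: one process, requests handled back to back, so the average
# time budget per request is simply the inverse of the throughput.
ms_per_request_rust = 1000.0 / rust_rps
ms_per_request_python = 1000.0 / python_rps

print(f"speedup: {speedup:.2f}x")  # speedup: 5.17x
print(f"avg ms/request: rust={ms_per_request_rust:.2f}, python={ms_per_request_python:.2f}")
```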
AI sucks at compiler development not least because humans do too.
The set of people who can both design a programming language soundly and contribute to an industry-grade compiler is ridiculously small.
So you get either a sound theory or a high-quality implementation, but rarely both, if ever.
And then LLMs go train on this dogshit plus toy compiler implementations plus books and courses, which are just textual versions of toy implementations.
In the end all you get is unscalable compiler passes that trigger bugs the moment you attempt to use two uncommon features of your compiler together.
@VictorTaelin One of the things we need to improve significantly is avoiding gatekeeping around obvious things and ensuring we stay focused on what truly matters: key insights and important decisions. That’s where we, as developers, should remain actively involved in the loop.
@VictorTaelin That said, I believe we need to improve our ability to handle multiple contexts and manage several work-in-progress items at once. AI-assisted coding is still too slow, so to maximize the value these tools bring, we should focus on building many more things in parallel.
Status update: I've been on/off AI agents in the last few days and it is a verifiable truth that every day I didn't use agents, I was more productive. I still attribute that to how slow they are, and my own inability to multi-task efficiently. The magic is there but the slowness doesn't let it cross the threshold where they actually make me faster, and I still dislike the whole thinking paradigm.
About Bend2: honestly, the C/Metal compiler codebase is a clusterfuck right now. I regret letting AI agents write it. All tests pass, and GPU performance is mind-blowing, so the core architecture works. Yet, it has a LOT of bugs. Anything not covered by the tests is a coin toss. This is actually impressive, because, in many parts of the codebase, the right solution was actually the simplest one, yet, the agents STILL managed to find a way to make it work just for the tests. The level of reward hacking these agents produce is so impressive that I can't even be mad.
It is also ironic because that's the very problem that Bend's proof system was supposed to solve, but Bend is in TypeScript, not in Bend. I'm disappointed I didn't write Bend in itself, and now I feel an immense urge to do so. But the clock is ticking...
Still, I do not think Bend is worth launching without the GPU compiler being solid, because the closest competitor, Lean, is actually extremely good, so we need a big differential. Yet, due to the very nature of the project, it would be embarrassing to have bugs at launch.
Regarding AI, I now believe using current gen AI agents in a production codebase is harmful and a massive mistake. That doesn't mean no agents at all, but agents work best when they don't touch critical code. Debugging, researching, providing insights, scripts / tools, or anything that doesn't touch code you will maintain in the long term. But if you merge AI code without reading it, you're going to have a bad time. Speaking from experience.
I'm working 10h/day on SupGen and the remaining time on Bend2
@0xCVYH The question is who judges that, because if every action needs a human in the loop, then it gets complicated. The agent doesn't notice the effect because it lacks context. And even if it has the context, nothing prevents an error of judgment.