Are there any tools that consistently get agents to simplify codebases? Have you ever seen a tool cut 20% of code out of a system while maintaining externally-visible behavior? Not dead code etc., but actual design simplification.
I have to push back on 2 things, as I think one is categorically incorrect and the other is demonstrably incorrect.
1. Debugging:
Debugging is not a thing if coding is solved. You would produce correct behaviors. I don't understand how a solved problem could produce erroneous behavior.
2. Coding is the easy part:
Setting up hardware, capacity, talking to users, and product planning are, agreed, in fact hard, but so is coding.
Example: If coding were in fact not hard, then the flickering issue Claude Code has had for well over 9 months, which is a purely software challenge, would have been solved almost immediately (immediately being on a shortened time scale compared to a human solve time scale).
For more trivial applications software approximation can largely work. I also love software approximation for exploring how things should feel.
@TheStalwart The main thing I've learned as a programmer leaning heavily into AI tools is that programming is SO MUCH MORE than just writing code - and while I can get amazing results out of the AI it's taking every inch of my 25+ years of software experience
Thoughts on Bun’s port from Zig to Rust
Background: back on ReactJS I sometimes helped refactor the Facebook monorepo when there was a breaking React API change. I then led the conversion of Messenger web from JS to ReasonML. Given those experiences, I think there’s an interesting parallel to @jarredsumner's refactor of Bun that the programming community isn’t picking up.
There are refactors of 100 lines that’d take days to verify, then there are refactors of 100k lines that’d take minutes. “Refactor” is a large word and most folks associate it with structural changes (which is indeed what a worthwhile refactor should be and what most folks are exposed to). However, what Bun did is actually to intentionally preserve existing structures through situation-specific, syntactic and semantic invariants. In other words, pretend that Bun whipped up something between a bunch of seds + AST macros and a proper transpiler, then transpiled the codebase.
Such a codebase-specific transpiler would treat the tests passing as being not just about the concrete tests themselves, but about them validating representative semantic “corners” of said transpiler (in which case the correctness of such a refactor increases immensely, because the tests represent generalized semantics).
Now tack on the fact that Claude helped accelerate the making of this virtual transpiler from, say, 70 days to 7, and this all adds up. One additional note is that such a transformation would be more dangerous if it were Rust -> Zig. But here it’s from a less safe lang to a safer one (I like Zig so this isn’t a subjective vote. Anything that makes languages more like Jai is probably good for agentic usages).
Now here’s the exciting point I’m trying to reduce this situation to: the LLM (Mythos?) itself acted as this premade, ad-hoc transpiler, which imo is a much more fun, and much less discussed perspective. There’s a very strong argument (and soon, more empirical evidence) that a specially-posttrained LLM, along with inference determinism and some lower-level traditional compiler helpers, can act as an optimizing compiler one day. Pretty exciting era we’re entering!
I strongly believe there are entire companies right now under heavy AI psychosis and it's impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out.
I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now it's... the whole software development industry (maybe the whole world, really).
It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "it's fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely.
The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediate dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture.
We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happen so fast that nobody notices the underlying architecture decaying.
I worry.
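To make the MTBF vs MTTR point above concrete, here is a minimal sketch using the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The two systems and their numbers are made up for illustration, and the function names are hypothetical. It shows how a "fix it fast" system can report the same availability as a "fail rarely" one while failing roughly 100x as often.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def failures_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Approximate number of failure/recovery cycles that fit in one year."""
    return HOURS_PER_YEAR / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not measurements).
    systems = {
        "fails rarely, slow fix": (1000.0, 1.0),  # (MTBF, MTTR) in hours
        "fails often, fast fix": (10.0, 0.01),
    }
    for name, (mtbf, mttr) in systems.items():
        print(
            f"{name}: availability={availability(mtbf, mttr):.6f}, "
            f"failures/year={failures_per_year(mtbf, mttr):.1f}"
        )
```

Both systems look equally healthy on the availability metric, which is the "healthy by local metrics" trap: the second one is absorbing about a hundred times as many incidents to get there.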
It’s 2018 and your coworker just sent you a 400 line pull request.
You get a cup of coffee and sit down to review it.
It’s beautiful. Elegant micro-refactors. Crispy method names.
You catch a few things, but that’s ok. It’s part of the dance. They didn’t consider extensibility on part of their API. Here’s a comment buddy.
They respond in an hour saying they think we should do one piece differently than your comment. Hey let’s jump into a room and figure it out. We can’t just agree to disagree, this code is too important.
The PR merges and goes to prod. You feel a shared sense of ownership and accomplishment.
That night you go to sleep and dream of that code. You can still see the shapes of it on the backs of your eyelids, your IDE syntax highlighting sparking neurons in your reptile brain.
You go to work the next day ready to go. You understand the system. N is your foundation. Time to build n+1.
Ashlee goes through the trouble of using Signal so conversations stay confidential and don’t end up as courtroom exhibits. This guy screenshots it and sends it over text.
Now I’m also on board he shouldn’t be in charge of AGI…
There is a surge of supply chain attacks (and it is only going to get worse)
If you are using pnpm, take these steps to protect yourself:
* set minimumReleaseAge to 7 days
* set blockExoticSubdeps to true
* configure onlyBuiltDependencies
npm / yarn have similar settings
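For intuition on why a minimum release age helps, here is a rough sketch of the idea: only consider versions that have been public long enough for a compromised release to be noticed and pulled. This is not pnpm's implementation; the function names and the sample publish dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MIN_RELEASE_AGE = timedelta(days=7)  # mirrors the 7-day suggestion above


def eligible_versions(published, now, min_age=MIN_RELEASE_AGE):
    """Return versions whose publish time is at least `min_age` before `now`."""
    return [v for v, ts in published.items() if now - ts >= min_age]


def newest_eligible(published, now, min_age=MIN_RELEASE_AGE):
    """Pick the most recently published version that has aged enough, or None."""
    ok = eligible_versions(published, now, min_age)
    return max(ok, key=lambda v: published[v]) if ok else None


if __name__ == "__main__":
    now = datetime(2025, 1, 20, tzinfo=timezone.utc)
    # Hypothetical publish dates for a made-up package.
    published = {
        "1.0.0": datetime(2024, 12, 1, tzinfo=timezone.utc),
        "1.0.1": datetime(2025, 1, 10, tzinfo=timezone.utc),
        "1.0.2": datetime(2025, 1, 19, tzinfo=timezone.utc),  # one day old: skipped
    }
    print(newest_eligible(published, now))
```

The tradeoff is that legitimate fixes also arrive a week late, so a real setup usually needs a way to exempt specific packages.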
@rfleury @jarredsumner >> See Ryan Fleury reply to an account with "Delete your account"
>> I proceed to mute the person Ryan replied to.
My feed is getting better and better.
I'm now able to tell my agent “we are going to work on JIRA-1234” and it goes and pulls down the task, makes me a plan, I say yeah okay that looks good, and it generates the commit.
I run an AI review from a different session, it finds 4 issues of varying priorities, I paste it to my original agent and say validate these findings and fix them if necessary, it creates a fix, I run another review, no more high priority issues found. I open up the code in an IDE to go over it before pushing it up for human review. Looks fine I guess, nothing crazy. I try to understand everything before I push it up for review because if this breaks, it's still my name on it. I say why did you make this one change, it gives me a reasonable explanation for why.
It says something codebaity like "if you want I can suggest 2 more ways you could really tighten up this work to prevent some rare but possible regressions". I'm smart enough to not fall for it.
Code pushed up, task moved to in-review. I didn't write any of it, this is not my accomplishment. Users won't care who wrote it if it works. A lot done in 20 mins but it felt soulless.
OAuth scams are becoming more and more popular
this time a malicious @nylas OAuth client is being used to attack one of our employees (who immediately understood it's a scam and has reported it)
the playbook is simple:
1. obtain a Google OAuth client
2. pretend to send a .docx
3. immediately open the Google OAuth splash screen
4. Attacker gains full access to Email, Calendar and more
officially it falls under phishing, however attackers no longer make fake login forms but instead show official Google login screens
most people fall for it because they check the website, see google.com AND nylas.com (which is also legit) and think it's safe.
signing into google to view a document is also very common
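Since the consent screen really is on google.com, checking the domain is not enough; the useful signals are which client is asking and what scopes it wants. A minimal sketch of that check, assuming you have the consent URL in hand. The allowlist, the sample client IDs, and the function name are hypothetical, and the scope list is a small illustrative subset.

```python
from urllib.parse import urlparse, parse_qs

# A few Google scopes that grant broad mail or calendar access (not exhaustive).
BROAD_SCOPES = {
    "https://mail.google.com/",
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/gmail.modify",
    "https://www.googleapis.com/auth/calendar",
}


def review_consent_url(url, trusted_client_ids):
    """Return a list of warnings for an OAuth consent URL (empty if none)."""
    query = parse_qs(urlparse(url).query)
    client_id = query.get("client_id", [""])[0]
    # Scopes are space-delimited; parse_qs already decodes '+' and %20.
    scopes = set(query.get("scope", [""])[0].split())

    warnings = []
    if client_id not in trusted_client_ids:
        warnings.append(f"unrecognized client_id: {client_id}")
    broad = sorted(scopes & BROAD_SCOPES)
    if broad:
        warnings.append("broad scopes requested: " + ", ".join(broad))
    return warnings


if __name__ == "__main__":
    # Hypothetical URL: the host is legitimately Google, the client is not ours.
    url = (
        "https://accounts.google.com/o/oauth2/v2/auth"
        "?client_id=unknown-123.apps.googleusercontent.com"
        "&scope=https%3A%2F%2Fmail.google.com%2F+"
        "https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcalendar"
        "&response_type=code"
    )
    for warning in review_consent_url(url, {"our-app.apps.googleusercontent.com"}):
        print(warning)
```

A "view this document" link that asks for full mail and calendar access is the tell, regardless of how official the login page looks.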
anthropic being down, rate limits, etc is understandable. huge hardware reqs and insane growth.
claude code being bug filled is just a result of people vibe coding slop at anthropic and accepting it as ok. it is not ok; you will lose your customers just as fast as you've gained them.
i feel like a lot of people i know aren’t as tapped in on ai setups as they should be. lots of butt sniffing, curious pokes. i’m gonna tell you exactly how i have this shit set up, and it may be dumb, but it’s a setup:
We’re making a fundamental tradeoff when we use these coding agents, but some of my engineering friends still don’t seem to get it.
I see a lot of engineers make the argument “AI outputs still aren’t that great, code quality isn’t the same as writing it by hand.” And it’s all true, their argument is not wrong.
But a coding agent can produce in a few hours what would take one of these engineers a month, or more. And I’ll even accept at face value that their well-thought-out, carefully written “artisanal code” would be better. But it would take them 10x longer to get out.
So the fundamental thing we’re doing is to take this average code, generated in a day by an LLM, and try to make it a little above average. We are prompting our way onto something we can stand behind and support.
And even if that extra work takes let’s say a week & it’s frustrating as hell to get it to something we can stand behind, a week is still better than spending a month or more. So that’s the tradeoff.
And if you write code for money, pretty soon, nobody is going to let you take a month to write it carefully.
Businesses all exist in a competitive space, if other businesses find a way to move faster by willingly making this tradeoff, they’ll all be forced to work this way. And it’s already happening and it will continue to happen.
Then on top of all this, I notice many of us have this old mental model that we are writing code for other humans to extend, to work with, to add features to, etc. And in that world code quality matters a lot.
But it’s unclear to me now that other humans will ever extend your code again. If these models improve even a little bit more, models will always be extending your code from now on.
In that new world, you have to almost always make this tradeoff and you have to become great at it.
One reason vibe coding is so addictive is that you are always *almost* there but not 100% there. The agent implements an amazing feature and gets maybe 10% of the thing wrong, and you are like "hey I can fix this if I just prompt it for 5 more mins"
And that was 5 hrs ago