Cool to see findings from my master's research - supervisory engineering work - being explored further by Thoughtworks.
Paper from the original study here - https://t.co/HfEM5rTmim
AI can generate code in seconds. The challenge is knowing whether it solved the right problem.
Richard Gall explores “supervisory engineering,” the emerging discipline of directing, evaluating and governing AI-generated code.
👉 https://t.co/Wb7c6fAT5r
Welp, that happened faster than I predicted. Thought it would be end of 2027, then early 2027, but agentic traffic is growing so fast that bots have now passed human traffic online for the first time in the Internet's history. https://t.co/2zX5bHdhsa
I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem.
As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)!
I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work.
It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results?
88ms => 1.5ms
150K allocs => ~500 allocs
Incredible right? Nope.
My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path.
This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput.
The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity.
Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.
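To make the comparison concrete, here is a minimal sketch that works out the ratios implied by the figures quoted in this post (88ms, 1.5ms, ~20us). It is only arithmetic on those numbers; the helper name `speedup` and the constant names are made up for illustration.

```python
# Frame times quoted in the post, in microseconds (integers avoid float noise).
NAIVE_US = 88_000  # naive renderer: 88ms
AGENT_US = 1_500   # after the agent loop: 1.5ms
HAND_US = 20       # hand-written port: ~20us


def speedup(before_us: int, after_us: int) -> float:
    """Return how many times faster `after_us` is than `before_us`."""
    return before_us / after_us


if __name__ == "__main__":
    print(f"agent vs naive:        {speedup(NAIVE_US, AGENT_US):.1f}x")
    print(f"hand-written vs agent: {speedup(AGENT_US, HAND_US):.1f}x")
    print(f"hand-written vs naive: {speedup(NAIVE_US, HAND_US):.1f}x")
```

The agent's improvement over the naive version is large on its own, but the second ratio is the 75x still left on the table, which only shows up if you know what the system is capable of.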
Take whatever number of people you thought might be in jobs related to AI deployment in the enterprise and multiply it by 10. Then probably 10 again.
A major topic that keeps coming up in talking to CIOs across enterprises of all sizes and industries is the implementation gap for getting agents to work at scale in organizations on mission-critical work.
As the task goes from implementing a chat system that’s basically an LLM plus search to connecting to real production systems, which can deliver meaningfully better productivity gains but also introduce meaningfully more risk, a whole new set of work has to be done.
You have to ensure the right level of data protection, update access controls, migrate legacy systems to common modern platforms, create observability across what agents are doing, implement new workflows, figure out the human-in-the-loop moments, drive the change management of the new workflows, and more.
Then, all of a sudden the model capabilities get updated and you have to do a set of the above steps over again. Half of what you’ve done is obsolete, and the other half needs to be upgraded to take advantage of new capabilities. Or, token budgets run hot and you have to peel off some of the workloads to lower cost models that will be more cost effective. But then you have to go through those same steps.
Enterprises are trying to figure out the right set of roles to implement these systems in their organization, to ensure that the workflows are actually being executed properly, that it’s not just slop being produced, and that their organization remains safe and secure.
Many companies are starting by repositioning existing IT talent in these functions, but there’s also a growing need for the equivalent of internal FDEs to go take on these tasks in an enterprise. The work looks incrementally closer to software engineering than it does to traditional IT implementation.
Next, almost all AI vendors (labs and the software players) will have some form of next-gen FDE or Applied AI architecture functions to help support these use-cases. The benefit here will be these companies have an incentive to make their capabilities work well so they can bring best practices from a range of customers they’re seeing and directly from the product innovation.
And finally, we’re seeing the rise of all-new AI services firms, and major parts of existing services firms moving into AI implementation. Companies will often want to bring in ostensibly neutral players that can work across their tech stack but have also seen best practices across their vertical. There are going to be tons of new service providers launched to do this, and many will eventually go on to disrupt (or get acquired by) the larger players.
Either way, all told, we’re in for years of AI diffusion, and along with it tons of new roles and areas of work to be done to deploy AI at scale.
Are you into safety ∩ AI coding (if not, you should be ;-)? Are you itching to try out all the cool new features in Claude Code, curious about Google's fabulous Antigravity, or convinced open models are the way to go?
Then participate in the https://t.co/JFNmn8L2Ar.
Top teams from this hackathon are invited to apply to the https://t.co/UlCxFsDOmM (June-October 2026).
The projects and mentors look amazing! https://t.co/4JQFA1cb5y
A mental model for working with coding agents is that they're blind squirrels running into a maze and bumping into walls. You must place the walls (verifiable constraints) strategically so that they end up in the general region you want them in.
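One way to picture "walls" is as named, machine-checkable constraints that every agent attempt has to clear before anyone looks at it. The sketch below is an assumption about how that could look, not a description of any particular tool: `Result`, `Wall`, `WALLS`, `walls_hit`, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Result:
    """Hypothetical measurements collected from one agent attempt."""
    tests_passed: bool
    frame_time_us: float
    allocations: int


@dataclass(frozen=True)
class Wall:
    """A named, machine-checkable constraint on a Result."""
    name: str
    holds: Callable[[Result], bool]


# Illustrative thresholds only. Real budgets should come from someone who
# understands the system, not from "better than the baseline".
WALLS = (
    Wall("tests pass", lambda r: r.tests_passed),
    Wall("frame time <= 50us", lambda r: r.frame_time_us <= 50),
    Wall("zero allocations", lambda r: r.allocations == 0),
)


def walls_hit(result: Result, walls: Sequence[Wall] = WALLS) -> list[str]:
    """Return the names of every constraint the attempt violates."""
    return [w.name for w in walls if not w.holds(result)]


if __name__ == "__main__":
    attempt = Result(tests_passed=True, frame_time_us=1500, allocations=500)
    print(walls_hit(attempt))
```

An attempt that passes its tests but runs at 1500us with 500 allocations still bumps into two walls here, so the loop keeps going instead of declaring victory.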
Brendan Hopper, Matt Beane and I have a thesis, one that I've been sharing around lately, and we want CEOs and boards to hear it.
Before I get to the thesis, let's revisit Clayton Christensen's Innovator's Dilemma (ID), the theory he developed at HBS to explain why big companies often get eaten by upstarts during technology shifts.
In short, the ID says incumbents serve their best customers so well, and tune themselves so ruthlessly for doing exactly what they do today, that they can't chase the disruptor tech coming up from below until it's too late.
The classic solution to the Innovator's Dilemma is to create a "bubble" in your company. You carve out an innovation team with a budget and mandate, as unfettered as practical by the parent organization. This is to combat the 2-level trap presented by the dilemma.
The economic trap is Christensen's original point: a disruptive technology can't justify itself under your existing P&L, because it serves smaller or weirder customers at margins your real business would never accept.
The governance trap is what gets piled on top once you're big: SOC2, FedRAMP, etc. mean every new idea has to clear a lot of process before it can move. The bubble is intended to escape both at once, with its own economics and permission slips.
The standard innovation "bubble" solution famously doesn't work very well. You may solve the problem inside your bubble, but you often can't roll it out to the rest of your company for the original reasons. Everyone is focused on doing their current stuff, and nobody has time for a major change.
Our thesis is that there is an entirely different way out of the dilemma this time around. No bubble needed, as long as you follow a simple rule. That rule is, let your people play. Give them back any time they earn from automating their jobs with AI. Then incentivize them to use that time to improve the company's processes.
When you see an engineering team announce a 40% productivity boost from adopting AI — a number that's been showing up in plenty of LinkedIn posts lately — your first reaction as a CEO or manager is probably to say, that's awesome, we can do more work now! Or you might simply expect to see 40% more output from the team.
Either way, you have just asked them to spend their extra time building faster horses (your current business) instead of letting them go figure out what a car would look like for your company. They gained some productivity from AI, which could have been your ticket out of the Dilemma, and you immediately slurped it back for your existing business.
This will get your company killed in the medium to long haul, because your company tomorrow will look almost nothing like it does today. Conway's Law says your software and your org chart mirror each other; as AI rewrites how you build software, the org has to shift to match. But if you're stealing back the hours your employees saved, then you're not letting your org pivot naturally in the direction it needs to shift.
@RealGeneKim and I saw this in person at @arkanalabs a few weeks back. As long as your people know they'll be recognized and rewarded if they improve the company's processes — public credit for cross-team workflow wins, promotion criteria that actually count process improvements, managers who treat freed-up hours as a feature rather than a budget line — then they will use their "play time" to seek out other teams, and start pivoting you to becoming AI-native. This way it can unfold in whatever bespoke way is most natural to your company, rather than in some ivory-tower research bubble. For every company, the way it unfolds will be a bit different.
I think of this approach, of giving the time back to the humans who automate parts of their jobs with AI, as the new solution to the Innovator's Dilemma. The old bubble solution was to separate a bunch of people from their regular jobs, and try to give them the freedom to solve the problem in isolation.
In contrast, by giving your regular employees their hours back, the innovation bubble is still there, but it's now dispersed across the company, as lots of very tiny bubbles: one bubble per person who has liberated some hours.
If you've ever read Slack by Tom DeMarco, a great book from the early 2000s, then our thesis should resonate. What companies need is to empower their own employees, the ones who actually work together (even across departments), the ones who know how the business works, to shift the company in the new directions together. Gradually, but with intentionality.
You still have the frankly awful problem of token budgets. For every employee you upskill into baseline AI literacy (which I'd define loosely as using coding agents throughout the workday), you've added a non-trivial opex spend — for the heaviest agentic users it can run into five figures a year. I won't sugar-coat it; you need to find that money somehow. I don't have a magic solution, but I'm very happy that other models are catching up to Claude, because they're becoming good enough for real work now.
But token budgets alone aren't enough. To live through the Innovator's Dilemma this time around, your employees need a time budget, too. Give it to the ones who earn it using AI, then incentivize them properly, and I think you're headed in roughly the right direction.
Thank you for coming to my TED tweet.
https://t.co/Pej8fwcevm
For a long time now, I have been pitching that we need a new programming language that is designed specifically for AIs, and not for human ergonomics and binary computers. So far, I have been consistently laughed out of the room.
What I have implemented might be a bit different from what you have in mind:
- I do think we need a real programming language to interface between humans and AIs (not markdown).
- The language should be amenable to formal verification (and *automatic* theorem proving) by trusted, non-AI checkers.
- The language should be mechanically sympathetic to how models compute, or as I say, should be the mentalese or language of thought for them.
- Programs in this language orchestrate external tools, basically they are fancy scripts.
- Computation at the leaves is still done on traditional computers (no need to teach models to do arithmetic), and we do not really need new languages to specify those.
🚨 We recently discovered that an unauthorized party obtained a token with access to the Grafana Labs GitHub environment, enabling the threat actor to download our codebase. (1/6)
@mitchellh This goes hand in hand with the argument that software is now “disposable”. Sure, prototypes, little side projects, heck, even startup ideas. But disposable software at the enterprise level is highly dependent on agents not just fixing but fully rewriting at breakneck speed.
Next.js just got its worst vulnerability ever, CVSS 8.6.
→ affects versions 13.4.13+, 14.x, 15.x, and 16.0.0–16.2.4
→ attackers can access your internal services, cloud credentials, API keys, and admin panels
→ no authentication needed
→ one crafted request is all it takes
→ roughly 79,000 instances are exploitable right now
→ vercel-hosted apps are safe, self-hosted are not
upgrade to 15.5.16 or 16.2.5 immediately.
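For anyone who wants to check a list of deployed versions against the ranges above, here is a rough sketch. The ranges are only as read from this post, and the upper bounds are an assumption drawn from the stated fix versions (15.5.16 and 16.2.5), not from an official advisory. `parse_version`, `AFFECTED_RANGES`, and `is_affected` are hypothetical names, and canary or pre-release version strings are not handled.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no canary/pre-release tags)."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return major, minor, patch


# Half-open ranges [low, high). Upper bounds are assumed from the fix
# versions named in the post, not taken from an official advisory.
AFFECTED_RANGES = (
    ((13, 4, 13), (15, 5, 16)),  # 13.4.13 up to, but not including, 15.5.16
    ((16, 0, 0), (16, 2, 5)),    # 16.0.0 up to, but not including, 16.2.5
)


def is_affected(version: str) -> bool:
    """Return True if `version` falls inside one of the assumed ranges."""
    v = parse_version(version)
    return any(low <= v < high for low, high in AFFECTED_RANGES)


if __name__ == "__main__":
    for candidate in ("13.4.12", "14.2.0", "15.5.16", "16.2.4", "16.2.5"):
        print(candidate, "affected" if is_affected(candidate) else "ok")
```

Tuple comparison does the version ordering here, which is enough for plain three-part versions; treat the result as a first-pass filter and confirm against the vendor's own advisory.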
🚨 UPDATE: 19 MILLION exposed NGINX instances hit by the 18-year-old NGINX RCE found by AI.
Top exposure by country:
- United States: 5,340,011
- China: 2,540,008
- Germany: 1,871,780
Note on ASLR as added security: not all of these instances will have ASLR disabled, but every one of them is running a version inside the vulnerable band.
The vulnerability is a heap buffer overflow. ASLR randomizes memory layout, which makes reliable RCE much harder because the attacker cannot predict where their payload or useful gadgets land. But the overflow itself still happens. The corrupted memory still causes the NGINX worker process to crash.
ASLR-enabled hosts are still trivially DoS-able. ASLR-disabled or non-PIE builds are RCE-able. Either way, patch ASAP!
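The triage rule in this thread can be written down as a small decision function. This is a coarse sketch of the reasoning as stated here, not an exploitability assessment; `worst_case` and its parameters are hypothetical names.

```python
def worst_case(in_vulnerable_band: bool, aslr_enabled: bool, pie_build: bool) -> str:
    """Coarse worst-case impact per the thread's rule of thumb."""
    if not in_vulnerable_band:
        return "not in vulnerable band"
    # The overflow still corrupts memory, so a worker crash is the floor.
    if aslr_enabled and pie_build:
        return "DoS"
    # ASLR disabled, or a non-PIE build: memory layout is predictable enough for RCE.
    return "RCE"


if __name__ == "__main__":
    print(worst_case(True, aslr_enabled=True, pie_build=True))
    print(worst_case(True, aslr_enabled=True, pie_build=False))
    print(worst_case(True, aslr_enabled=False, pie_build=True))
```

Every branch for a host inside the vulnerable band ends in something bad, which is the point of the post: ASLR changes the severity, not the need to patch.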
made a cat that's obsessed with your cursor
i can't stop playing with this.. i'm a big fan of putting cats on everything everywhere
what should i replace the cursor with? butterflies? some treats?
made with gemini & flow