A fundamental problem with extending Codex/Cowork/Code to all knowledge work is that they remain very "software-brained": the end result (the software) is what matters, and the code serves as a source of truth.
For a lot of other knowledge work, the process is at least as important as the outcome. This includes researching what is known, exploring alternatives, failed efforts, prototype branches, experiments, etc. All of those things are valuable, so you cannot use the PowerPoint at the end the way you can use a codebase, nor is progress on a to-do list sufficient context post-compaction. You work in learning loops, refining your perspectives as you go.
In some ways, this makes long-running models like Fable hard to use for deep knowledge work, since they are designed to deliver a product to you at the end. You can prompt your way around this problem, but everything about the Codex and Code harnesses wants you to be a software developer, and you have to fight them. There is a real disconnect between how a manager or analyst thinks about problems and how the agentic software tools approach solving them. Addressing this is critical to breaking out of the coding niche for these tools.
One of the most understated ways LLMs have helped my work over the past couple of years is forcing me to get better at articulating my goals unambiguously. As harnesses get more autonomous, this skill compounds very well.
I don’t engage with the syntax. Lately I’ve been having the agents use languages that I don’t know well. That helps me disengage from the syntax.
On the other hand, I engage strongly with the structure. Massively overloaded tests. Small discrete functions. Well-partitioned modules. Tightly controlled dependencies. I watch all that like a hawk.
Syntax doesn’t matter to me. Structure does.
@GergelyOrosz @LocalManOpines There were multiple instances, and SWEs were eligible in most: Google offers 'voluntary exit' to all US platforms and devices employees | Hacker News https://t.co/Wl6P8QwBdW
🚨 Gah, someone just leaked the @geminicli source code!
Nah, we've always been open. Try it out, submit a PR, track our weekly releases, and make it your own.
https://t.co/jZ04vWVmH3
To quote from my keynote at Vercel's internal offsite:
Software is free as in puppies. It will pee in your bedroom and eat your furniture.
The weight of every line of code is real. We will need to maintain it. We will need to port it. It goes into the context window. And somebody in this room will get paged at 2am because it did something unexpected.
The bottleneck has so quickly moved from code generation to code review that it is actually a bit jarring.
None of the current systems / norms are set up for this world yet.
Amazon had four Sev-1 outages (their highest severity level) in a single week. Internal memos say AI-assisted code changes were a contributing factor.
The timeline here is wild. In October 2025, Amazon laid off 14,000 corporate employees. In January 2026, another 16,000. That’s about 30,000 people in five months, roughly 10% of the corporate workforce. CEO Andy Jassy said the cuts were about culture, not AI.
During those same months, Amazon set a target: 80% of developers using AI coding tools at least once a week. They tracked adoption closely and blocked rival tools like OpenAI’s Codex. Even so, 30% of developers still hadn’t touched Amazon’s in-house tool Kiro by January.
In December 2025, Kiro caused a 13-hour AWS outage. The AI tool had production-level permissions and decided the best fix for a bug was to delete and recreate an entire live environment. A second incident involved Amazon Q Developer, another AI tool. Amazon blamed both on “user error, not AI,” but quietly added mandatory peer review for all production access afterward.
Then March 5: Amazon’s retail site went down for about six hours. Over 22,000 users reported checkout failures, missing prices, and app crashes. Amazon called it a “software code deployment” error.
Five days later, SVP Dave Treadwell made the normally optional weekly engineering meeting mandatory. His memo acknowledged “GenAI tools supplementing or accelerating production change instructions, leading to unsafe practices.” These problems trace back to Q3 2025. Amazon’s own assessment: their GenAI safeguards “are not yet fully established.”
The new rule: junior and mid-level engineers now need senior sign-off on any AI-assisted production changes. Treadwell also announced “controlled friction” for the most critical parts of the retail experience.
For context, Google’s 2025 DORA report found 90% of developers use AI for coding but only 24% trust it “a lot.” An Uplevel study of 800 developers found Copilot users introduced 41% more bugs with no improvement in output. Amazon is finding out what those numbers look like at the scale of a $500 billion revenue company, with 30,000 fewer people on staff to catch the mistakes.
The concept of the Overton Window is far more generalizable than we think. It applies from international relations to evolving norms in software.
For those unfamiliar: the Overton Window defines the range of acceptable political discussion at a given time. Prominent actors can expand, shrink, or shift the window, effectively updating the bounds of normal behavior in a system.
With Venezuela and Iran, we can now say there has been a rapid shift in international relations norms, particularly how heads of state are treated.
I believe two main factors drive this:
1. Superpowers are far more capable and can act with lower risk.
2. Once a first action succeeds, it sets a precedent. Each repetition increases the likelihood of similar future actions, creating a reinforcing causal loop.
This shift is also happening in software.
A few years ago, an engineer might have viewed a clearly AI-generated pull request as unacceptable or lazy. Now we are slowly moving toward a world where non-AI-generated code is sometimes jokingly called “artisanal.”
Recent reporting on AI use in US operations puts us in an interesting position: international relations and software are rapidly influencing each other.
AI expands the capabilities of international actors. The more it is used, the more feedback it receives, improving its usefulness in turn.
I don’t think the massive shifts we’ve seen in both domains are coincidental.
I have been using LLMs heavily this past week to learn new things. I wish they had been available to me when I was in college, but I wonder if the temptation to "get stuff done fast" would have hindered my learning. When used correctly, they are the best TA you will ever have.
The future of software engineering is SRE
Because when code gets cheap, operational excellence wins. Anyone can build a greenfield demo, but you need engineering to run a service.
https://t.co/KkQvDBNdU7
DeepSeek was a side project at High-Flyer Quant.
Qwen was a side project at Alibaba.
Twitter was a side project at Odeo.
Mac was a side project at Apple.
Meanwhile:
Windows Phone was a core project at Microsoft.
Metaverse was a core project at Facebook.
Google Glass was a core project at Google.
Apple Intelligence is a core project at Apple.
taste, passion, agency > roadmaps.
Why are LLM responses (specifically GPT 5.2) so redundant? It seems to provide the answer to you in at least 3 different ways. Almost like there is a minimum response token size configured somewhere🤔