the value of this technology will mostly not be captured by its inventors, the labs, or even the chipmakers, but rather will be captured by the consumers as surplus. these are highly competitive markets without any natural monopolistic effects
like many other technologies before it, machine intelligence democratizes abilities previously only available to the wealthy, in this case by commoditizing the services of the white collar elite who mostly live in rich countries
it’s not that there are no programmers, it’s that really anybody can make software now, so the “rents” of the “human capital” of knowing how to write JavaScript, for example, should shrink dramatically
this will reduce the inequality between countries: services that previously required lots of human capital now require chatbot subscriptions at worst, or may even be given away for free
you can receive medical advice worthy of a $1000/hr American specialist doctor likely for free while living under a thatched roof in eg Papua New Guinea somewhere
while I think Americans have plenty of reason to be excited by AI, I would be more excited as someone in a poor country
Software development is undergoing a renaissance in front of our eyes.
If you haven't used the tools recently, you likely are underestimating what you're missing. Since December, there's been a step function improvement in what tools like Codex can do. Some great engineers at OpenAI yesterday told me that their job has fundamentally changed since December. Prior to then, they could use Codex for unit tests; now it writes essentially all the code and does a great deal of their operations and debugging. Not everyone has yet made that leap, but it's usually because of factors besides the capability of the model.
Every company faces the same opportunity now, and navigating it well — just like with cloud computing or the Internet — requires careful thought. This post shares how OpenAI is currently approaching retooling our teams towards agentic software development. We're still learning and iterating, but here's how we're thinking about it right now:
As a first step, by March 31st, we're aiming that:
(1) For any technical task, the tool of first resort for humans is interacting with an agent rather than using an editor or terminal.
(2) The default way humans utilize agents is explicitly evaluated as safe, but also productive enough that most workflows do not need additional permissions.
In order to get there, here's what we recommended to the team a few weeks ago:
1. Take the time to try out the tools. The tools do sell themselves — many people have had amazing experiences with 5.2 in Codex, after having churned from codex web a few months ago. But many people are also so busy they haven't had a chance to try Codex yet or got stuck thinking "is there any way it could do X" rather than just trying.
- Designate an "agents captain" for your team: the primary person responsible for thinking about how agents can be brought into the team's workflow.
- Share experiences or questions in a few designated internal channels
- Take a day for a company-wide Codex hackathon
2. Create skills and AGENTS[.md].
- Create and maintain an AGENTS[.md] for any project you work on; update the AGENTS[.md] whenever the agent does something wrong or struggles with a task.
- Write skills for anything that you get Codex to do, and commit it to the skills directory in a shared repository
3. Inventory and make accessible any internal tools.
- Maintain a list of tools that your team relies on, and make sure someone takes point on making it agent-accessible (such as via a CLI or MCP server).
4. Structure codebases to be agent-first. With the models changing so fast, this is still somewhat untrodden ground, and will require some exploration.
- Write tests which are quick to run, and create high-quality interfaces between components.
5. Say no to slop. Managing AI generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high
- Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code, and make sure the author understands what they're submitting.
6. Work on basic infra. There's a lot of room for everyone to build basic infrastructure, which can be guided by internal user feedback. The core tools are getting a lot better and more usable, but there's a lot of infrastructure that currently goes around the tools, such as observability, tracking not just the committed code but the agent trajectories that led to it, and central management of the tools that agents are able to use.
Overall, adopting tools like Codex is not just a technical but also a deep cultural change, with a lot of downstream implications to figure out. We encourage every manager to drive this with their team, and to think through other action items — for example, per item 5 above, what else can prevent a lot of "functionally-correct but poorly-maintainable code" from creeping into codebases.
Claim: gpt-5-pro can prove new interesting mathematics.
Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof: it's correct.
Details below.
We’ve activated our strongest safeguards for ChatGPT Agent.
It’s the first model we’ve classified as High capability in biology & chemistry under our Preparedness Framework. Here’s why that matters–and what we’re doing to keep it safe. 🧵
Exciting breakthrough from @eliasbareinboim -- a counterfactual-calculus (cfl-calculus), akin to do-calculus, for handling identification problems in Layer-3 of the Causal Hierarchy.
@MajmudarAdam and now we can use llms to capture our thoughts in high dimensional spaces and shape them for consumption by others.
Ideas can be personalized and shared at higher bandwidth w/ more clarity. (words → image / video, personalized using receiver history)
https://t.co/wj8Gvmekuv
We used to speak words to evoke a thing deeper within, now we deploy bots who understand that deeper thing and generate infinite manifestations of our thoughts, customized and reaching further.
Consumer companies are going to have to compete with end users empowered by AI customizing and building their own experiences.
Cos like @tillermoney deliver the data and let you build your own dashboards with @marimo_io or do custom analysis with @juliusai
IMO data teams have two internal mandates right now --
1. How can they make their own workflows more effective
2. How can they make their stakeholders' workflows more effective
I have less of a use for text-to-SQL in my own dev work, but it's been helpful for 2. We've one-shotted SQL responses to help-data Slack channel questions with text-to-SQL wrapped in context and the ability to run and return Snowflake queries.
Which expands the window of self-serve analytics and lets stakeholders ask a few more questions before they get stuck and take a data scientist out of flow state. So we spend more time in Hex!
@kevinakwok @googledocs Imagine a world where they bring in better threading, integration between comments and revision history, some features of Google Wave (rip)… such a massive opportunity to reimagine collaboration
It should be possible to read comments in @googledocs without having to turn on suggestion mode.
To read comments today is to see a bunch of cursors jumping around and highlighting, while trying to be careful to not make a suggestion when you’re simply trying to read.
Agency > Intelligence
I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are we educating for agency? Are you acting as if you had 10X agency?
Grok explanation is ~close:
“Agency, as a personality trait, refers to an individual's capacity to take initiative, make decisions, and exert control over their actions and environment. It’s about being proactive rather than reactive—someone with high agency doesn’t just let life happen to them; they shape it. Think of it as a blend of self-efficacy, determination, and a sense of ownership over one’s path.
People with strong agency tend to set goals and pursue them with confidence, even in the face of obstacles. They’re the type to say, “I’ll figure it out,” and then actually do it. On the flip side, someone low in agency might feel more like a passenger in their own life, waiting for external forces—like luck, other people, or circumstances—to dictate what happens next.
It’s not quite the same as assertiveness or ambition, though it can overlap. Agency is quieter, more internal—it’s the belief that you *can* act, paired with the will to follow through. Psychologists often tie it to concepts like locus of control: high-agency folks lean toward an internal locus, feeling they steer their fate, while low-agency folks might lean external, seeing life as something that happens *to* them.”
Towards the D in ACID, how many DBMSs:
- fsync() on commit
- fsync() on opening the WAL
- daisy chain checksums (cf. misdirected I/O)
- open the WAL with O_DIRECT (cf. fsyncgate)
- have 2 WALs (cf. Protocol-Aware Recovery)
- don't trust the inode to get WAL size
- test this?
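A rough sketch of three items from the checklist above (fsync on commit, fsync on opening the WAL, and daisy-chained checksums), assuming a toy record format that is not taken from any real DBMS. The names `open_wal`, `append_record`, and `replay` are hypothetical, and O_DIRECT and a second WAL are left out. Replay stops at the first record whose chain or checksum does not match, and it relies on the bytes actually read rather than the file size reported by the inode.

```python
import os
import struct
import zlib

# Hypothetical record header: payload length, previous record's checksum,
# this record's checksum (all little-endian unsigned 32-bit).
HEADER = struct.Struct("<III")


def open_wal(path: str) -> int:
    """Open (or create) the log for appending and fsync it before use."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    os.fsync(fd)  # "fsync() on opening the WAL"
    return fd


def append_record(fd: int, prev_crc: int, payload: bytes) -> int:
    """Append one record, fsync, and return its checksum for the next link."""
    # Seeding the CRC with the previous checksum chains the records together.
    crc = zlib.crc32(payload, prev_crc)
    view = memoryview(HEADER.pack(len(payload), prev_crc, crc) + payload)
    while view:  # handle short writes
        written = os.write(fd, view)
        view = view[written:]
    os.fsync(fd)  # "fsync() on commit"
    return crc


def replay(path: str) -> list:
    """Return payloads up to the first torn, corrupt, or out-of-chain record."""
    records = []
    prev_crc = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of log or torn header
            length, stored_prev, stored_crc = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length:
                break  # torn payload
            if stored_prev != prev_crc or zlib.crc32(payload, prev_crc) != stored_crc:
                break  # broken chain or corrupted payload
            records.append(payload)
            prev_crc = stored_crc
    return records
```

A real system would also need to fsync the parent directory after creating the file and decide what to do when fsync itself fails. Both are outside this sketch.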
Two really nice recent papers from Apple machine learning research:
- Scaling laws for MoEs
- Scaling laws for knowledge distillation
Work by @samira_abnar @danbusbridge et al.
safety nets >> guardrails
safety nets allow you to explore the space freely, make mistakes and learn, guardrails artificially constrain the exploration space and prevent you from learning important lessons firsthand