How OpenAI Built Its Data Agent
Most teams building data agents stack routers, fine-tunes, and complex retrieval pipelines on top of multiple LLMs. OpenAI didn't.
Their data agent runs on a single model and only 13 tools, across 1.5 exabytes and 90,000 tables. It's "pretty vanilla" by design.
We spoke with Emma Tang, Head of Data Platform Engineering at OpenAI, to better understand the architecture and the engineering decisions behind it.
The article covers:
- The architecture behind the data agent
- The six layers of context that make a single LLM reliable across 90,000 tables
- How OpenAI Uses Codex Internally: 3 Use Cases
- Five practical lessons for any team building a domain agent
- Where OpenAI's data platform is headed next
To bring Codex to Windows, we had to answer a hard question: how do you let coding agents stay useful without forcing developers to choose between constant approval prompts and full machine access?
Here’s how we built the Windows sandbox for Codex:
https://t.co/U8JfOe3WIG
Multipath Reliable Connection (MRC): a new open networking protocol for large AI training clusters, deployed in production on our largest training clusters.
There is a lot of news about compute being the bottleneck for AI. There is less visibility into the engineering it takes to make large-scale compute actually work reliably.
In my view, this is one of the most interesting computer science problems in the industry right now. It is not just about getting more GPUs. It is about making every layer of the system work: networking, scheduling, hardware health, storage, orchestration, reliability, observability, security, and the developer experience for researchers.
This blog gives a rare preview into the depth of engineering happening across the stack at OpenAI, starting with MRC and supercomputer networking. We're excited to start sharing more about designing, building, and operating compute at planet scale.
https://t.co/eQVylBAGa3
Join us: https://t.co/BwHYeXnfMo
We quietly launched the Symphony repo on GitHub last month, and it's already accumulated 15.5k stars!
Excited to share this post that dives into it even deeper: a library that lets you use Codex to orchestrate work normally done by teams of engineers.
https://t.co/aoqUvfrWS2
Really excellent work by the inference team to serve this model so efficiently!
To a significant degree, we have to become an AI inference company now.
Very excited to launch workspace agents in ChatGPT today. Teams can now create shared agents powered by Codex that handle complex tasks and long-running workflows.
We have always wanted to build a product that can go beyond helping individuals be more productive to also helping teams be more effective.
Workspace agents are designed for that - they can gather context from the right systems, follow team processes, ask for human approval when needed, and keep work moving across tools/teams.
Available in research preview in ChatGPT Business, Enterprise, Edu, and Teachers plans.
Huge congrats @tarstarr @christinaahuang @_rohanmehta and the workspace agents team!
https://t.co/2AlqtE49ez
Builders Unscripted with @ashebytes is out!
Ashe has built across a wild range of worlds: solar cars, NASA, and AI products.
What makes her special is how she connects frontier tech, community, and creativity.
She gave me a tour of how she uses Codex!
Three million people are now using Codex weekly - up from two million a little under a month ago. Incredible to see the growth. Thank you to all of you and to the ecosystem we’re part of. To celebrate, we’re resetting rate limits so you can keep building, and we’ll reset them every additional 1M users until we reach 10M, so we can keep celebrating along the way.
Enjoy and thank you!
Really enjoyed joining @petergyang with @embirico to talk about how we’ve been building Codex at OpenAI.
We show a live demo of the Codex app and go behind the scenes.
What’s been striking: role lines are blurring. Designers write code, engineers think product.
Our incredible comms leader, @lindsmccallum, planned a closed-door dinner for the Codex team in a fraction of the time it would normally take - thanks to Codex.
She used the Codex App to:
- compile the invite list
- send out invitations
- scan her emails hourly to update RSVP status
- populate a doc with bios on every attendee
- create a mini app to plan the seating chart
Things Codex didn’t do:
- make the sushi we ate (soon)
Lindsay has never coded before. With Codex, she is a builder.
Codex is the interface for personalized software.
📣 Technical lessons from building computer access for agents
Making long-running workflows practical required tightening the execution loop, providing rich context via file systems, and enabling network access with security guardrails.
Here's how we equipped the Responses API with a computer environment:
https://t.co/dMIlJN2iqA
Michael Bolin (@bolinfest) is the tech lead of the Codex open source repository at OpenAI and formerly a distinguished eng (E9) at Meta.
I asked him for all the details on his career story and how he uses Codex for max benefit. Timestamps:
00:00:00 - Intro
00:00:56 - Chickenfoot
00:02:45 - Working at Google
00:06:34 - Overhauling Facebook's build system
00:16:36 - Rewriting Facebook's IDE
00:26:01 - Struggles after Principal Eng (E8) promo
00:28:39 - Building a virtual filesystem for Facebook
00:35:47 - Delayed Distinguished promo (E9) and learnings
00:39:56 - Joining OpenAI
00:43:05 - Research-led vs engineering-led cultures
00:44:53 - The story behind Codex
00:51:00 - How he uses Codex
00:57:00 - Why Codex's harness is open source
00:59:50 - Top technical book recommendations
01:05:02 - Why deep technical skills are still valuable (for now)
01:11:07 - How to start projects well
01:14:27 - Advice on writing better and career planning
01:17:06 - Advice for his younger self
01:19:10 - Outro
He was a dream guest of mine and I'm excited to share his story with you all!
Other places to watch:
• YouTube: https://t.co/uz8snLYoAX
• Spotify: https://t.co/kyDNpaX6LL
• Apple Podcasts: https://t.co/jOYDGtHtd1
• Transcript: https://t.co/uJF1oJHfnd
ryan and team worked extremely hard to make gpt-5.4 great for finance
it's much improved for financial modeling and analysis, integrates directly into excel, and connects to factiva, daloopa, s&p global, and many more
it does feel like a codex moment is coming here
BREAKING:
@OpenAI just released GPT-5.4 and it is AMAZING.
We spent a week @every putting it through real engineering tasks from code reviews to planning workflows and using it inside of our @openclaw setups.
The verdict: OpenAI is back in the coding race.
- Its planning capability consistently beat Codex 5.3 and Opus 4.6 in head-to-head tests. It produces plans that are thorough and technically precise, and have a user focus and “human” feel that has been missing from OpenAI's previous coding models.
- It reviews code with more depth than 5.3 Codex, and a much more conversational voice that doesn't make you feel dumb.
- It became our go-to model in @OpenClaw: with some model-specific tweaks to the harness it's fast, intelligent, and more human. It's also about half the price of Opus 4.6.
As ever, there are tradeoffs:
- GPT-5.4 has a tendency to expand the task well beyond what you asked for and to call tasks done before they're finished.
- In the @OpenClaw harness it sometimes completed tasks in obviously wrong ways, then lied about it.
Overall though, it's my new daily driver for coding and in my Claw. Its thinking-traces produced some genuine wow moments for me.
Our complete vibe check is available on @every now ->
https://t.co/xiaXIYdd42
GPT-5.4 is out.
It brings significant improvements on coding, general reasoning, integration with tools and native computer use, and overall is really good for professional work.
I’ve been trying to simulate using Codex for the next year and what will change about my perspectives on software engineering as I transition from being a computer programmer to a harness engineer. There are so many, but here are a couple that have stuck with me:
Software dependencies - Large open source systems like Linux and MySQL seem like they will remain just as important, but I wonder if I will start to have different perspectives on smaller software libraries when the functionality can be relatively easily produced and tested with AI. Given the past decade of supply chain vulnerabilities and maintenance issues in open source libraries, will it become a best practice to reduce dependencies and write our own where possible?
Documentation - When I built a product before, the “specification” was split between docs, Slack, Figma, and Linear — but the vast majority of behavior was specified in code, i.e., the long tail of functionality is an emergent property of the code I write. The conundrum with agent-produced code is that it’s not clear which parts of the code were prompted (i.e., specified) and which parts were “vibed” (i.e., unspecified). That seems problematic when continuously evolving a large system over time because the harness will “forget” past instructions. I don’t think replaying prompts is correct either because in a single Codex session, a good chunk of interactions are interactive and effectively transient. I have an intuition that documentation will be as important of an output of my Codex sessions as code, documenting the substantive product decisions made during my session. Those docs clearly need to be directly in the repo, versioned with the code and available as context for future sessions. The docs / context discussion in OpenAI’s recent post on harness engineering resonated with me and maps to my intuition: https://t.co/K2sP6x92qN
OpenAI’s hottest app isn’t ChatGPT—it’s Codex.
In the last few weeks alone, the Codex team shipped a desktop app, GPT-5.3 Codex (a new flagship model), and Spark, the fastest coding model I’ve ever used. Usage has grown fivefold since January and over a million people now use Codex weekly. Codex was also the app that OpenAI chose to run an ad for in the Super Bowl.
I talked to Thibault (@thsottiaux), head of Codex, and Andrew (@ajambrosino), a member of technical staff who built the Codex app, for @every’s AI & I about what OpenAI is building and how they’re using it internally. We get into:
- Why they built a GUI instead of a terminal. Terminals work for quick tasks, they say, but feel limiting when you’re running multiple agents in parallel. The IDE, meanwhile, overwhelms users—and the Codex team wants the AI to dynamically decide which tools to show you for a given task.
- How they’re teaching the model to read between the lines. Codex is great at following instructions, but optimize too hard in that direction, and it starts taking you literally—like copying a typo directly into the code. The team obsesses over this tradeoff, and is also introducing “personalities,” modes users can toggle between that control how blunt or supportive the model feels.
- How OpenAI uses its own coding agent. Codex lets you schedule prompts to run on a recurring basis, and the team has dozens of automations running at all times. For example, one scans for merge conflicts every couple of hours so code is always ready to ship, and another picks a random file from the codebase multiple times a day and hunts for bugs no one would've gone looking for.
- Why speed is a dimension of intelligence. OpenAI’s newest model (Spark) is so fast that they actually slow it down so you can read the output. They see the speed enabling three things: staying super in the flow, replacing brittle developer tools with intelligent ones that can adapt on the fly, and redirecting the model mid-task, especially with voice, so coding starts to feel more and more like a conversation.
- Code review is the next bottleneck. Models can generate code faster than ever, but someone still has to verify that it works. The team is exploring a future where the model proves its own fix works—retracing the click path a user would take, screenshotting the results, and attaching the evidence to a pull request.
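To make the "random file" automation above concrete, here is a minimal sketch of what the selection step might look like. It is not the Codex team's implementation: the function names, the suffix list, and the prompt wording are all hypothetical, and scheduling plus handing the prompt to an agent are left out.

```python
import random
from pathlib import Path

# Assumption: which file types count as "source" is our own choice here.
SOURCE_SUFFIXES = {".py", ".ts", ".go", ".rs"}


def list_source_files(root):
    """Return every source file under root, skipping anything inside .git."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix in SOURCE_SUFFIXES
        and ".git" not in p.parts
    )


def pick_random_file(root, rng=None):
    """Pick one source file uniformly at random (rng is injectable for tests)."""
    if rng is None:
        rng = random.Random()
    files = list_source_files(root)
    if not files:
        raise ValueError(f"no source files found under {root}")
    return rng.choice(files)


def build_bug_hunt_prompt(path, root):
    """Build a hypothetical prompt asking an agent to hunt for bugs in one file."""
    rel = Path(path).relative_to(root).as_posix()
    return (
        f"Read {rel} carefully and look for bugs nobody has reported. "
        "Explain each finding and propose a minimal fix."
    )
```

A scheduler (cron, a CI job, or a recurring prompt in the agent itself) would call `pick_random_file` a few times a day and pass the resulting prompt to the agent. Picking at random, rather than by recent change, is what lets it reach files no one would otherwise revisit.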
This is a must-watch for anyone who uses AI coding agents—and is curious about the future of programming.
Watch below!
Timestamps:
Introduction: 00:01:27
OpenAI’s evolving bet on its coding agent: 00:05:27
The choice to invest in a GUI (over a terminal): 00:09:42
The AI workflows that the Codex team relies on to ship: 00:20:38
Teaching Codex how to read between the lines: 00:26:45
Building affordances for a lightning-fast model: 00:28:45
Why speed is a dimension of intelligence: 00:33:15
Code review is the next bottleneck for coding agents: 00:36:30
How the Codex team positions against the competition: 00:41:24