Hey everyone!
As you can tell from my posts, I’ve been diving deep into large language models (LLMs) lately, and I’d love to connect with anyone who’s curious about their potential.
Whether you want to brainstorm the best ways to implement them in your business or simply have a few questions, let’s schedule a call and chat it through.
Ready to explore AI / LLMs with me? Drop a comment or shoot me a message, and let’s set up a time to talk! Or use this link to book a 30-minute discussion:
https://t.co/gdb0TzRn7L
The biggest unlock was making the codebase and workflow agent-legible. For me, the biggest difference came from:
1. Clear task boundaries - smaller slices, owned files, explicit acceptance criteria, and “done means…” checks.
2. Durable artifacts - agents should not rely on chat or terminal scrollback. Plans, prompts, logs, summaries, evidence, and review outputs need to be written to files.
3. Fast verification - a single reliable command like 'make check' or 'make coding-check' matters more than a perfect test suite. Agents need a clear green/red loop.
4. Good routing - not every agent should do every task. I have Codex own planning, integration, and verification; faster agents can do first-pass implementation; stronger reviewers come in for complex/risky work.
5. Observable failures - the biggest improvement was classifying failures precisely: timeout, missing artifact, idle worker, bad prompt delivery, failed checks, dirty worktree, etc. “Agent got stuck” is not actionable.
Smaller modules and tests matter, but what changed the success rate more for me was making the repo easy for an agent to orient, act, verify, and leave evidence.
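To make point 5 concrete, here is a minimal sketch of classifying a finished agent run into one precise failure label instead of "agent got stuck". Everything in it is an assumption for illustration: the `RunResult` fields, the `Failure` names, and the order of the checks are hypothetical, not how Codex or any specific tool reports runs.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import List, Optional


class Failure(Enum):
    # Hypothetical labels, taken from the failure types listed above.
    TIMEOUT = "timeout"
    MISSING_ARTIFACT = "missing artifact"
    FAILED_CHECKS = "failed checks"
    DIRTY_WORKTREE = "dirty worktree"


@dataclass
class RunResult:
    # Hypothetical record of one agent run, filled in by your own harness.
    timed_out: bool = False
    check_exit_code: int = 0  # exit code of the verify command, e.g. `make check`
    worktree_clean: bool = True
    required_artifacts: List[Path] = field(default_factory=list)


def classify(run: RunResult) -> Optional[Failure]:
    """Return the first matching failure label, or None if the run looks good."""
    if run.timed_out:
        return Failure.TIMEOUT
    # Durable artifacts: every file the task promised must exist on disk.
    if any(not path.exists() for path in run.required_artifacts):
        return Failure.MISSING_ARTIFACT
    # Fast verification: a single green/red signal from one command.
    if run.check_exit_code != 0:
        return Failure.FAILED_CHECKS
    if not run.worktree_clean:
        return Failure.DIRTY_WORKTREE
    return None
```

A harness could write `classify(run)` next to the run's logs and summaries, so each failed task leaves behind a label someone can act on.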
Codex team, can the app switch from plan mode to goal mode automatically? Right now, if you start in plan mode and then enter a prompt for a goal that builds on the plan, it stays in plan mode and does nothing. At the very least, when a goal prompt comes in, have it ask whether it should execute / write or stay in plan mode.
Hi. Over the last 24 hours we had three separate small incidents that affected Codex reliability. That is three too many, and we are taking active steps to keep them from recurring.
I have reset usage limits for Codex across all paid plans. May the tokens flow again.
The Codex usage limits have been reset for all paid ChatGPT subscriptions. You should be back to 100% weekly and 100% hourly limits.
Let the tokens do incredible things today and have fun.
Excited to share our most powerful new Claude Code feature: dynamic workflows!
Mention "workflow" in a prompt and Claude will dynamically create an orchestration plan that it strictly follows, allowing you to confidently trust that every stage happens in the right order even across 100s of agents.
One odd thing about AI work:
LLMs are trained on human-created data and generate probabilistic answers.
Humans double-check our own work all the time. But a lot of people ask an LLM or agent for an answer and stop there.
If you would check your own spreadsheet twice, why wouldn’t you ask the model to review its work too?
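A minimal sketch of that habit, assuming you already have some function that sends a prompt to a model and returns text. The `ask_model` callable and the review wording are hypothetical placeholders, not any vendor's API.

```python
from typing import Callable


def answer_with_review(question: str, ask_model: Callable[[str], str]) -> str:
    """Ask once, then ask the model to check its own draft before returning."""
    draft = ask_model(question)
    # Second pass: hand the draft back and ask for a review.
    review_prompt = (
        "Review the answer below for mistakes. "
        "Reply with a corrected answer, or repeat it unchanged if it is correct.\n\n"
        f"Question: {question}\n"
        f"Answer: {draft}"
    )
    return ask_model(review_prompt)
```

A second pass is not a guarantee of correctness, so for anything important a human check or an automated test still belongs after it.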
Given the competition among OpenAI, Anthropic, xAI/SpaceX (Cursor), Google, and open source, what is the moat?
-be first
-context/memory
-be fastest
-be smarter
-be cheaper
How would you sort it?
Codex anywhere and everywhere, all the time.
Now your Mac doesn’t have to be unlocked for Codex to use your computer.
From your phone, Codex can securely use apps on your Mac, even when the screen is off and locked.
https://t.co/PCGK4i7FSF
It’s Codex Thursday, and yes, we have updates for you.
First up: Appshots, a new way to bring the context of what you’re working on into Codex.
On your Mac, press Command-Command to attach your app window to a Codex thread. Codex gets both a screenshot and text from the window, including content beyond what’s visible onscreen.
Appshots are available across plans on Mac, with enterprise access coming soon.