My 10 predictions for AI in 2026:
• Frontier labs will continue to shift focus from models → products - models are now good enough. Products are not.
• Frontier labs will remain majorly compute-constrained, but progress continues — I expect new developments in memory systems, enhanced model capabilities, and potentially new advances in recursive learning: AI supervising its own training.
• Data centres - continued investment by all players into the back end of the 2030s and beyond. We are still early in the demand curve for tokens.
• New agentic workflows will dominate — coding agents first across a range of workflows, then enterprise knowledge work. By year-end, coding agents run reliably for 24+ hours.
• Frontier maths & science emerge as important new model capabilities - tasks that once required specialist ML teams become accessible to more people.
• The frontier gap widens between leaders and average users — most users still use AI as chat. Frontier users will start to orchestrate advanced agent teams and automated workflows.
• Job displacement becomes visible - roles blur, teams shrink, demand for 10x coders rises.
• Public sentiment turns negative - as job displacement occurs, politics will follow.
• Enterprise architecture gets rethought — if agents code reliably, does “buy rather than build” still make sense for enterprise? New AI native enterprise SaaS offerings will emerge to take on traditional SaaS.
• Anthropic IPOs H2 2026; OpenAI announces late 2026 for 2027.
I think OAI would be better off releasing next week, assuming it's a material step up
Even with 5.5 performing better than 4.8 on Deep SWE, Claude branding is very strong - OAI need to open up a bigger gap to steal hearts and minds.
Future models will supposedly include new capabilities on multi-agent orchestration - this might be the swing factor.
I didn't cover Claude Opus 4.8 on my pod because I don't think it's MEANINGFULLY better than GPT 5.5 as of May 29th.
We're entering the era where model releases start to feel like iPhone releases. Remember when every new iPhone was a genuine leap? Now it's a slightly better camera and you can't really tell the difference. That's where models are heading. 4.6 to 4.7 to 4.8. Each one is a little different. Nobody can agree if it's better or worse. The benchmarks say one thing, the vibes say another.
The thing that actually matters right now is what's happening around the models. Claude Code shipped dynamic workflows this same week and that genuinely changes what one person can build.
Codex shipped a desktop app with an in-app browser that combines coding and knowledge work in one surface. Those are the releases that move the needle for people. The model underneath is becoming interchangeable.
I think we're maybe 6 months from nobody caring which model they're using the way nobody cares which engine is in their Uber. You just want to get where you're going.
When something genuinely changes the game for builders, I'll cover it on @startupideaspod. Opus 4.8 wasn't that. Dynamic workflows was.
I'd rather save you the hour.
first update:
Claude Opus 4.8 – xhigh on march-may 110 tasks: 56.4%
gpt-5-xhigh: 62.7% – $2.25
gpt-5.5-medium: 58.9% – $0.98
Opus 4.8 - xhigh: 56.4% – $2.02
Opus 4.7 – high: 53.1% – $1.32
Opus 4.6 - high: 47.8% – $1.29
more open-weight models to come in ~1-2 weeks
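One rough way to read the numbers above is score per dollar. A minimal sketch, assuming the dollar figure is an average cost per task (the post doesn't say) and using only the five rows listed; `points_per_dollar` is a made-up helper name:

```python
# Scores (%) and costs ($) copied from the list above.
# Assumption: the dollar figure is cost per task; the post does not say.
results = {
    "gpt-5-xhigh": (62.7, 2.25),
    "gpt-5.5-medium": (58.9, 0.98),
    "Opus 4.8 xhigh": (56.4, 2.02),
    "Opus 4.7 high": (53.1, 1.32),
    "Opus 4.6 high": (47.8, 1.29),
}


def points_per_dollar(score, cost):
    """Percentage points of benchmark score bought per dollar spent."""
    return score / cost


# Rank models from most to least score per dollar.
ranked = sorted(
    results,
    key=lambda name: points_per_dollar(*results[name]),
    reverse=True,
)

for name in ranked:
    score, cost = results[name]
    print(f"{name}: {points_per_dollar(score, cost):.1f} points per $")
```

On this crude ratio the cheapest run comes out on top and the two xhigh runs land at the bottom, which is only a value-for-money view and says nothing about which model clears the hardest tasks.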
I think they should leave it alone and make it easy to use / commands and shortcuts to summon thinking levels
Each model changes what you can do with the model, which then changes your proportional use of the settings
Instant / light are actually incredibly good for most basic things now, with 5.5, but there’s an ability to push pro and thinking-heavy in new ways
We're building a Moon Base!
@NASAMoonBase will serve as a habitat where astronauts live and work during long-term science missions.
Join us at 2pm ET on Tuesday, May 26, for a live news event where we’ll share updates on our lunar exploration plans: https://t.co/IJXA7xYwju
Incredible to be at a point where an AI system can create a breakthrough in an active field of mathematics
I have mixed feelings on the implications.
On one side, if this is verified and continues to happen, there are potentially incredible benefits on the table as AI capabilities improve.
On the flip side, does it create a moral hazard or a knock-on effect on a person's motivation and willingness to learn when there is an AI that is more capable and runs 24/7 discovering new things?
It’s also important that the most capable systems stay accessible to all.
Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946.
For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids.
An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better.
This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.
Design taste (front-end design) and copywriting - Codex is getting closer, but Opus (particularly 4.6) is still v good here - everything else 5.5
Opus still brings personality / role modelling to agents like openclaw better - GPT 5.5 is a lot better but still has a ways to go
5.5 my daily driver otherwise
Coding via mobile on ChatGPT / Codex is good, if a bit rough in research preview.
But running separate Mac minis from the same mobile while on the go is next level.
Multiple threads each running sub-agents on each box, all run locally, brings an incredible level of scale to just one person
User attention is really the new bottleneck
Tiny Codex iOS hack 🦺 for those who have multiple OpenAI accounts and want to control which account / credits are used across multiple machines.
I named my boxes after Anchorman characters and created skills that either
1) ssh into the machine or
2) ssh into the machine and use the destination machine's Codex account via terminal.
“Ron Burgundy machine” = my current iOS Codex account SSHs into that box and burns its own credits.
“Ron Burgundy agent” = it uses the Codex login on that box, burning that account’s credits.
Machine vs agent = account routing.
So I can be in any chat on desktop or iOS and be in control of token burn
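The machine vs agent split can be sketched as a tiny command builder. This is a guess at the shape, not the actual skills: the host alias, the `build_command` helper and the use of `codex exec` as the non-interactive CLI entry point on the remote box are all assumptions.

```python
import shlex

# Hypothetical SSH host alias; the post only says boxes are named
# after Anchorman characters.
BOXES = {"ron burgundy": "ron-burgundy.local"}


def build_command(box, mode, task):
    """Return the argv to run for a given box and routing mode.

    "machine": run the task as a plain shell command over SSH, so the
        calling Codex session (and its credits) does the thinking.
    "agent": start the Codex CLI on the remote box, so that box's login
        (and credits) is used. Assumes `codex exec` is available there.
    """
    host = BOXES[box.lower()]
    if mode == "machine":
        return ["ssh", host, task]
    if mode == "agent":
        # Quote the prompt so the remote shell passes it as one argument.
        return ["ssh", host, "codex exec " + shlex.quote(task)]
    raise ValueError(f"unknown mode: {mode!r}")


print(build_command("Ron Burgundy", "machine", "git status"))
print(build_command("Ron Burgundy", "agent", "run the test suite"))
```

The only difference between the two modes is who runs the model: in "machine" mode the remote box just executes commands, in "agent" mode it runs its own Codex session.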
My laptop has become a “satellite device” since I started using Codex from my phone. And my Mac mini has become the “home.” It’s clunky, but the end state feels more like how we’re going to be working in the near future:
I’m currently running the Codex app on 2 devices:
1. my MacBook
2. my Mac mini
My laptop isn’t connected to Wi-Fi reliably enough, so I keep a Mac mini on my desk that is always connected.
When I kick off new threads from my phone, I start them on the Mac mini. When I’m working from my desk, I run them there too.
The cool part is that I’ve added my MacBook and Mac mini as connected devices to each other. That means I can start and resume threads from either device. So if I’m in a meeting but want to continue a thread on my laptop that was started on my Mac mini, I can do that.
I’ve also set up mutual SSH for Mac mini <> MacBook, so files are easy to access from either side. It’s not fully seamless yet, but the model works.
What this means:
- I have an always-on Codex that is accessible from my phone, with its own dev environment
- All threads are always accessible from any of the 3 devices
- I can run heartbeat threads that stay on 24/7
It’s a little makeshift today, but the shape of it feels very real to me: Codex is no longer tied to whichever computer happens to be open in front of me. It starts to feel like something I can stay connected to across whatever device I’m using.
Rate of learning growth for ASI would be a key metric. It could be constrained by compute investment, electricity, algorithmic strategies, and geopolitical / regulatory environments, to name a few. That means a later achiever of ASI could still win if their rate of learning growth was faster over the long run.
I also still have doubts that ASI will be some singular homogeneous product that necessarily converges to one end solution. There could be different algorithmic principles / product features that these could be built on, that could differentiate behaviour / outputs (particularly early on), and that countries, companies, people might value differently. An example here is Anthropic's pre-training constitution vs OpenAI's post-training methods. Not everyone might want an ASI built with its own sense of purpose / 'good'.
Stay on low for any basic context reading / summarisation/coding
High / xhigh for complex analysis, navigating conflicting info, pro for complex legal, deep research and planning, science and maths
5.5 instant is actually pretty incredible if you spend a day on it, once its verbosity is dialled in.
These models are v capable with no / little reasoning
chatgpt-5.5 instant is really incredible. It's doing a lot of things without reasoning that 5.4 needed with reasoning.
V much worth experimenting with - the speed + intelligence is really something.
Codex made me money without me doing anything..
Huge turning point for me today: I asked Codex to go off and make me $5. It went out, found a small open-source security/audit bounty path, made a legit PR, followed up with the maintainer, kept my payment details private (without me asking), handled the GitHub proof/verification loop, and got the work merged.
It spent about 22 hours working on multiple security audits.
Today I received my first payment from that experiment: $16.88.
That’s a $506.40/month run-rate if repeated daily.
Not life-changing money yet, but it's deeply exciting to live out Sam Altman's vision for AI, where it will just go out and make money for you. It's awesome to start to see the beginning of that.
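For anyone checking the run-rate maths, a minimal sketch using only the figures in the post. The 30-day month is an assumption implied by the $506.40 number, and token costs are ignored because the post doesn't give them.

```python
payment = 16.88       # first payout in dollars (from the post)
hours_worked = 22     # approximate agent runtime in hours (from the post)
days_per_month = 30   # assumption implied by the $506.40 figure

# One payout per day, every day of the month.
monthly_run_rate = round(payment * days_per_month, 2)

# Gross earnings per hour of agent runtime, before any token spend.
hourly_rate = round(payment / hours_worked, 2)

print(f"${monthly_run_rate:.2f}/month, about ${hourly_rate:.2f} per agent-hour")
```

Since one run took about 22 hours, a single agent can fit roughly one such payout per day, which is what the daily-repeat assumption relies on.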