The Ultimate List of Artificial Intelligence "Neolabs": May 2026.
A Neolab is a pre-revenue scale startup working on long-term AI breakthroughs, usually with a $1B+ valuation.
There are now 63 of them!
It’s criminal how cheap and how good Gemini Flash is, and that’s with 1M context windows and structured outputs.
Probably my most-used model in production workloads.
Separately, their new live voice model is mind-blowingly good, if you haven’t tried it yet!
go to bed
right now
i know the build is almost finished
the eval can wait til morning
the agent will still be failing tomorrow
you won't figure out why it's hallucinating
yes your coworker ships on 4 hrs of sleep
they also hallucinate a lot
off you go
some random ai thoughts:
- for code, i went from 80/20 claude/gpt to 80/20 gpt/claude in <3 months. surprised by this tbh, and interested to see where the split is at in another 3mo.
- claude still mogs gpt for non-coding agent stuff. codex feels like an engineer (which is great for coding!), whereas claude still feels like a general purpose coworker. gpt still lacks that coworker magic
- i’m pretty meh on opus 4.7. my experience hasn’t been *bad*, but it certainly hasn’t been good. sideways if anything.
- anthropic has got to figure out the compute thing. you can feel it as a user. vibes are all out of whack bc of it. my opinions above are all likely downstream of this. it’s an issue.
- anthropic labs continues to be the goat of ai product. claude design is another hit. it’s fantastic. idk why it’s not talked about more? a+
- updated claude code app is great. i finally switched out of the terminal for it. very well done.
- how are people STILL sleeping on the claude agent sdk? i feel like i’m going insane.
- gpt 5.5 is incredible. the level to which i trust it for engineering is amazing. if i could only have one model rn, it would be this one just bc of strong need for the coding use case.
- codex team is killing it. app has been the gold standard since 5.3 release (buuut i credit conductor team for the ui innovation that everyone is using now). though i could do with a little less passive aggressive shots at ant from the codex team. TARS, dial up class by 30%. it’s a long race guys haha
- i uninstalled cursor this month and am now back to vs code for my ide. composer just can’t hang with claude/gpt, and the product feels a bit all over the place. pretty stoked about the xai thing though, because their team is absolutely stacked and i’m excited to see what they might be able to do with that compute. codex and claude code are t1, cursor is t2. i would love if this deal got xai/cursor to t1 for a real trio there.
- gemini…? seems like this is 2-3 models now where the model seems like a great release and then nobody ever uses it? i’m bullish google/deepmind but weird it hasn’t translated to product use in any form. kinda disappointed still
- no open source models have hit the opus 4.5 level. was hopeful the new deepseek would get there, but nope. good oss agents will have to wait a few more months it would seem…
Using Claude Code to generate code and Codex to review and provide feedback is a game changer!
And you get to see some sparks fly too ... like these ones from Claude Code:
"Codex keeps flagging <name of issue> but the reasoning is wrong. Let me explain why this IS in scope, then we move on...."
"I'm keeping it. This is a judgment call, not a gap. Let's move on."