Latest model releases have been a real letdown. I haven’t felt any real step-change improvements. It feels like the labs are having a hard time wresting more intelligence out of these models. Each release isn’t really better, just ‘different’, in ways that are sometimes helpful and sometimes counterproductive.
this has been my experience as well. there definitely were improvements, specifically wrt shell-based computer use, but also regressions, especially in the last 3 version bumps of flicker and gerpertee corp.
the last step change was 3.x to 4.x in flicker land, probably mostly due to them getting all coding sessions from april to october 2025 via CC. similar timeline with GPT and Codex.
at least in my line of work, no big jumps after that. the benchmark increases mean literally nothing in the real world.
i suppose we have a data problem now. there's only so much you can RL into those damn things. and with ralph loops/swarms/agents reviewing agents/whatever, you get less and less human signal to improve RL. that's my uneducated guess, anyway.
also, my guess is it's very hard to capture design/system thinking in RL.
all that said: if we are at the top of the S curve now, then i'll take what we got. plenty useful, even if it won't replace me, fully or partially, anytime soon.
I’ve recently been feeling the need for a ‘Human Reviewed’ badge for infra providers and critical libs.
If I’m using your thing for critical infra, I want it to be extremely solid and safe from the temptation to slop.
This is peak 2026: 180 retweets, 1.3k likes, 1.2k bookmarks, not a single issue on the repo.
So I looked at what's actually in here. I was not impressed.
Since none of you will go to GitHub anymore, here's a screenshot of my questions.
https://t.co/CRYkkCArw5
@samuelcolvin @zeddotdev The other thing I would love (though it probably belongs in Zed itself) is a rendered markdown editor, aka Notion but for files on my computer.
The future of docs and sheets is .md and .csv