Spent almost a week with Opus 4.8 and it looks like a small change, but it's bigger than you think. Spent hours with a problem Codex couldn't solve because it was approaching it as an engineer, not a systems analyst. That's the difference and it won't show up in any benchmark.
Check this video out. I go through how I used it to upgrade my new project https://t.co/N9UcazSkjk: refactoring a freemium model, adding new features and more over 4 days and 48 commits. It took a problem I couldn't figure out and immediately solved it. See how I use it for user testing, cowork and lots more. It's a jam-packed 5 minutes.
You can throw away the benchmarks, and it's not even their best model (come at me, Mythos). Check out the review.
https://t.co/c8biGLOFDl
You don’t need a better prompt.
You need a contract.
When people vibe code, they usually know what they want generally.
But AI tools need specifics:
- what to build
- what not to build
- how it should behave
- what design rules to follow
- how to stay aligned when the build gets messy
That’s why I’ve been moving from “prompting” to “contracting.”
A good AI build contract gives the tool durable context it can keep checking against.
For MVPunk, I use 4 files:
1. PRD.md
What are we building and why?
2. AGENTS.md
How should the AI behave while working?
3. CLAUDE.md
How should Claude/Cursor/Codex orient inside the project?
4. DESIGN.md
What should the experience look and feel like?
It isn’t about bundling paperwork. It's reducing drift.
AI tools are incredibly capable, but they will happily build a feature-heavy mess if you don’t give them boundaries.
Prompts start the conversation. Contracts guide the work.
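A minimal sketch of how the four-file contract could be sanity-checked before a build session. The file names and the question each one answers come from the list above; the helper name `missing_contract_files` and the rule that an empty file counts as missing are my own assumptions, not part of any tool.

```python
from pathlib import Path

# The four contract files named in the post, with the question each one answers.
CONTRACT_FILES = {
    "PRD.md": "What are we building and why?",
    "AGENTS.md": "How should the AI behave while working?",
    "CLAUDE.md": "How should Claude/Cursor/Codex orient inside the project?",
    "DESIGN.md": "What should the experience look and feel like?",
}


def missing_contract_files(project_root):
    """Return the contract files that are absent or empty in project_root.

    Hypothetical helper for illustration only.
    """
    root = Path(project_root)
    missing = []
    for name in CONTRACT_FILES:
        path = root / name
        # Assumption: an empty file gives the tool nothing to check against,
        # so it is treated the same as a missing one.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_contract_files("."):
        print(f"Missing {name}: {CONTRACT_FILES[name]}")
```

Run it from the project root before you start prompting. If anything prints, that part of the contract still needs writing.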
Some of the latest models are pretty good. I've been using Mimo 2.5 Pro and it was good enough to run Otis (my bot) for three weeks without errors.
I recently moved to ChatGPT's 5.5 because their $20/month plan is subsidized, and I've had to start using Claude Code instead of Cursor for the same reason.
Cursor's new model (Composer 2.5) is shockingly good. I was pretty surprised. Not quite better than O4.7, but at least as good as 4.5 and rapidly getting smarter. Look for this to be the coding model to beat now that they have the deal with xAI for compute.
Qwen is supposed to be a good designer. Probably my next AI Model Trends target after Gemini Flash 3.5 releases. Maybe we need a course on Token-maxing. Or the opposite thereof.
The problem is local hosting isn't the same as cloud hosting. The infrastructure is completely different and people don't have the H100s to provide a similar experience. If they try to host, they'll find they need to spend all types of money and in the long run, they'll just go back to cloud hosting.
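A rough back-of-envelope sketch of why the hardware bill adds up. It only counts the memory needed to hold the model weights (parameters times bytes per parameter) and ignores KV cache, activations and serving overhead, so real requirements are higher. The 400B parameter count is a made-up example, not a specific model, and the 80 GB figure assumes the 80 GB H100 variant.

```python
import math

H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_memory_gb(params_billions, bytes_per_param):
    """GB needed just for the weights (1 GB = 1e9 bytes).

    params_billions * 1e9 params * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * bytes_per_param


def h100s_for_weights(params_billions, bytes_per_param):
    """Minimum number of cards whose combined memory fits the weights alone."""
    return math.ceil(weight_memory_gb(params_billions, bytes_per_param) / H100_MEMORY_GB)


if __name__ == "__main__":
    # Hypothetical 400B-parameter model at 16-bit and 4-bit precision.
    for label, bytes_per_param in (("16-bit", 2), ("4-bit", 0.5)):
        gb = weight_memory_gb(400, bytes_per_param)
        cards = h100s_for_weights(400, bytes_per_param)
        print(f"{label}: {gb:.0f} GB of weights, at least {cards} H100s")
```

Even with aggressive quantization, a model in that size class does not fit on a single card, which is the gap between a home setup and a cloud provider.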
The Chinese models are so cheap that it's just better to use them instead of Claude. But better isn't best, and the Claude experience is much more than just the model. Connectors, skills, plugins, memory, MCP support. Those are all things that have to be added to make a Claude. The model is a small part of the harness that makes a great experience possible.
Composer 2.5 is an excellent model. Unfortunately, I don't think the new Flash was really meant for coding. It might be useful for other things. I gave it a design task and it just quit...at least it did it quickly.
I really liked Claude Design, but it does have issues since it burns so many tokens. That should improve (same with memory prices) over time, but it will take a bit. Meanwhile I made a nice collection with all kinds of design resources, including some of my own vibe coded projects like Vibe Glossary, Claude Design competitors like Stitch and Open Design, and tons of inspiration sites.
https://t.co/dttfwOhs1A
I had been running GLM-5 for a while and it was decent, but still made some errors I wasn't pleased with. Mainly misunderstandings managing my Content Pipeline Kanban Board.
Been on Mimo 2.5 Pro for almost half a month now and I gotta say, the problems went away. I was expecting more savings, but as you can see, there was virtually no difference.
As a teacher, I have to try different models all the time, but it's working so well, I really don't want to. I'll give Grok and Kimi 2.6 a try (I was on 2.5 before, which I remember being pretty good). I would really love to run Gemma 4 locally for free (crossing fingers that my machine can handle it).
Will report l8r
You know...I really liked @Comet and had been recommending it for years, but I'm out. I don't know why anyone would think of removing slash commands in their assistant and making any skills that I create virtually unusable.
When some idiot thinks that removing the most useful feature they've ever had, for no reason whatsoever, is a good idea, it means the company is not thinking straight. I have to wait for a while since I made the mistake of paying for a year subscription, but I'm uninstalling it and finding a different solution.
There was a time when this was the best option, but now the Claude Extension is better. I had even started using the Claude Extension in Comet since it had gotten so bad.
I'm out and uninstalling this disgrace.
I gotta say @comet, removing slash commands from the sidebar assistant is just dumb. Easily my most used feature, now totally gone...and for what? Now I have to find a browser that doesn't do ridiculous things like that.
I've been running tests all night between the new GPT Image 2 and Nano Banana Pro and I'm sort of undecided. The ones with the wilder title font are Image 2. Google's look a little more corporate and less 'fun', but the fidelity is great. I do like the larger resolution of I2.
The interface is from my own website/open source project called BrandoIt. Besides adding the new models, I added a comparison slider, etc. It's probably the most used thing I've ever built. Go check it out or give it a star on GitHub or clone it or whatever.
Sorry, you're going to need to provide your own keys until Google buys me out or I hit the Lotto or something.
Website: https://t.co/Tg5JvgGHEj
Repo: https://t.co/YoVipReIeN
Work in progress. One thing I didn't realize is how much of the language designers/developers use sounds alien to new users: Sheets, Drawers, Switch, Toast, Dropzone, Masonry.
You can also copy the prompt/code so that you can send notes to your vibe coding platform.
My beginner students in my Stanford Vibe Coding class were having some trouble learning some of the terminology for things they needed to build, so I created this Vibe Glossary, which has now expanded with learning paths, scaffolding code, progress, quiz mode, etc.
I gotta take a break until my tokens renew or go use Cursor for a while. Claude Code for Desktop is a blast.
https://t.co/hGunCkLi2K
https://t.co/f9Hgw30Vwr
Stars always welcome, MIT licensed open-source. I've got 44 items and once I get more tokens, I'll add some more. It's actually a lot of fun.