@LimitingThe I don't understand why they don't use some form of hydraulic lift at the start, where getting going is the hardest part. Similar to what an aircraft carrier does when launching planes.
Gemini 3.1 Flash is the most underrated model. The cost vs. performance is crazy. Very disciplined model, punches well above its weight. Well done @Google and @OfficialLoganK
The other day's posting was inflated. I dream of 8 tokens per second... I'm now at 2.2 tokens per second. 2.2. I'm so glad I never hit my usage limit with Claude anymore. They really fixed that one.
@varunram Do you get the same value in the output? This is true of all Anthropic models. They keep guessing until they're happy with the end output.
Here's the issue. 5.5 can't plan very well. Opus drifts like Mario Kart. 5.5 gets the details but misses the big picture. Opus says "we'll skip that detail" but gets the big picture. They both lie.
This is supposedly the best model on the planet, at its highest thinking setting, and after so many iterations I'm just not sure Opus 4.7 is going to get there. I might just have to do it by hand:
"This is the third time I've gotten the commit boundary wrong. Fixing the architecture now, not just the prose."
To anyone saying "This model is amazing": I'm just so tired of it. All the models have different strengths, and none of them does even one whole task well.
I can't ship anything with speed or fidelity without bouncing it off a couple of models to get it right.
I don't know if this is a skill issue or something, but I've been playing with OpenClaw a ton for the past 4ish months. Lately I feel like I'm spending more time troubleshooting issues with it and telling it what it's doing wrong than actually getting valuable use from it.
My experience is both similar and slightly different. I think Opus is better at ideating, and it starts out well, but midway through planning (or whatever the task is) it just starts to flounder. I find 5.5 starts out a bit weak. Ideating is not really its strong point, meaning the brainstorming is meh, but once you get to the middle and the execution plan, it is much more streamlined and elegant than Opus, which tends to prefer artistry over efficiency.
But after a little while, 5.5 just gets tunnel vision (as Opus can), and even if you've spent a while planning the next thing, before you say go you have to take a step back and ask "this was the original premise, did we hit it?" You have to do that with both models.
If my contractor screws up, I ask him to fix it. He doesn't charge me to fix it. When Opus screws up, I get to pay Claude to think about the screw up, delete the files that show the screw up and then pay Claude to try again, and then probably screw up again. Rinse and repeat.