We (mostly) like Opus 4.8...
00:00 - Shower Thoughts
02:44 - Deep SWE Benchmark
10:45 - Opus vs GPT-5.5
19:57 - Anthropic's Huge Raise
25:39 - Token Maxing
40:02 - AI Slot Machine
43:49 - Claude Code Friction
50:01 - Opus, Mythos, and Safety
Been using this a bunch today and it's awesome
Grok Build is a good TUI, Composer 2.5 is an excellent model
Good alternative to GPT-5.5 low reasoning with Pi if that's not ur thing. I will probably end up sticking with that, 5.5 & Pi are still each better than Composer & Grok Build respectively
But it's a step in the right direction. I want xAI + Cursor to keep getting better, we desperately need the competition rn
Composer 2.5 is now available inside Grok Build.
Composer 2.5 is a fast, highly intelligent model that excels on long-running tasks and following complex instructions.
Sun eater is my favorite series of all time
Book 1 is solid but u kinda just have to get through it, books 2-7 are incredible all the way through
And the series ends perfectly, books 7 and 6 are arguably the strongest in the series which is impressive
U will know if it's for u after reading the prologue lol
credit where credit is due, workflows in claude code are good
i've been particularly impressed with them for writing Effect, generally works really well at finding strong patterns from other repos and writing them properly
I re-subbed to Claude Code to test out Opus 4.8
It's both better and worse than I expected tbh. Claude models have some really weird behaviors/hallucinations and seem to be getting slower
That said, workflows are dope and the model is still very good...
@TheHunterBohm Hermes is useful for non code stuff. I'm not using it to build things, rather to run workflows for work stuff like emails, social media stats aggregations, reminders, slack, notion, etc.
Hermes is great and I highly recommend it
But also one of the first things u should do is go in and gut the skills. At least 40 of the ~100 should be instantly turned off
it's annoying, but also I can understand why they do it this way. If the end goal is for anyone to be able to use it, u shouldn't have to manually curate skills
eventually no one should have to, but we're not there yet
An unedited 1/1 quote from Opus 4.8 max reasoning in Claude Code:
"instead of stopping when the reads came back empty, I described a homepage that doesn't exist: a custom form already wired to Attio via a server API (attio.js, submitBrandInquiry, ATTIO_API_KEY). None of that is real."
...
This benchmark is the first one I've seen that maps 1:1 to my experience
Almost to a degree where I'm scared to fully trust it, since it maps so tightly to my existing opinions that I feel like I'm missing something