Big-time AI nerd. Founder of Expert Studio AI: we build automations and AI tools that save your team time (no hype, actual results, security/safety first).
@johncrickett Actually, as I've found, talking to AI agents is not a simple skill.
One of the main reasons AI adoption is so low in enterprise is that people don't know how to use them, get bad results, and give up.
Getting the most out of frontier AI is a skill of its own.
So I built it specifically for learning MY way of doing a certain type of work.
Could I have Claude/Codex "learn" by prompting them to make memories or fill out a wiki?
Yes, but Hermes is more automatic. I quite enjoy how fast it learns on the job.
Thinking of taking the matured skill files and dumping them into Claude/Codex later on, once the learning is done.
@Akasheth_ No way, José. It's great for easy tasks, but on any kind of serious work it misses the plot.
If I had the energy to wire it up as a coder agent directed by GPT-5.5 or Opus, that could work well, I think.
I left Windows in the Vista days for Ubuntu. Windows was slow, buggy, just an awful user experience.
Eventually switched over to a Mac because Ubuntu was very limiting in terms of professional software at the time.
Switched back to Windows about 4 months ago for the same reason I originally left Windows. And just realized today... I haven't had any issues since switching. Any hiccup is easily fixable.
Compare this to macOS, where tons of major issues (hello, Bluetooth) get a "yeah, we've known about this for the last 10 years but still haven't fixed it".
Apple has awesome hardware but they've lost the plot on the OS.
@Govindtwtt That's because the people actually building are too busy building to post. I have custom tools and agents that clients are using now. Been meaning to post case studies for ages. No time; I'd rather build.
Yeah, I honestly don't feel a difference compared to 4.7.
4.7 hallucinated and forgot key instructions; 4.8 does the same.
I currently run all important outputs from Opus through codex and it always finds a ton of edge cases, inconsistencies, missed edits, etc.
No difference in that respect between 4.8 and 4.7. Both are smart; neither is reliable enough on its own.
One thing I found is that 4.8 is worse at long-form creative writing than 4.7, mostly because it forgets important narrative details and reads the prompt too rigidly and literally.
Which is funny, because I get that you tried to make it stick to the prompt more, but I really don't see that happening in actual coding/research/documentation tasks.
@datalevi I doubt it, and I'm not sure what the point would be. It's more likely to do with the harness, which includes the system prompt. Anthropic is proud of Claude Code being 100% AI-coded, and they keep making changes.
Proof: Anthropic models do better in other harnesses (e.g., Cursor).
Opus 4.8 is, again, meh. Several dumb mistakes in a 5 hour session that would've really cost me if I didn't catch them. Still can't trust it.
Agents sound awesome, except it tried to launch 47 concurrent agents, launched 25 instead, and had the 25th cover batches 25-47.
عيد مبارك سعيد! (Eid Mubarak Saeed: "Happy, blessed Eid!")
ChatGPT seems unable to create nice calligraphy AND proper tashkeel (Arabic diacritics).
So.... this is Claude.
P.S. The photo is mine; ChatGPT helped remove people, and Claude added the text.
It's not correcting. It's planning an answer and lining up tokens in an optimal way for the response.
People do this all the time: "this is wrong for 2 reasons"... then, after listing them, "oh, actually, there's a third."
Thinking allows the model to list out the reasons first, then it knows exactly how many it'll be listing in the actual answer.
Just like a human who plans out an answer before speaking.
@ian_dot_so I've switched to Codex for a lot (not all) of my coding, but I couldn't recommend it for non-technical teams.
Claude is still the best experience by far for teams/enterprise.
@Layton_Gott Codex is very good at this overall. Not perfect, but passing the plan and the written code back and forth between Claude and Codex gets rid of most of these issues.