@databasedave Oh hey, I actually tried finding you on X but I must've had a really dumb moment because I couldn't. My bad. I would've tagged you. But hello!
Good read, I think Dave and I would get along just great. I’m also firmly in the camp of "AI is great and we should all be using it, but we need better-measured, outcome-oriented analysis of it." https://t.co/95wpGcAjU5
@sergeykarayev I believe it. But that doesn't mean you should run it against every bug. You should run the cheaper models first, which will probably find it, then fall back to it.
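A rough sketch of that cheap-first idea, nothing more. Everything here is hypothetical: a "model" is just a name plus a function that returns a candidate fix (or `None`), and `verify` stands in for whatever check you trust (tests passing, a repro no longer failing). List the models cheapest first and the expensive one only runs when the others come up empty.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical shape: (name, attempt), where attempt(bug) returns a candidate fix or None.
Model = Tuple[str, Callable[[str], Optional[str]]]


def find_fix(
    bug: str,
    models: Sequence[Model],
    verify: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Try models in the order given (cheapest first) and stop at the first verified fix."""
    for name, attempt in models:
        fix = attempt(bug)
        # Only accept a fix that passes the caller's check; otherwise escalate.
        if fix is not None and verify(fix):
            return name, fix
    # No model, including the most expensive one, produced a verified fix.
    return None
```

The whole thing hinges on `verify` being cheap and trustworthy. Without a real check, you can't tell when the cheap model actually found it.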
Fable is a good model. As with all new models, it is simultaneously excellent and entirely unremarkable (relative to other models). It is slow and expensive, and the "loops are all you need" discourse they are pushing is obvious in the context of someone using Fable-class models
What I've found so far is that for broad scope design (code architecture) tasks, Fable is unremarkable. Or, not better enough to justify its cost and speed.
But in highly targeted goal-oriented loops, it is another beast entirely. It is very slow but produces very good results.
I let it churn on optimizing a SwiftUI-layout resolver in Go I wrote and it was able to bring it down to a scale I could not reach myself (microsecond => nanosecond scale). But it took 2 hours and $40 to do it and I had to claw back some changes it overfit to Apple Silicon. Still, very worth it.
In comparison, for "implement this feature/change" iterative work, I ran a head-to-head of Fable vs. GPT5.5 vs. GLM-5.1. They all produced equally acceptable final results, but GPT5.5 and GLM-5.1 did it in a couple minutes and Fable was churning away for 40 minutes. And GLM-5.1 cost me less than a dollar, GPT5.5 ~$1.50, and Fable cost $9.
You can see that in this context, interactively working with an agent is nonsense. It's too slow. You need to write loops to keep the agent working and you probably want to highly parallelize the work being done. As with all things, I think a balance makes sense...
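To make "loops plus parallelism" concrete, here's a minimal sketch under heavy assumptions. `step` is a hypothetical stand-in for one agent turn that returns `True` when its goal is met, and the iteration cap is a crude stand-in for a budget. No real agent API is implied.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical: step(i) performs agent turn number i and returns True once the goal is met.
Step = Callable[[int], bool]


def run_loop(step: Step, max_iters: int) -> int:
    """Keep one agent working until it reports done or the budget runs out.

    Returns the number of iterations used.
    """
    for i in range(1, max_iters + 1):
        if step(i):
            return i
    return max_iters


def run_parallel(steps: List[Step], max_iters: int, workers: int = 4) -> List[int]:
    """Run several independent loops at once; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: run_loop(s, max_iters), steps))
```

Threads are fine for this shape of work since each turn would mostly be waiting on a slow model call. The cap matters more than it looks when every iteration costs real money.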
My sense is that I'd reserve Fable for targeted, surgical analysis and work. Not for daily driving everyday tasks.
I'm going to keep spending a shitload of money (relatively) and maining Fable for the rest of the week to continue to judge, will report if anything changes. I'll continue to head-to-head as well.
@mitsuhiko @antirez Yes. I think it’s really optimized for long-horizon, loop-based agentic workloads. But it’s way too expensive for me to actually build tooling around that.
@shubham_arora_0 @primorac18 Something new I'm building (not for the purpose of loops, just happens to be good for that). But yeah, lots of situational options.
@SatoshiGokumoto @primorac18 That's a better way to phrase it. I think the agent orchestrating agents is itself a loop, and the agents it's starting may or may not be loops themselves.
I haven’t had time to dive deeply into all the WWDC changes yet but I’m seeing positive feedback in 9 out of every 10 posts. Superficially looks great. Polish on iOS and macOS looks great. Siri looks great. CoreAI and related looks great. Is Apple software so back?
@toolmantim I did, but it falls into the same problems I had with Opus imo. Small sample size, but it seemed overeager to code when it needed to plan a *bit* more.
In the growing discourse around companies wildly spending on tokens and blowing token budgets, I think it's really just highlighting the changing modality of models (pun intended).
I feel like a lot of products and companies moved beyond "use the latest model all the time" a while back. A lot of agents (non-coding) use things like Sonnet, Flash-based, and open-weight models. It's good enough.
But for coding I feel like companies have mostly opted to use the latest at all times. And I think that should shift with these Mythos-class models. I mean, I think it should've already been shifting with GPT 5.5 xhigh too.
The problem is that I think broadly speaking, a lot of people don't have the skill (a new skill!) to judge when to use what model.
But I think, sandbagging, like 18 months from now we'll see a ton of habits change just due to cost (not only capability, but that too).
I should clarify: with vouch. Others are trying with other mechanisms. But I haven’t seen any take hold there either yet. I don’t think they will though. I think more realistically it’s a mix of platform filtering (e.g. account cooldowns and rate limiting) and just closed contribs in general (not targeted at individuals).
@KateHolterhoff @TheNoamLewis @ladybirdbrowser @steveruizok @tldraw @redmonk @monkchips Practically though I think the future is that large open source projects will close contributions completely. My projects aren’t doing this (yet?) but that seems to be the direction things are moving. Open contribution (at large scale) is over.