@Suhail And it’s not just that horizons are growing - if Anthropic’s take-off blog is correct then model development times are shrinking too
this could be sooner than we think
@robertcourson OpenAI tends to hold things back until someone advances SOTA.
Will be interesting to see if they drop something big in the next week or so (and if not, they’re legit behind)
Even though they're the most important product in the world right now, general-purpose agent harnesses are still up for grabs.
I've tried all of them and none of them give me everything I need.
So far it's basically been an either/or:
- The frontier models are finally transitioning from the terminal to apps, but they're still architected for smaller, local tasks.
- The harnesses have a great Telegram-style experience, but they're harder to use on a laptop.
- Even though we now know what's really valuable for an agent (self-learning, compounding knowledge, loops), you have to work really hard to get any of it, because none of it works out of the box.
None of them feel really ubiquitous, and they all feel like they're just scratching the surface of what a smart model can do.
I think the one to beat right now is Codex with an always-on server. But that's not going to fly for the non-nerd public.
And the product that satisfies the full shape has to live outside Claude/Codex by definition. Until the model wars settle down, we need interoperability, and the one thing Codex won't do is work with a different model.
This is going to get figured out in the next 6 months. Big prize and still anyone's game.
@gregisenberg "This agglomeration which calls itself the Holy Roman Empire was neither holy, nor Roman, nor an empire"
my favorite piece on this @tanayj
https://t.co/mXkW7b9UgC
@michalmalewicz Apple’s path is definitely the riskier of the two though.
For OpenAI’s bet to work, LLMs need to be really useful.
For Apple’s bet, they need to be both useful and small/cheap.
@clairevo In non-coding domains, IMO the hardest part is defining a clear success condition
those can be highly subjective (and often need weeks of wall clock time before they can even be met)
@zachtratar @hnshah What does the cron look like? just -exec “check for notion jobs”?
openclaw heartbeat is very good but feels like it’s harder to replicate in codex/CC (or even hermes)