Co-Founder @therealhiava AI-native Family OS + Porsche nerd @porschenotes also 🦞 Ex: AI @Google, CPO Motorsport Network, CPO @VoxMedia, Product+Eng @ESPN
two recent papers validate what we learned the hard way building Ava:
1. single agents outperform multi-agent systems under equal compute budgets
2. agent swarms without centralized verification don't just add overhead — they amplify errors across every handoff
coordination is a tax on intelligence. if your single agent is already good, adding more agents makes it worse, not better.
https://t.co/WzCnTYigu9
https://t.co/aVqvSnc4Rl
we spent 6 months on model quality at Ava before realizing the thing users kept coming back for was memory.
not smarter answers. not faster responses. the fact that it remembered their kid's name, their morning routine, what they said last Tuesday.
memory is the actual product. the model is just the delivery mechanism.
one super agent beats a swarm of little ones.
we tried multi-agent at Ava. routing between specialists, orchestrating handoffs, managing state across a dozen tiny agents.
ripped it all out. what actually works: one agent with a dynamic tool registry. it picks from hundreds of tools at runtime instead of splitting intelligence across ten dumb agents passing notes.
the coordination overhead kills you before the architecture does.
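A rough sketch of what "one agent with a dynamic tool registry" could look like. This is not Ava's actual code: `Tool`, `ToolRegistry`, and the keyword-overlap ranking in `select` are assumptions made up for illustration, and the three example tools are placeholders. A production system would more likely rank tools with embeddings or let the model choose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[], str]


class ToolRegistry:
    """Hypothetical registry: tools can be added or removed while the agent runs."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def unregister(self, name: str) -> None:
        self._tools.pop(name, None)

    def select(self, request: str, limit: int = 3) -> List[Tool]:
        """Rank tools by naive word overlap between request and description."""
        words = set(request.lower().split())
        scored = []
        for tool in self._tools.values():
            overlap = len(words & set(tool.description.lower().split()))
            if overlap > 0:
                scored.append((overlap, tool.name, tool))
        # Highest overlap first; ties broken by name so the order is stable.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [tool for _, _, tool in scored[:limit]]


registry = ToolRegistry()
registry.register(Tool("order_groceries", "order groceries for the week", lambda: "groceries ordered"))
registry.register(Tool("pay_bill", "pay a bill before it is due", lambda: "bill paid"))
registry.register(Tool("plan_trip", "plan a family trip", lambda: "trip planned"))

# One agent, one lookup at runtime, no handoffs between specialists.
chosen = registry.select("please order groceries for friday")
print([tool.name for tool in chosen])
```

The point of the sketch: the single agent keeps all the context and only the tool list changes, so there is no state to pass between agents.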
asked my 4 year old what he did at school. he told me about a kid who cried at lunch because his sandwich was wrong. he remembered the emotion, the context, the sequence.
that's episodic memory. not key-value storage. not vector search over a flat log.
at Ava we built our memory around episodes. temporal context, emotional valence, causal links between events. the difference between "user likes coffee" and "user was stressed last thursday because daycare called during a board meeting."
one is a preference. the other is understanding.
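To make the episode idea concrete, here is a minimal sketch, assuming an episode carries a timestamp (temporal context), a valence score, and an optional link to the episode that caused it. The `Episode` and `EpisodicMemory` names, the -1.0 to 1.0 valence scale, and the dates are all invented for illustration; this is not Ava's schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Episode:
    episode_id: str
    when: datetime                   # temporal context
    summary: str
    valence: float                   # assumed scale: -1.0 (negative) to 1.0 (positive)
    caused_by: Optional[str] = None  # causal link to an earlier episode


class EpisodicMemory:
    def __init__(self) -> None:
        self._episodes: Dict[str, Episode] = {}

    def add(self, episode: Episode) -> None:
        self._episodes[episode.episode_id] = episode

    def causal_chain(self, episode_id: str) -> List[Episode]:
        """Follow caused_by links back to the root cause, returned oldest first."""
        chain: List[Episode] = []
        seen = set()  # guards against accidental cycles
        current = self._episodes.get(episode_id)
        while current is not None and current.episode_id not in seen:
            seen.add(current.episode_id)
            chain.append(current)
            current = self._episodes.get(current.caused_by) if current.caused_by else None
        chain.reverse()
        return chain


memory = EpisodicMemory()
# Made-up timestamps for the daycare example.
memory.add(Episode("ep_call", datetime(2025, 3, 6, 10, 30), "daycare called during a board meeting", -0.4))
memory.add(Episode("ep_stress", datetime(2025, 3, 6, 11, 0), "user was stressed", -0.8, caused_by="ep_call"))

for episode in memory.causal_chain("ep_stress"):
    print(episode.when.isoformat(), episode.summary, episode.valence)
```

A key-value store would keep only "user was stressed". The chain keeps why and when, which is the part the post calls understanding.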
Great overview on QSBS (and the cascading effect bad tax policy could have on the NYC ecosystem). Thx @StevenFulop and @Partnership4NYC for your advocacy here.
@jspeiser Wild! This is for sure why the agent harness is the best place to advance right now. It’s always context and instructions. Models are so much more capable than most use cases today.
Let me list the work I do in my personal life
- order groceries
- keep track of birthdays + gifts
- plan parties
- plan trips
- keep a house in standing condition
- keep a car in standing condition
- do my taxes
- pay my bills
- invest my money
- take, organize, and share family photos
- help my kids with homework
- enrich my kids' academics
- register my kids for activities
- attend and manage several kids' sports teams
- keep my body healthy
- keep my kids healthy
- keep an eye on my parents' health
- cook meals
- clean + organize the house
- stay intellectually engaged / read
- exercise
- design, furnish, and organize our home
- keep plants alive
- stay engaged with the neighborhood
- stay engaged with politics
- keep up to date on the news
- repair broken things around the house
- chauffeur my kids and their friends
- price compare and purchase utilities
- make holiday magic
- order school lunches
- pick and manage charitable donations
- endless returns
I went to 🦞 @openclaw @clawcon last week in NYC and it's truly amazing to see the personal AI movement taking over. The problems felt very familiar to our last six months building @therealhiava - from memory to multi-agent chaos. The gap between "cool demo" and "runs my life" is where all the hard problems live.
Vibe Coding Is the New Product Management
“There’s been a shift—a marked pronouncement in the last year and especially in the last few months—most pronounced by Claude Code, which is a specific model that has a coding engine in it, which is so good that I think now you have vibe coders, which are people who didn’t really code much or hadn’t coded in a long time, who are using essentially English as a programming language—as an input into this code bot—which can do end-to-end coding.
Instead of just helping you debug things in the middle, you can describe an application that you want. You can have it lay out a plan, you can have it interview you for the plan. You can give it feedback along the way, and then it’ll chunk it up and will build all the scaffolding.
It’ll download all the libraries and all the connectors and all the hooks, and it’ll start building your app and building test harnesses and testing it. And you can keep giving it feedback and debugging it by voice, saying, “This doesn’t work. That works. Change this. Change that,” and have it build you an entire working application without your having written a single line of code.
For a large group of people who either don’t code anymore or never did, this is mind-blowing.
This is taking them from idea space, and opinion space, and from taste directly into product. So that’s what I mean—product management has taken over coding. Vibe coding is the new product management.
Instead of trying to manage a product or a bunch of engineers by telling them what to do, you’re now telling a computer what to do. And the computer is tireless. The computer is egoless, and it’ll just keep working. It’ll take feedback without getting offended.
You can spin up multiple instances. It’ll work 24/7 and you can have it produce working output.
What does that mean? Just like now anybody can make a video or anyone can make a podcast, anyone can now make an application. So we should expect to see a tsunami of applications. Not that we don’t have one already in the App Store, but it doesn’t even begin to compare to what we’re going to see.
However, when you start drowning in these applications, does that necessarily mean that these are all going to get used or they’re competitive? No. I think it’s going to break into two kinds of things.
First, the best application for a given use case still tends to win the entire category. When you have such a multiplicity of content, whether in videos or audio or music or applications, there’s no demand for average.
Nobody wants the average thing. People want the best thing that does the job. So first of all, you just have more shots on goal. So there will be more of the best. There will be a lot more niches getting filled.
You might have wanted an application for a very specific thing, like tracking lunar phases in a certain context, or a certain kind of personality test, or a very specific kind of video game that made you nostalgic for something. Before, the market just wasn’t large enough to justify the cost of an engineer coding away for a year or two. But now the best vibe coding app might be enough to scratch that itch or fill that slot. So a lot more niches will get filled, and as that happens, the tide will rise.
The best applications—those engineers themselves are going to be much more leveraged. They’ll be able to add more features, fix more bugs, smooth out more of the edges. So the best applications will continue to get better. A lot more niches will get filled.
And even individual niches—such as you want an app that’s just for your own very specific health tracking needs, or for your own very specific architectural layout or design—that app that could have never existed will now exist.”
Vertical software isn't dead.
General purpose AI tools are incredible, but they don't know, can't know, and will likely never know how to solve the last mile because that's not their job.
Process engineering ftw.
Excellent read @gsivulka @hebbia.
If you're building "Ask me anything," you're building a feature.
If you're building "Do this for me," you're building a workforce.
Stop optimizing for conversation.
Start optimizing for state.
The hardest part of building an AI Employee isn't the LLM.
It's the state management.
Marc Andreessen just called this "the most important moment in tech history."
He’s right. But not because models are getting smarter.
It's because we're moving from Chatbots to Agents. 🧵
We are moving from Stateless Chatbots to Stateful Agents.
A Chatbot lives in the moment.
An Agent lives in the workflow.
Chatbots remember what you said.
Employees remember what they did.
This is the moat.
Not the prompt. The memory architecture.
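One way to picture "remember what they did": keep the workflow itself as state that survives between sessions, not just the chat transcript. A minimal sketch, assuming a hypothetical `WorkflowState` with pending and completed steps serialized to JSON. The names and the example workflow are illustrative, not a description of any real product.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class WorkflowState:
    goal: str
    pending: List[str]
    done: List[str] = field(default_factory=list)

    def complete_next(self) -> str:
        """Move the next pending step into the record of what was done."""
        step = self.pending.pop(0)
        self.done.append(step)
        return step

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "WorkflowState":
        return cls(**json.loads(raw))


state = WorkflowState(
    goal="register kid for soccer",
    pending=["find form", "fill form", "pay fee"],
)
state.complete_next()

saved = state.to_json()                    # persist between sessions (file, DB, etc.)
resumed = WorkflowState.from_json(saved)   # a later session picks up mid-workflow
print(resumed.done, resumed.pending)
```

A stateless chatbot would have to be told again where things stand. Here the resumed agent already knows one step is finished and two remain.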
@nbaschez Same. The role is basically "find friction, kill it with automation."
Best part: tinkerers generate their own roadmap from observing real workflows.
LLMs are bad at following schemas.
Ask for ["value"], get "value".
Ask for {id: "fm_01"}, get {id: "mom"}.
We have 200+ lines of validators that silently fix malformed outputs instead of failing.
Defensive AI programming. It's a thing now.
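A small sketch of what that kind of repair layer might look like for the two failures above. Everything named here is an assumption: the `tags` field, the `ALIASES` table, and the helper names are invented, and the real validators are surely more involved.

```python
import json
from typing import Any, Dict, List

# Hypothetical tables: canonical IDs, plus names the model tends to emit instead.
KNOWN_IDS = {"fm_01"}
ALIASES = {"mom": "fm_01"}


def coerce_list(value: Any) -> List[str]:
    """Asked for ["value"], got "value": wrap a bare scalar in a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return [str(item) for item in value]
    return [str(value)]


def resolve_id(value: str) -> str:
    """Asked for "fm_01", got "mom": map known aliases back to the canonical ID."""
    if value in KNOWN_IDS:
        return value
    key = value.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    raise ValueError(f"unknown id: {value!r}")


def repair(raw: str) -> Dict[str, Any]:
    """Parse model output and quietly fix the shapes we know how to fix."""
    data = json.loads(raw)
    return {
        "id": resolve_id(data["id"]),
        "tags": coerce_list(data.get("tags")),
    }


print(repair('{"id": "mom", "tags": "value"}'))
```

This sketch still raises when an ID matches nothing, which is a choice made here rather than something the post describes. Silent repair is only safe for fixes that are unambiguous.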
Super cool that everyone can now use this native audio model. Here's how we @therealhiava are making the most of it https://t.co/6Cns14jCr4 Huge thanks to @GoogleDeepMind and @GoogleAIStudio for the spotlight!