Every AI product team eventually runs into the same question:
What shape should this thing take?
Not the model. Not the benchmark. The product shape.
Usually, form follows function. But in AI, the function itself is still evolving — and the shape helps decide what the function becomes.
I mapped the 9 AI product shapes I keep seeing in this new essay: https://t.co/wK3IV5Gp5A
The key missing piece for this positive-sum future is that AI agents still don't have the continual learning capability needed to sustain this human-AI learning loop.
that's what @NeoCognition is trying to solve: agents that can indefinitely learn to specialize for any profession, organization, and individual.
A ton of affordance concerns with Siri AI:
- What if I get out of the Siri flow at any point along the way?
- Where can I go to find the history?
- How easy is it to correct when there _is_ a speech-to-text error?
Without answers to these questions, how can I trust Siri to handle any of these multi-turn demo tasks 😂
#WWDC26 #SiriAI
reliable knowledge consolidation (or learning, really) requires a good representational structure in the first place, paired with a proper update mechanism, e.g., tree structures + decision tree algorithms, or geometric structures in neural nets + gradient descent.
simple rewriting by LLMs is by no means the right direction of learning
"Frontier labs are organised to serve one model to many customers. Specialisation requires the inverse, that is, many models built for segmented customers"
well said.
Yes. Human-in-the-loop is not a checkbox.
If the model can’t keep perceiving while it acts, the human is still stuck in the prompt → wait → interrupt loop.
Users often discover what they want through the collaboration itself. It’s a clever move to keep the outer layer present while deeper reasoning happens in the background.
Sharing our work on full-duplex multimodal models -- real-time interaction that's natural and intuitive without compromising on intelligence.
We started Thinky in part to differentially advance capabilities for human-AI collaboration, which are underemphasized relative to intelligence/autonomy because they're harder to eval.
In the future, we think every AI system will have something like an interaction model as the outer user-facing layer, continually keeping the user informed and learning what they actually want.
Now that I’ve joined NeoCognition, I’m starting to write again.
Planning to publish an essay each week on product design for AI agents, starting with this one.
Excited to help shape the next UI paradigm while the paint is still wet.
https://t.co/F9xNMhjy2h
Just as I’m thinking through adding GitHub without breaking Obsidian Sync for mobile, I saw this line on Mesa:
“You shouldn’t have to choose between version control and a filesystem.”
This is the file-over-AI direction I want.
Introducing Mesa: the most powerful filesystem ever built, designed specifically for enterprise AI agents.
Every team building agents eventually hits the same wall: where do the files live?
Not the chat history, the actual artifacts the agent works on.
> The contracts your agent redlined
> The claim files it updated
> The 200-page audit report it edited overnight while you were asleep
Today those documents live in a sandbox that dies in 30 minutes, an S3 bucket where concurrent writes clobber each other, or a GitHub repo that was never built to absorb agent-scale traffic.
So we built Mesa.
The world's first POSIX-compatible filesystem with built-in version control, designed from the ground up for agents. You mount it into your sandbox like any other filesystem. Your agent reads and writes files normally. Behind the scenes every change is versioned, branchable, reviewable, and rollback-able — like a codebase, for any file type.
Mesa provides
– Branches so agents work in parallel without locking
– Durable storage that survives sandbox death
– Sparse materialization so massive document sets load instantly
– Fine-grained access control per agent
– Full history for human review and audit
Design partners are running Mesa in production across legal, healthcare, GTM, business ops, and coding agents.
Private beta is open: link in the comments
“Apple I” feels exactly right.
The capability is real, but power users are still soldering together memory, tools, skills, MCPs, security, and workflows by hand.
We’re still waiting for the Macintosh moment for AI agents: a coherent, secure, well-packaged consumer product that makes all of this feel obvious.
Feels like it’s still anyone’s game.
On a more serious note, I was impressed by @browser_use every step of the way:
1. It uses @composio, which provides 980+ integrations covering most common websites. If I had let it use Google Maps, this would have been trivial.
2. It’s good at ad-hoc websites not covered by integrations too, including canvas-based ones that are usually harder for browser agents. It one-shotted a pixelated Elon Musk.
3. It makes smart tradeoffs when the exact path is expensive. I asked it to find homes within 15 min of X and 30 min of Y. It didn’t brute-force every candidate through Maps by default; it used haversine bounds as a proxy and disclosed the limitation. Not exactly what I wanted, but as a builder I smiled at the intelligent way it “cheated.”
4. Lastly, when everything fails, it finds another way to get the job done.
Clearly well thought out and battle-tested. I 100% agree with the team’s own judgment on finding something that doesn’t work: “Seriously, it’s hard.”
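For the curious, here is roughly what the haversine "cheat" in point 3 could look like. This is a minimal sketch, not what the agent actually ran: the function names, the 40 km/h average speed, and the coordinate tuples are all my own assumptions. The idea is to turn a travel-time limit into a straight-line radius, then drop candidates outside it before spending any Maps calls.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_radius_km(minutes, assumed_speed_kmh=40.0):
    """Straight-line radius for a travel-time limit.

    assumed_speed_kmh is a made-up average speed (an assumption, not from the post).
    """
    return assumed_speed_kmh * minutes / 60.0


def prefilter(candidates, x, y, x_minutes=15, y_minutes=30, speed_kmh=40.0):
    """Keep candidates inside both straight-line radii.

    This is only a proxy: real drive times depend on roads and traffic,
    so survivors would still need a routing check.
    """
    rx = max_radius_km(x_minutes, speed_kmh)
    ry = max_radius_km(y_minutes, speed_kmh)
    return [
        c for c in candidates
        if haversine_km(*c, *x) <= rx and haversine_km(*c, *y) <= ry
    ]
```

The limitation is the same one the agent disclosed: straight-line distance ignores the road network, so this narrows the candidate list rather than answering the question exactly.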
Finally found a way to break @browser_use:
Apple Maps.
Not because the UI was too hard, but because Apple Maps simply refused the browser.
What impressed me was what happened next. It didn’t just fail. It worked around the surface, pivoted to MapKit JS Directions API, and still got me the answer.
This is the new UI paradigm in miniature: not command execution, but intent completion.
Well done.
Btw, let me know if the Mac mini bounty is still alive :)
Trying to claim my Mac mini, I realized @browser_use has effectively shipped 980 integrations across the web 😏 no wonder it’s hard to find a task that doesn’t work!
Curious: Are those integrations built by humans or Browser Use agents? The former is already valuable, but the latter would be wild.
@yugu_nlp 100%.
Workflow:
1. GPT Image 2 generates the slide concepts
2. Split each slide out
3. Claude converts screenshots into editable slides
4. Final pass to unify templates/styles
A good harness should do all of this. Without it, the human becomes the harness.
GPT Image 2 is great at generating slides, but mostly as one big image.
It’s still extremely manual to turn that into a real deck: separate slides, editable text, consistent components.
The hard part of image gen is done. Still a lot of value in the harness around it.
I will talk about 'continual learning as adaptive compression of experience' at the recursive self-improvement workshop at #ICLR2026.
Happening in ~20 mins.
Unfortunately I didn't make it to Rio, so it will be online.
https://t.co/oVCorcERBG
Is it a dark UX pattern that @claudeai doesn't allow downloading Chat Project files?
All I wanted was to move files to Claude Cowork / Code so I don't have to keep deleting old versions manually. I'm not leaving (yet)!
Insane to have this limitation in 2026.
Congrats on the launch of @NeoCognition!
@ysu_nlp and team are leading researchers in agents, tackling the fundamental challenge of enabling agents to self-learn toward expert-level intelligence.
Expertise drives reliability and efficiency, which are key barriers to broader deployment today!
More coverage -- NeoCognition raises $40M seed to build AI agents that specialise through experience rather than pre-training https://t.co/1F3s4zPCGk via @thenextweb