in 2018, i wrote a problem for a college programming contest. i've been testing it against the latest frontier models for a while and it's turned out to be a pretty good benchmark: this is the first year models can one-shot it with meaningful reliability.
- leaderboard: https://t.co/FwfAz2yJLi
- try: https://t.co/k98CNhdqwJ
something we didn't expect from Subtext: given the choice between a local privacy filter (scrubs sensitive data on-device first, but takes a minute to set up) and just sending texts straight to the cloud: people picked local 9 to 1.
turns out the privacy/convenience tradeoff isn't much of a tradeoff. people will wait the extra minute.
so we made that minute painless: local setup is now one-click. if you use iMessage and got a Mac, try it: https://t.co/qc9dvMhu9d
stack: designed with @claudeai Design. local PII scrubbing on-device (names, numbers, addresses) before anything goes to the cloud using @openai's privacy filter open-weight model. Apple Silicon only for now.
credit to @helloitsaustin for the idea. happy to open source if there's interest
saw this and thought it would be a cool demo for what local models can do. cooked up Subtext as an evening project: a small Mac app that turns your iMessage history with anyone into a Spotify Wrapped-style slideshow.
https://t.co/UobHnUUcNm
I got married this past weekend so I did what any rational @AnthropicAI employee would do and had Claude Code analyze 12 years of iMessages with my wife, then Claude Design used that data to whip up a website for our guests in just minutes.
I've said this many times and I'll say it again: reactive AI is the agricultural age of AI. Useful, but still mostly manual labor.
I don't care if you type, use wispr flow, or command your claude code with telepathy. If you initiate every interaction, it's still reactive.
The industrial revolution begins with proactive-first agents. The end state is agents connected to your desktop + cloud context, quietly learning what you write, read, ignore, follow up on, defer, and keep coming back to.
They'll learn your priorities, goals, workflows, and history over time.
with the state of both openclaw and hermes being hit and miss, i understand if you are frustrated and want to give up on agents
HOWEVER!! the concept is not going anywhere, it's only gonna become better and more valuable.
you DON'T have to use them right now, but you can still proceed with doing 4 things:
#1: craft skills for most things in your life (email, calendar, doctor appointments, amazon, grocery shopping, managing contractors, etc etc)
#2: move as much data as you can from cloud providers and move to local md files, sqlite databases, NAS, etc etc.
#3: define and write down your problems, ambitions, goals, app ideas, income, bank accounts, bank transactions, investments, stocks, things you need to do, things that are preventing you from living the life you want, etc etc
#4: let llms interview you daily and learn about you, just random questions about your personal and work life. build a wiki from it or keep it in markdown files, whatever
you can still leverage the skills in codex/claude etc etc and as OC/hermes/whatever comes next is ready
and when the agents get smarter, all the 4 points will come together and your life will be on autopilot
i'm doing this since december and haven't stopped ✌️
All this manual effort to make agents "understand you" will be a thing of the past very soon. For lack of a better word, it just feels very "agricultural".
The real unlock is proactive desktop agents connected to both your desktop and cloud context. What you write. What you read. What you ignore. What you follow up on. What keeps coming back.
They will quietly build a model of you over time.
@davidpantera_ "average user" depends on who the ICP is. If your ICP is "everyone with a laptop", probably. If it's professionals who miss 2 important things every week because they have so much going on, it's a different story. We've seen this to be true over and over again.
This is why ambient desktop context is so powerful in building truly useful personalization. Your computer already captures what actually matters: what you write, what you read, what you act on, what you ignore, how much time you spent on every app, right down to the millisecond.
That signal is far richer (and more truthful). It’s first-party and continuous. It's legal because it's not scraping.
The hard part is processing that context securely, whittling out all the noise to get to what really matters. We've spent months on this and we’ve already seen it work with real users. Once you get it right, the system starts making useful inferences pretty quickly. It's kinda amazing.
Excited for more people to experience it when the Logical GA drops.
Founders pitching personalized AI agents almost always arrive at the same line about a "data moat" that gets built one user at a time. Then you ask them how they get the first 500 quality data points on me, and the answer is some version of: scrape my X, my Spotify, my DoorDash, my camera roll, my emails, my texts. I sat through dozens of these pitches when I was at a venture firm and I never bought it.
The first reason is legal. Most of those companies do not want you scraping their data.
The second reason is the part founders will not admit to investors. Trying to figure out who someone is from the tweets they liked at two in the morning is not very indicative of their personhood.
The actual move is to find a use case where users tell you the truth about themselves on purpose. There are not many of those, which is part of why dating is one of the first places AI personalization will work.
What you're missing is that how much people care is a function of how good the personalization and UX actually is. MS Recall failed because it didn't really deliver a transformative experience relative to the access it was requesting.
The difference is making someone's life 10x better, as opposed to 1.5x.
There’s a principled answer and a practical answer.
The principled one: take it seriously. Process as much as possible on-device, sanitize before anything leaves the machine, store all user data locally, and give people explicit control over which apps/sites are never monitored. Ambient context only works if users trust the system.
The real answer: if the personalization is good enough most people just don't care. How much people care is inversely proportional to how good the personalization is.
Which consumer AI products have genuinely broken through the SF/SV tech bubble?
The big tech chatbots for sure. Maybe Perplexity? It seems genuinely hard to get this right.
The biggest unlock will be if the legacy OSes expose agent interfaces.
Computer use sucks because an AI is being asked to use an interface designed for humans.
People talk about agent-firsr OSes but as you pointed out, a lot of these high value industries will be glacial in their adoption.
@jiriknesl You have to accept that you cannot do the same depth of human-driven reviews. Use agentic coding to bolster your test suite and use code review agents to make up the difference.
If you have a channel-per-topic norm in Slack, that definitely imparts a lot more structure.
I’ve just seen the flip side where a lot of the actual decision-making ends up scattered across threads/replies, and you have to reconstruct it. NVIDIA had pretty solid email hygiene, so big decisions often happened in email threads with all the context in one place. There's close to zero noise in those threads. People are much more intentional and calculated in what they say. From an agentic standpoint, email generally ends up being more token-efficient too.
Slack has better declared structure for sure. Probably why neither alone feels quite right to me (and therefore both have their place).
Building reactive AI systems for desktop these days is just plain stupid. the real unlock is gonna be in proactive AI. Proactive AI is hard, because it requires deep, continuous context. The good new is that the desktop is where that context interchange happens. That’s the direction we’re pushing with @trylogical