Neural networks might speak English, but they think in shapes.
Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision.
Starting today, we’re releasing a series of posts on this research agenda. 🧵
notes
The general loop is do good work and then tell people about it. Example (4M views https://t.co/HONnO7qpCW - 1-2 months of work, 2 days of writing)
Sowing:
- doing good work, even in a niche, is interesting to people not in the niche. (example: best pokemon player talking about his craft)
- use [TOOL] for everything, solve and find all problems
- talk to customers who were using [TOOL] inefficiently. realized there were basic things they weren't doing right. it felt too basic to write about, but it reached a wide audience because he solved a pain point that customers had but employees maybe wouldn't experience
Reaping:
1. Tell a story. (not simply "how we built x", "how to do x" - more informative/subtle than that)
2. Keep it as simple as possible
3. Don't sell something you don't believe in - if you don't use it, don't write about it.
4. Share secrets - some "real alpha" that you're even a little scared to reveal to competitors
misc:
- Q: How do you write? A: very unstructured... "you have nothing until you have everything", range from
- Q: how much time do you spend writing for work? A: very spiky - one week a lot, one week just heads down engineering, aim for 50-50 sowing-reaping
- Q: how do you make diagrams? A: Claude Design - make 5 different SVGs, make a concept and choose one. diagrams, code snippets. use principles of claude design, give design system and references.
- Q: how much time do you spend reading articles? A: too much marketing slop. "i read our slack a lot".
- Q: getting feedback before posting - use ABCD feedback framework.
- Q: series - Lessons from Claude Code
- make sales collateral
- Q: did u make something more / less popular than expected? -> the skills post
TLDR: Build implicit knowledge, and then make it explicit.
i can't believe more people aren't talking about this part of the claude code leak
there's a hidden feature in the source code called KAIROS, and it basically shows you anthropic's endgame
KAIROS is an always-on, *proactive* Claude that does things without you asking it to.
it runs in the background 24/7 while you work (or sleep)
anthropic hasn't turned it on for the public yet, but the code is fully built
here's how it works:
every few seconds, KAIROS gets a heartbeat.
basically a prompt that says "anything worth doing right now?"
it looks at what's happening and makes a call: do something, or stay quiet
if it acts, it can fix errors in your code, respond to messages, update files, run tasks...
basically anything claude code can already do, just without you telling it to
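a rough sketch of what a heartbeat loop like this could look like. to be clear, none of this is from the actual source: `Action`, `heartbeat_loop`, `observe`, and `decide` are hypothetical names, and the "site is down" check is a stand-in for whatever the real prompt looks at.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """Hypothetical unit of work the agent may choose to perform."""
    name: str
    run: Callable[[], str]  # does the work and returns a short result


def heartbeat_loop(observe, decide, ticks, interval=5.0, sleep=time.sleep):
    """Run `ticks` heartbeats; on each one, either act or stay quiet."""
    results = []
    for _ in range(ticks):
        state = observe()
        # The "anything worth doing right now?" step.
        action: Optional[Action] = decide(state)
        if action is not None:
            results.append((action.name, action.run()))
        sleep(interval)
    return results


# Demo with canned observations: the site goes down on the second tick.
states = iter([{"site_up": True}, {"site_up": False}, {"site_up": True}])


def decide(state):
    if not state["site_up"]:
        return Action("restart_server", lambda: "restarted")
    return None  # stay quiet


results = heartbeat_loop(lambda: next(states), decide, ticks=3,
                         sleep=lambda seconds: None)  # skip real waiting
```

the interesting part is that `decide` can return nothing. most ticks should end with the agent staying quiet.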
but here's what makes KAIROS different from regular claude code:
it has (at least) 3 exclusive tools that regular claude code doesn't get:
1. push notifications, so it can reach you on your phone or desktop even when you're not in the terminal
2. file delivery, so it can send you things it created without you asking for them
3. pull request subscriptions, so it can watch your github and react to code changes on its own
regular claude code can only talk to you when you talk to it. KAIROS can tap you on the shoulder
and it keeps daily logs of everything.
> what it noticed
> what it decided
> what it did
append-only, meaning it can't erase its own history (you can read everything)
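here's a minimal sketch of an append-only log with those three fields. the file name, the JSON-lines format, and the function names are my assumptions, not what the leaked code does.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def append_entry(path, noticed, decided, did):
    """Add one record to the end of the log; never rewrites earlier lines."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "noticed": noticed,
        "decided": decided,
        "did": did,
    }
    # Mode "a" always writes at the end of the file. This is a convention in
    # this sketch only: truly preventing erasure would need OS-level
    # permissions or a write-once store.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_entries(path):
    """Return every record in the order it was written."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


log_path = os.path.join(tempfile.mkdtemp(), "daily.log")
append_entry(log_path, "site returned 502", "restart the server", "restarted")
append_entry(log_path, "site returned 200", "stay quiet", "nothing")
entries = read_entries(log_path)
```

one record per line means you can read the whole history with `cat` or `grep`.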
at night it runs something the code literally calls "autoDream."
where it consolidates what it learned during the day and reorganizes its memory while you sleep
and it persists across sessions. close your laptop friday, open it monday, it's been working the whole time
think about what this means in practice:
> you're asleep and your website goes down. KAIROS detects it, restarts the server, and sends you a notification. by the time you see it, it's already back up
> you get a customer complaint email at 2am. KAIROS reads it, sends the reply, and logs what it did. you wake up and it's already resolved
> your stripe subscription page has a typo that's been live for 3 days. KAIROS spots it, fixes it, and logs the change
endless use-cases, it's essentially a co-founder who never sleeps
the codebase has this fully built and gated behind internal feature flags called PROACTIVE and KAIROS
i think this is probably the clearest signal yet for where all ai tools are going.
we are heading into the "post-prompting" era
where the ai just works for you in the background
like an all-knowing teammate who notices and handles everything, before you even think to ask
Are RL environments for the sciences different from those for, let's say, SWE?
In image, audio, video, and prose generation, the compounding effect of AI slop is ignorable. But in the sciences, errors compound until the entire study is useless.
Great read for anyone working in AI for science. While many in this space are building fine-tuned models and harnesses, the verifiability of the intermediate steps is the biggest bottleneck.
How would you build an RL environment for physics or chemistry?
We’re launching with two new posts.
Can AI do theoretical physics?
Harvard physicist Matthew Schwartz led Claude Opus 4.5 through a graduate-level calculation. AI can’t yet do original work autonomously, but it can vastly accelerate it.
Read more: https://t.co/UUfFuLqhb7
Introducing TigerFS - a filesystem backed by PostgreSQL, and a filesystem interface to PostgreSQL.
Idea is simple: Agents don't need fancy APIs or SDKs, they love the file system. ls, cat, find, grep. Pipelined UNIX tools. So let’s make files transactional and concurrent by backing them with a real database.
There are two ways to use it:
File-first: Write markdown, organize into directories. Writes are atomic, everything is auto-versioned. Any tool that works with files -- Claude Code, Cursor, grep, emacs -- just works. Multi-agent task coordination is just mv'ing files between todo/doing/done directories.
Data-first: Mount any Postgres database and explore it with Unix tools. For large databases, chain filters into paths that push down to SQL: .by/customer_id/123/.order/created_at/.last/10/.export/json. Bulk import/export, no SQL needed, and ships with Claude Code skills.
Every file is a real PostgreSQL row. Multiple agents and humans read and write concurrently with full ACID guarantees. The filesystem /is/ the API.
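To make the todo/doing/done idea concrete, here is a minimal sketch of claiming a task by renaming a file. It runs against a plain temporary directory, not a TigerFS mount, so treating a mount the same way is an assumption based on the atomic-write claim above. `claim_task`, `finish_task`, and the file name are hypothetical.

```python
import os
import tempfile


def claim_task(root, name):
    """Try to move todo/<name> to doing/<name>; return True if we got it."""
    src = os.path.join(root, "todo", name)
    dst = os.path.join(root, "doing", name)
    try:
        os.rename(src, dst)
        return True
    except FileNotFoundError:
        # The file is no longer in todo/, so another agent claimed it first.
        return False


def finish_task(root, name):
    """Move a claimed task from doing/ to done/."""
    os.rename(os.path.join(root, "doing", name),
              os.path.join(root, "done", name))


# Demo: set up the three directories and one task file.
root = tempfile.mkdtemp()
for directory in ("todo", "doing", "done"):
    os.mkdir(os.path.join(root, directory))
with open(os.path.join(root, "todo", "fix-typo.md"), "w", encoding="utf-8") as f:
    f.write("fix the typo on the pricing page\n")

first = claim_task(root, "fix-typo.md")   # this agent wins the task
second = claim_task(root, "fix-typo.md")  # a later attempt finds it gone
finish_task(root, "fix-typo.md")
done = os.listdir(os.path.join(root, "done"))
```

The rename is the lock: only one caller can move a given file out of todo/, so no separate coordination service is needed.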
Mounts via FUSE on Linux and NFS on macOS, no extra dependencies. Point it at an existing Postgres database, or spin up a free one on Tiger Cloud or Ghost.
I built this mostly for agent workflows, but curious what else people would use it for. It's early but the core is solid. Feedback welcome.
https://t.co/IPhieopOSP
We should revive traditional clothing.
So many cultures have really cool traditions and manufacturing techniques shaped by local materials and climate.
Everything has become so homogenized now that many people don’t even know how their ancestors dressed.
I'm going to make some obvious points.
(1) Blowing up all the oil infrastructure in the Middle East is an insane idea, and may well result in a global economic crash and humanitarian crisis unrivaled in the lives of those now living. We're talking about the price of everything everywhere rising, from food to gas, at a moment when inflation was already high. All of that will be laid at the feet of the authors of this war.
(2) The antebellum status quo of Feb 27, 2026 was just not that bad, but we're unlikely to return to it. Expect indefinite, long-term, ongoing disruptions to everything out of the Middle East.
(3) Also assume tech financing crashes for the indefinite future. The genius plan to get the Gulf states caught in the crossfire has incinerated much of the funding for LPs, for datacenters, and for IPOs. Anyone in tech who supported this war may soon learn the meaning of "force majeure" as funding gets yanked.
(4) Many capital allocators will instead be allocating much further down Maslow's hierarchy of needs, towards useful basic things like food and energy.
(5) It's fortunate that all those progressives yelled about the "climate crisis." Yes, their reasoning about timelines was wrong, and much of the money was wasted in graft, but the result was right: we all need energy independence from the Middle East, pronto. It's also fortunate that Elon and China autistically took climate seriously. Now they're going to need to ship a billion solar panels, electric vehicles, batteries, nuclear power plants, and the like to get everyone off oil, immediately.
(6) It's not just an oil and gas problem, of course. It's also a fertilizer problem, and a chemical precursor problem. Maybe some new sources will come online at the new prices, but it takes time to dial stuff up, particularly at this scale, so shortages are almost a certainty.
That said, China has actually scaled up coal-to-chemicals[a,c] (C2C), and there's also something more sci-fi called Power-to-X[b] which turns arbitrary power + water + air into hydrocarbons. But all of that will need to get accelerated. I have a background in chemical engineering so may start funding things in this area.
(7) Ultimately, this war is going to result in tremendous blame for anyone associated with it. It's a no-win scenario to blow up this much infrastructure for so many people. Simply not worth it for whatever objective they thought they were going to attain. But unless you're actually in a position to stop the madness, the pragmatic thing to do is: scramble to mitigate the fallout to yourself, your business, and your people.
[a]: https://t.co/ITat4tmAFd
[b]: https://t.co/bWwiSQcgyt
[c]: https://t.co/FQCqMhy5d3
My version of this is: don't stand in the model's way. Let it decide what it needs to do, and just give it the tools.
If you're AGI-pilled, you'd default to letting the model do everything, as opposed to handholding it.
I see people at Anthropic who didn't necessarily start that way getting better at it. Part of it is being surrounded by others who are AGI-pilled + watching how they push the models. But ultimately...
1. Ask yourself: what if the exponential actually continues?
2. Take a task and handhold the AI less, be more ambitious, try to do more of it end-to-end with AI
3. Do #2 enough until you reach the limits of current AI and it fails
4. Wait until the models get better and can successfully complete that task
5. Learn from this. Update your strategy. Rethink what the future looks like.
And practice that over & over
There are so many benchmarks for different aspects of SWE agents!!
Perhaps that's the reason we have so much progress in agents for coding as opposed to other domains?
Being formally verifiable is difficult in many domains...