Welp, that happened faster than I predicted. Thought it would be end of 2027, then early 2027, but agentic traffic is growing so fast that bots have now passed human traffic online for the first time in the Internet's history. https://t.co/2zX5bHdhsa
This is impressive: it is a problem I had actually heard of. It looks like the solution approach is surprising to mathematicians. It was a general reasoning model rather than a specialized one: bitter lesson time. I think the stochastic parrot is now nuked from orbit.
it’s in gemini, just create it in ai studio. oh, that’s for your personal google one account. for workspace you need gemini business. no, not gemini advanced, that’s ai pro now. unless you need ai ultra. oh agents? you do that in spark actually. no, not gemini api managed agents, that’s different. for coding use jules. unless you mean the agentic ide, that’s antigravity. no, that’s the old antigravity, download the new one. actually gemini cli is being deprecated, use antigravity cli. no the flash model is smarter than the pro model. unless you need pro. if it’s video, use flow. no, flow uses veo. no, nano banana is images. actually that’s in gemini now. unless you’re in search, then it’s ai mode. no, research is notebooklm. anyway it’s all very simple.
A big pivot from Ken Griffin on AI:
“Number one is, in the last few months, there has been a step change in the productivity of the AI toolkit. It is profoundly more powerful than it was just nine months ago.
And for us at Citadel, that has allowed us to unleash a much broader array of use cases for AI. And it has been really interesting to watch, to be blunt, work that we would usually do with people with masters and PhDs in finance over the course of weeks or months being done by AI agents over the course of hours or days.
These are not mid-tier white collar jobs. These are like extraordinarily high skilled jobs being, I'm going to pick a word, automated by agentic AI. And I gotta tell you, I went home one Friday actually fairly depressed by this because you could just see how this was going to have such a dramatic impact on society.
When you witness it in your own four walls, when you see work that used to be man years of work being done in days or weeks, it's like, wow, like that's the first time I've seen real impact in our four walls.”
This echoes my own experience with agents and the conversations I am having with students, friends & clients. The toolkit has dramatically transformed and it feels like in finance, for the first time, AI is real.
My current list of "laws" governing computer design
Did I miss any?
Rent’s Rule
Pollack’s Rule
Amdahl’s Law
Moore’s Law
Dennard Scaling
Bitter Lesson
Little’s Law
Jevons Paradox
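Two of these have simple closed forms, so here is a minimal sketch for reference. The function names and the example numbers are illustrative assumptions, not anything from the list itself. Amdahl's Law bounds the speedup when only a fraction of the work can be parallelized, and Little's Law relates items in flight to arrival rate and latency.

```python
def amdahl_speedup(parallel_fraction: float, n_units: float) -> float:
    """Amdahl's Law: overall speedup when `parallel_fraction` of the work
    is spread across `n_units` and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


def littles_law_in_flight(arrival_rate: float, latency: float) -> float:
    """Little's Law: average items in the system = arrival rate * average latency."""
    return arrival_rate * latency


if __name__ == "__main__":
    # Hypothetical workload: 95% parallelizable, 64 cores.
    print(amdahl_speedup(0.95, 64))
    # Hypothetical memory system: 100 requests per microsecond, 0.5 microsecond latency.
    print(littles_law_in_flight(100, 0.5))  # 50.0 requests in flight
```

Even with effectively unlimited cores, the 95% case tops out near 20x, because the serial 5% dominates.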
POD-OF-ONE: THE NEW ORG BUILDING BLOCK
As a @coinbase board member, it’s been a privilege to watch @brian_armstrong, @emiliemc, and the Coinbase team build a true AI-native company.
Brian's whole post is worth reading in depth. I want to focus in on one thing that Coinbase is testing: “one-person product teams.”
Most of the AI discourse has focused on one-person companies. The more powerful and more broadly applicable construct will likely be one-person teams inside companies.
The old product org split context across 3 people. The designer held the user experience. The PM held the customer and prioritization context. The engineer held the code and systems context. Coordination was the price you paid to combine those views into one shipping decision.
Agents reduce that coordination cost.
A single high-agency person can now ask agents to draft flows, write code, run QA, summarize customer feedback, generate variants, check edge cases, and produce release notes.
This model rewards a very specific kind of builder:
• Technical enough to inspect the work
• Product-minded enough to choose the right problem
• Tasteful enough to reject mediocre output
• Fast enough to ship before the org forms around the idea
The scarce skill is judgment.
One strong person with customer context and good taste can now do the work of a small pod. One weak person with agents just creates more output for someone else to review.
This changes how early-stage founders should hire.
The most useful hiring question is now: “Can this person own the outcome end-to-end?”
That’s a higher bar than a functional job description. It blends product sense, technical range, design taste, writing clarity, and operating discipline. The title matters less. The span matters more.
Call it pod-of-one thinking.
A pod-of-one builder can go from ambiguous customer pain to shipped v1 without waiting for specs, mocks, tickets, handoffs, or meetings. Agents fill in missing labor. The human carries the context.
Teams still matter. They should form when the surface area is real: multiple customer segments, production risk, complex GTM loops, or enough product depth that specialization pays for itself.
Before that, a pod-of-one may be the fastest shipping unit in the company.
Founders: hire people who can be pods-of-one, who can carry the whole problem in their head and use agents to increase their throughput.
Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights:
The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new horizons:
1. menugen: an app that can be fully engulfed by LLMs, with no classical code needed: input an image, output an image and an LLM can natively do the thing.
2. install .md skills instead of install .sh scripts. Why create a complex Software 1.0 bash script for e.g. installing a piece of software if you can write the installation out in words and say "just show this to your LLM". The LLM is an advanced interpreter of English and can intelligently target installation to your setup, debug everything inline, etc.
3. LLM knowledge bases as an example of something that was *impossible* with classical code because it's computation over unstructured data (knowledge) from arbitrary sources and in arbitrary formats, including simply text articles etc.
I pushed on these because in every new paradigm change, the obvious things are always in the realm of speeding up or somehow improving what existed, but here we have examples of functionality that either suddenly perhaps shouldn't even exist (1,2), or was fundamentally not possible before (3).
The second (ongoing) theme is trying to explain the pattern of jaggedness in LLMs. How it can be true that a single artifact will simultaneously 1) coherently refactor a 100,000-line code base *and* 2) tell you to walk to the car wash to wash your car. I previously wrote about the source of this as having to do with verifiability of a domain, here I expand on this as having to also do with economics because revenue/TAM dictates what the frontier labs choose to package into training data distributions during RL. You're either in the data distribution (on the rails of the RL circuits) and flying or you're off-roading in the jungle with a machete, in relative terms. Still not 100% satisfied with this, but it's an ongoing struggle to build an accurate model of LLM capabilities if you wish to practically take advantage of their power while avoiding their pitfalls, which brings me to...
Last theme is the agent-native economy. The decomposition of products and services into sensors, actuators and logic (split up across all of 1.0/2.0/3.0 computing paradigms), how we can make information maximally legible to LLMs, some words on the quickly emerging agentic engineering and its skill set, related hiring practices, etc., possibly even hints/dreams of fully neural computing handling the vast majority of computation with some help from (classical) CPU coprocessors.
something i've noticed: AI agents create a weird new kind of burnout. esp for young people.
a lot of ambitious 22 year olds are going to think the answer is simple:
- spin up more agents
- ship more code
- sleep less
- outwork everyone
and for a while, it will feel incredible.
you can keep multiple agents running, feed them tasks, review outputs, fix mistakes, make decisions, and keep the whole loop moving.
the problem is that the work no longer drains you through typing. it drains you through judgment.
More attention.
More context switching.
More verification.
More decisions per hour.
so instead of 8-10 normal productive hours, you might get 4-5 extremely intense hours before your brain is fully cooked. and you feel numb until you sleep properly and reset
some of my friends are already burnt out. they don't say it out loud but i can tell.
the agent can keep working 24/7.
the human still has a hard limit
I would have expected the market to start discerning between SaaS that is impacted by AI, SaaS that needs to evolve, and SaaS that benefits from AI. Analytical SaaS and Creative SaaS are in category 1; System of Record, Human Workflow, and Engagement and Productivity are in category 2; and Infrastructure SaaS and Cybersecurity are in category 3. This constant paranoid reaction of the market will continue to create buying opportunities for the discerning.
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
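To make "unit tests passed yes or no" concrete, here is a minimal sketch of a binary verifiable reward. It is a toy illustration under stated assumptions, not how any lab actually implements RL training: `verifiable_reward`, the test cases, and the two candidate functions are all hypothetical names made up for this example.

```python
from typing import Callable, Iterable, Tuple


def verifiable_reward(
    candidate: Callable[[int], int],
    cases: Iterable[Tuple[int, int]],
) -> float:
    """Return 1.0 only if the candidate passes every test case, else 0.0.

    The check is mechanical: no human judgment is needed to score it.
    """
    for arg, expected in cases:
        try:
            if candidate(arg) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed test.
            return 0.0
    return 1.0


# Hypothetical task: "write a function that squares an integer".
SQUARE_CASES = [(0, 0), (2, 4), (3, 9)]


def good_attempt(x: int) -> int:
    return x * x


def bad_attempt(x: int) -> int:
    return x + x  # passes the first two cases, fails on 3


if __name__ == "__main__":
    print(verifiable_reward(good_attempt, SQUARE_CASES))  # 1.0
    print(verifiable_reward(bad_attempt, SQUARE_CASES))   # 0.0
```

There is no equivalent one-liner for "was this essay good", which is the asymmetry the post is pointing at.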
During the last week I ran very long autonomous sessions of Claude Code Opus 4.6 and Codex GPT 5.4 (both at max thinking budget), in cloned directories (refreshed every time one was behind). I burned a lot of tokens (flat rate, my OSS free account + my PRO account)...
Some of the most underinvested areas in frontier biology that could accelerate civilizational progress:
- Cheap, large-scale DNA synthesis (writing entire chromosomes or full organisms)
- Real-time, non-destructive RNA sequencing in living cells
- Highly accurate AI-powered polygenic scores for complex traits (disease risk, cognition, longevity) → enabling full genome design
- Ultra-precise, multiplex genome editing (far beyond CRISPR) with minimal off-target effects, scalable across millions of cells
- Safe, efficient, tissue-specific in vivo delivery systems
- Safe and effective human germline engineering
- Accelerated clinical trials via testing on decedents (with consent)
- Next-gen human enhancement: muscle, cognition, mood — beyond GLP-1s
- Ectogenesis / artificial wombs
Who’s actually building in these areas? Drop names, companies, or researchers below 👇
New post on milestones of AI automation. Right now, human labor is a hard bottleneck on output (if you remove humans, output goes to 0). Soon we'll go from essential to important to helpful to useless, first in AI research and then across the AI stack. Link in next post.
When @karpathy built MenuGen (https://t.co/2OjrUJ3aLS), he said:
"Vibe coding menugen was exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed, real app. Building a modern app is a bit like assembling IKEA furniture. There are all these services, docs, API keys, configurations, dev/prod deployments, team and security features, rate limits, pricing tiers."
We've all run into this issue when building with agents: you have to scurry off to establish accounts, clicking things in the browser as though it's the antediluvian days of 2023, in order to unblock its superintelligent progress.
So we decided to build Stripe Projects to help agents instantly provision services from the CLI.
For example, simply run:
$ stripe projects add posthog/analytics
And it'll create a PostHog account, get an API key, and (as needed) set up billing.
Projects is launching today as a developer preview. You can register for access (we'll make it available to everyone soon) at https://t.co/1tSgGbSLxM. We're also rolling out support for many new providers over the coming weeks. (Get in touch if you'd like to make your service available.)
https://t.co/vjRymcVCKI
Famously (there is a beautiful Works in Progress piece on this) in 2016, Geoffrey Hinton told an audience in Toronto that medical schools should stop training radiologists, since AI would soon outperform them at reading scans. Ten years later, there are more radiologists than ever, and they earn more than they did then.
Hinton was right about the task, but he was wrong (so far!) on the future of the radiology profession. Times have never been better for them. The gap between those two claims, the difference between tasks and jobs, is the subject of a paper I have written with Jin Li and Yanhui Wu, and that we release today: "Weak Bundle, Strong Bundle: How AI Redraws Job Boundaries." (Very relatedly we are also finishing the first draft of our book "Messy Jobs" on AI and Jobs!! You will be the first to hear).
We start from the observation that the growing literature on AI and labor markets measures the AI shock by task exposure: people count how many tasks AI can perform in a given occupation, and infer that more exposure means more displacement. Eloundou et al. published a paper in Science in 2024 that started this literature, and many follow the same logic. The inference they make is that the more exposed tasks, the worse the outcomes.
This is incomplete, because labor markets price jobs, not tasks. A radiologist does not just sell image classification, but performs many other tasks: triages cases, communicates with other physicians, trains residents, makes the difficult decisions, and signs a diagnosis. The market buys a bundled service. The question AI poses is not whether it can do one task inside the bundle. The question is whether that task can be pulled out.
Thread (1/3)
https://t.co/wEYMfjGbeX