Welp, that happened faster than I predicted. Thought it would be end of 2027, then early 2027, but agentic traffic is growing so fast that bots have now passed human traffic online for the first time in the Internet's history. https://t.co/2zX5bHdhsa
PepsiCo just closed two Frito-Lay plants. 430 workers in California and 454 in Florida. Campbell's shut its chip factory in Massachusetts. Smucker took a nearly $1 billion loss on Twinkies. Hershey's confectionery volumes fell 5% in a single quarter. The reason is one molecule. Semaglutide. I manufacture chemicals. 23% of American households now have a GLP-1 user. Each one consumes roughly 800 fewer calories per day. J.P. Morgan estimates a $30-55 billion annual demand reduction in food. Nestle launched its first new American brand in three decades, built for people who eat less. The $12 trillion food industry employs 30% of the global workforce. A single molecule is restructuring what humans eat. The companies that ignored it got fried. Chemistry is always the bottleneck.
claude code is having its cursor moment after karpathy sensei's post. never been a better time to try it.
my latest blog on how to get the most out of claude code 2.0 and other agents in general is up now. grab a chai and have fun reading!
https://t.co/cQpvo0xocY
11 Predictions for 2026
Every year I make a list of predictions & score last year’s predictions. 2025 was a good year : I scored 7.85 out of 10.
Here are my predictions for 2026 :
1. Businesses pay more for AI agents than people for the first time.
This has already happened with consumers. Waymo rides cost 31% more than Uber on average, yet demand keeps growing. Riders prefer the safety & reliability of autonomous vehicles. For rote business tasks, agents will command a similar premium as companies factor in onboarding, recruiting, training, & management costs.
2. 2026 becomes a record year for liquidity.
SpaceX, OpenAI, Anthropic, Stripe, & Databricks IPO, with SpaceX & OpenAI ranking among the ten largest offerings ever. The pent-up demand from 4+ years of drought finally breaks. Fear of disruption by fast-growing AI systems drives defensive acquisitions exceeding $25b as incumbents buy rather than build.
3. Vector databases resurge as essential infrastructure in the AI stack.
Multimodal models & world/state-space models demand new data architectures. Vector databases grow revenue explosively as they become the connective tissue between foundation models & enterprise data.
4. AI models execute tasks autonomously for longer than a workday.
According to METR, AI task duration doubles every 7 months. Current frontier models reliably complete tasks taking people about an hour. Extrapolating this trend, by late 2026, AI agents will autonomously execute 8+ hour workstreams, fundamentally changing how companies staff projects.
5. AI budgets receive scrutiny for the first time.
Buying committees & boards push back on AI spend. Small language models & open-source alternatives rise in popularity as research labs determine how to specialize them for particular tasks, achieving state-of-the-art performance at a fraction of the cost. Developers prefer them for 10x cost reductions.
6. Google distances itself from competitors via breadth in AI.
No other company achieves breakthroughs across as many domains : frontier models, on-device inference, video generation, open-source weights, & search integration. Google sets the pace, forcing OpenAI, Anthropic, & xAI to specialize in response. The era of every lab competing on every frontier ends.
7. Agent observability becomes the most competitive layer of the inference stack.
Engineering observability, security observability, & data observability fuse into a single discipline. Agents require unified visibility across code execution, threat detection, & data lineage. This marks the beginning of the confluence I predicted in 2025 : the three observability spaces finally converge.
8. 30% of international payments are issued via stablecoin by December.
The efficiency gains in cross-border settlement are too large to ignore. As regulatory clarity improves in major markets, stablecoins move from the periphery of crypto to the core of global trade finance, displacing traditional SWIFT rails for a significant portion of B2B volume.
9. Agent data access patterns stress & break existing databases.
Agents issue at least an order of magnitude more queries to databases & data lakes than people ever did. This surge in concurrency & throughput requirements forces a redesign of the overall architecture for both transactional & analytical databases to handle the relentless demand of autonomous systems.
10. The data center buildout reaches 3.5% of US GDP in 2026.
The scale of investment mirrors the historical expansion of the railroads. The only factor that slows overall building is perceived risk within the credit market, particularly in the private credit market. The massive growth in that asset class suddenly shows strains of increasing default rates, creating a potential bottleneck for the most capital-intensive infrastructure projects.
11. The web flips to agent-first design.
Most developer documentation & many websites become agent-first rather than people-first. This shift occurs because many purchasing decisions are now informed first through agentic research. Consequently, the front door needs to be designed for robots, while the side door caters to people.
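The extrapolation in prediction 4 can be checked with a few lines of arithmetic. This is a minimal sketch, not METR's methodology: the one-hour starting horizon and the 7-month doubling time come from the text above, while the function names and the steady-exponential-growth assumption are illustrative.

```python
import math

def months_to_reach(target_hours, start_hours=1.0, doubling_months=7.0):
    """Months until the task horizon grows from start_hours to target_hours,
    assuming steady exponential growth with a fixed doubling time."""
    doublings = math.log2(target_hours / start_hours)
    return doublings * doubling_months

def horizon_after(months, start_hours=1.0, doubling_months=7.0):
    """Task horizon in hours after the given number of months (same assumption)."""
    return start_hours * 2 ** (months / doubling_months)

if __name__ == "__main__":
    # Figures from the text: 1-hour tasks today, doubling every 7 months.
    print(months_to_reach(8))   # months needed to reach an 8-hour workday
    print(horizon_after(12))    # horizon in hours after one year
```

Going from one hour to eight hours takes three doublings, so the arrival date is sensitive to both the starting horizon and the doubling time plugged in.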
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: https://t.co/nZ5JQ19CN6
we are excited to make this a very, very good model!
__
we are planning to release our first open-weight language model since GPT-2.
we’ve been thinking about this for a long time but other priorities took precedence. now it feels important to do.
before release, we will evaluate this model according to our preparedness framework, like we would for any other model. and we will do extra work given that we know this model will be modified post-release.
we still have some decisions to make, so we are hosting developer events to gather feedback and later play with early prototypes. we’ll start in SF in a couple of weeks followed by sessions in europe and APAC. if you are interested in joining, please sign up at the link above.
we’re excited to see what developers build and how large companies and governments use it where they prefer to run a model themselves.
🚀 Import your custom AI models into #AmazonBedrock! 🔗 https://t.co/zXtLdBaYSg
Get a hands-on demo in under a minute:
1️⃣ Access Bedrock, click 'Import model'
2️⃣ Set IAM permissions, start import
3️⃣ Access and use through the same API
🚀Our latest GenAI feature—Custom Model Import—is generally available on Amazon Bedrock today: https://t.co/EkeYWBJmOz
This new capability allows customers to seamlessly import and use their customized models (like Meta Llama or Mistral Mixtral) so they don’t have to start from scratch when they begin building in Amazon Bedrock. Builders can enjoy the power of our fully managed service in a serverless manner that removes the heavy lifting of managing infrastructure or model lifecycle tasks. Check out our latest blog to learn how to get started.
Transformer by Hand✍️
Studying the transformer architecture is like opening up the hood of a car and seeing all sorts of engine parts: embeddings, positional encoding, feed-forward network, attention weighting, self-attention, cross-attention, multi-head attention, layer norm, skip connections, softmax, linear, Nx, shifted right, query, key, value, masking. This list of jargon feels overwhelming!
What are the key parts that really make the transformer (🚗) run?
In my opinion, the 🔑 key is the combination of: [attention weighting] and [feed-forward network].
All the other parts are enhancements to make the transformer (🚗) run faster and longer, which is still important because those enhancements are what lead us to "large" language models. 🚗 -> 🚚
Walkthrough
[1] Given
↳ Input features from the previous block (5 positions)
[2] Attention
↳ Feed all 5 features to a query-key attention module (QK) to obtain an attention weight matrix (A). I will skip the details of this module. In a follow-up post I will unpack this module.
[3] Attention Weighting
↳ Multiply the input features with the attention weight matrix to obtain attention weighted features (Z). Note that there are still 5 positions.
↳ The effect is to combine features across positions (horizontally), in this case, X1 := X1 + X2, X2 := X2 + X3, etc.
[4] FFN: First Layer
↳ Feed all 5 attention weighted features into the first layer.
↳ Multiply these features by the weights and add the biases.
↳ The effect is to combine features across feature dimensions (vertically).
↳ The dimensionality of each feature is increased from 3 to 4.
↳ Note that each position is processed by the same weight matrix. This is what the term "position-wise" is referring to.
↳ Note that the FFN is essentially a multi layer perceptron.
[5] ReLU
↳ Negative values are set to zeros by ReLU.
[6] FFN: Second Layer
↳ Feed all 5 features (d=4) into the second layer.
↳ The dimensionality of each feature is decreased from 4 back to 3.
↳ The output is fed to the next block to repeat this process.
↳ Note that the next block would have a completely separate set of parameters.
Together, the two key parts: attention and FFN, transform features both across positions and across feature dimensions. This is what makes the transformer (🚗) run!
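The walkthrough above can be traced in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the original worksheet: all numbers, weights, and function names are made up for illustration, features are stored as columns (3 dimensions by 5 positions) to match the horizontal/vertical picture, and the attention matrix A uses 0/1 entries to reproduce the X1 := X1 + X2 pattern rather than softmax-normalized weights from a real query-key module.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    """Add bias[i] to every entry of row i, i.e. the same bias at every position."""
    return [[v + b for v in row] for row, b in zip(m, bias)]

def relu(m):
    """Set negative values to zero."""
    return [[max(0, v) for v in row] for row in m]

def block(X, A, W1, b1, W2, b2):
    Z = matmul(X, A)                        # [3] combine across positions
    H = relu(add_bias(matmul(W1, Z), b1))   # [4]-[5] expand 3 -> 4, then ReLU
    return add_bias(matmul(W2, H), b2)      # [6] project 4 -> 3

n = 5
# [1] Input features: 3 dimensions (rows) x 5 positions (columns). Illustrative values.
X = [[1, 0, 2, 0, 1],
     [0, 1, 0, 1, 0],
     [1, 1, 0, 0, 2]]
# [2] Hypothetical attention weights: position j attends to itself and position j+1.
A = [[1 if i == j or i == j + 1 else 0 for j in range(n)] for i in range(n)]
# FFN parameters, shared by every position ("position-wise"). Illustrative values.
W1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, 1, 1]]   # 4 x 3
b1 = [0, 0, 0, 0]
W2 = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, -1]]     # 3 x 4
b2 = [0, 1, 0]

Y = block(X, A, W1, b1, W2, b2)

if __name__ == "__main__":
    for row in Y:
        print(row)
```

The output Y has the same 3 x 5 shape as X, which is what lets the next block, with its own separate parameters, repeat the same two steps.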
Why AI Won't Cause Unemployment
"In retrospect, I wish I had known more about the hazards and difficulties of [running] a business." -- George McGovern
Fears about new technology replacing human labor and causing overall unemployment have raged across industrialized societies for hundreds of years, despite a nearly continual rise in both jobs and wages in capitalist economies. The job apocalypse is always right around the corner; just ask the Luddites.
We had two such anti-technology jobs moral panics in the last 20 years — “outsourcing” enabled by the Internet in the 2000’s, and “robots” in the 2010’s. The result was the best national and global economy in human history in pre-COVID 2019, with the most jobs at the highest wages ever.
Now we’re heading into the third such panic of the new century with AI, coupled with a continuous drumbeat of demand for Communist-inspired Universal Basic Income. “This time is different; AI is different,” they say, but is it?
Normally I would make the standard arguments against technologically-driven unemployment. And I will come back and make those arguments soon. But I don’t even think the standard arguments are needed, since another problem will block the progress of AI across most of the economy first.
Which is: AI is already illegal for most of the economy, and will be for virtually all of the economy.
How do I know that? Because technology is already illegal in most of the economy, and that is becoming steadily more true over time.
How do I know that? Because, see the chart.
This chart shows price changes, adjusted for inflation, across a dozen major sectors of the economy.
As you can see, we actually live in two different economies.
The lines in blue are the sectors where technological innovation is allowed to push down prices while increasing quality. The lines in red are the sectors where technological innovation is not permitted to push down prices; in fact, the prices of education, health care, and housing as well as anything provided or controlled by the government are going to the moon, even as those sectors are technologically stagnant.
We are heading into a world where a flat screen TV that covers your entire wall costs $100, and a four year college degree costs $1 million, and nobody has anything even resembling a proposal on how to fix this.
Why? The sectors in red are heavily regulated and controlled and bottlenecked by the government and by those industries themselves. Those industries are monopolies, oligopolies, and cartels, with extensive formal government regulation as well as regulatory capture, price fixing, Soviet style price setting, occupational licensing, and every other barrier to improvement and change you can possibly imagine. Technological innovation in those sectors is virtually forbidden now.
Whereas the sectors in blue are less regulated, technology whips through them, pushing down prices and raising quality every year.
Note the emotional loading of the interplay of production and consumption here. What do we get mad about? With our consumer hat on, we get mad about price increases — the red sectors. With our producer hat on, we get mad about technological disruption — the blue sectors. Well, pick one; as this chart shows, you can’t have your cake and eat it too.
Now think about what happens over time. The prices of regulated, non-technological products rise; the prices of less regulated, technologically-powered products fall. Which eats the economy? The regulated sectors continuously grow as a percentage of GDP; the less regulated sectors shrink. At the limit, 99% of the economy will be the regulated, non-technological sectors, which is precisely where we are headed.
Therefore AI cannot cause overall unemployment to rise, even if the Luddite arguments are right this time. AI is simply already illegal across most of the economy, soon to be virtually all of the economy.
@tomgoldsteincs No doubt it’s costly but this isn’t a simple marginal cost exercise. The publicity and more importantly, the human in loop training (or rlhf) is valuable.
Great to see a new open-source API stack for modern data apps! Today @DataStax shipped Document, GraphQL, and REST APIs for @Cassandra in #Astra + @stargateio! This will help more #devs build on the powerful #Cassandra. More details here → https://t.co/Lo0uVhWRAD