This is a diary entry to myself, so I remember what AI was like today.
It's just going to be a bullet-list stream of consciousness.
- There are still so many leaders that have never seen an agent run at work
- I asked a recent room (very tech curious but not engineers) how many people had built an agent and 80% raised their hands
- The biggest topic in Silicon Valley is a self-learning org
- The layoffs, particularly at Meta, are causing a lot of distrust among tech workers
- Social feed is filled with graduation speeches about AI. Speeches from Eric Schmidt/others that are pro-AI are getting loudly booed, and speeches from Ronny Chieng saying f*ck AI are getting light to heavy cheers
- Connecting tools into AI systems safely is still a big open question in the enterprise
- No one seems to care about Opus 4.8 launch but it’s only been 24 hours
- Avg engineer I speak with prefers Codex over Claude Code rn
- My feed is filled with more and more women showing how they use Claude
- Other than image generation use cases, I almost never see ChatGPT come up. A lot of people still mention it in person
- Perplexity is rarely mentioned these days, mostly by Gen X men
- Every CIO I meet with is worried about token maxxing and cost, they want to know where the signal is among the noise for AI usage
- Avg F500 enterprise is just now hearing about the hill climbing / flywheel / AI-legible company framework and doesn't know what it is
- Superusers inside of enterprises that have changed the way they work are not incentivized to share anything out, so the best learnings of business transformation are not getting circulated
- Average CEO is still worried about messing up their AI strategy
- Majority of AI strategies happening in the enterprise sound like startup strategies at the end of 2024, makes sense bc enterprises are usually 2-3 years behind startups
- Lot of questions around governance and explainability, NLA work from Anthropic did not seem to make a big impact in my circles yet
- People are massively sleeping on the /goals functionality
- People are sleeping on kicking off AI tasks before you go to bed and having AI crank 24/7
- Coworkers seem to have low trust in each other, particularly in the US, where it feels a little bit more like every man for himself
- People are just now starting to think through what the internet might need to look like for agents. I really like what Garry Tan and Dan Shipper have been building out
- X comments are more AI bots than ever
- Speed of release feels like it has slowed down slightly from a month or two ago, many of the things that are coming out feel like incremental orchestration releases that are all trying to support this Ralph Wiggum/constant loop that people are trying for
- Most people react negatively to the word harness
- Most performance questions I get are still on the models
- I still get nonstop questions about how people can best prepare their kids
Introducing Gemini Omni 🔮
Omni is our new model that can create anything from any input, starting with video (think Nano Banana but for video). Available in the Gemini App, Flow, and YouTube, with API support coming soon!
Anthropic just went after the 44% of U.S. GDP that enterprise AI has mostly ignored.
Claude for Small Business launched this week with 15 prebuilt agentic workflows and 15 skills connected directly into QuickBooks, PayPal, HubSpot, Canva, Docusign, Google Workspace, and Microsoft 365.
It’s deployed in Claude Cowork and has no extra charge beyond existing subscriptions.
Some of the use cases shared in the announcement: payroll planning, invoice chasing, month-end close, cash-flow forecasting, marketing campaign creation, tax season organizer.
Anthropic also launched a 10-city free training tour with half-day AI fluency sessions for 100 SMB leaders per stop across Chicago, Tulsa, Dallas, New Jersey, Baton Rouge, Birmingham, Salt Lake City, Baltimore, San Jose, and Indianapolis.
Anthropic knows SMBs adopt AI slower than enterprise, so they’re proactively creating an intelligence layer with the software they’re already using. It’s a distribution bet.
(Also, to whoever decided that the SMB demo should be a company with $17M in cash worried that they won't make $65k payroll… I worry about you.)
HTML is the new markdown.
I've stopped writing markdown files for almost everything and switched to using Claude Code to generate HTML for me. This is why.
most growth marketers use AI to rewrite headlines and call it a day. here's how I actually use Claude on the growth marketing team at @AnthropicAI across chat, Claude Cowork, and Claude Code 👇
I've read this piece three times.
It should be canon. If you're designing for agents, design for the agent's workflow and for its users. Think through how your data should show up and be presented.
e.g. Slack displays ugly Markdown. Notion nails it.
Read this.
Organizational design for agents is hard, benchmarking agents working in concert is hard. Together, this is the next critical frontier for making AI matter in economically valuable tasks, and we really don’t know very much about it.
"It’s not a human move. I’ve never seen a human play this move. So beautiful."
Seoul, 2016. This is the story of AI and creativity that fascinates me.
Go is a game of intuition and infinite possibilities. There are more possible positions on a Go board than there are atoms in the universe, and humans have played it for 2,500 years, developing rules of thumb about what looks "good" or "right".
DeepMind's AlphaGo was playing against world champion Lee Sedol. AlphaGo is a system that taught itself to win by playing millions of games against itself.
It had already won the first game, and in the second game the machine faced an unusual board.
On move 37, AlphaGo placed its stone on the fifth line from the edge of the board, deep into what humans considered "wrong move" territory.
The first reaction from the public was "it's a mistake". In Go, apparently, you never play on the fifth line so early, as it's considered inefficient and gives away too much territory.
Lee Sedol first stood up and walked out of the room to compose himself, then he spent nearly 15 minutes of his clock time just trying to understand why a machine would do something so "inhuman".
Why was it "beautiful"?
Because it was original.
AlphaGo wasn't just searching a database of human games. It calculated that the probability of a human playing that move was 1 in 10,000. It played it anyway because its internal logic saw a long-term strategic advantage that no human in 2,500 years had perceived.
1️⃣ The machine didn't beat Lee Sedol by being a better calculator. It beat him by being more creative.
2️⃣ When the DeepMind engineers looked at the code, they couldn't explain why AlphaGo did it in human terms.
3️⃣ It proved that the "Connectionists" were right. By mimicking a brain, they hadn't just built a tool. They built something that could develop its own intuition.
Move 37 fascinates me because working with AI is about being open to reimagining ways of doing things that don't follow the "usual" path.
Often, a new approach looks like a mistake or feels "wrong" to our traditional training. In reality, it is about leveraging information to reach an objective in a reimagined way.
The goal isn't to make the machine follow our rules. The goal is to let it show us a better way.
Ah, and in the end AlphaGo won 4-1, and some say it let Lee win one game out of compassion.
Opus 4.7 feels more intelligent, agentic, and precise than 4.6. It took a few days for me to learn how to work with it effectively, to fully take advantage of its new capabilities.
Will post a few more tips throughout the day, starting with this blog post: https://t.co/XQrH8P28yo
Introducing Claude Opus 4.7, our most capable Opus model yet.
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
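To make the "verifiable reward" point concrete, here is a minimal sketch of what a unit-test-based reward could look like. This is a toy illustration, not how any lab actually trains its models: the function name `verifiable_reward`, the test-case format, and the add-two-integers task are all assumptions made up for this example.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical test-case format: (input arguments, expected output).
TestCase = Tuple[tuple, object]


def verifiable_reward(candidate: Callable, tests: Sequence[TestCase]) -> float:
    """Return 1.0 if the candidate passes every test, otherwise 0.0."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed test.
            return 0.0
    return 1.0


# Toy task (assumed for illustration): "write a function that adds two integers".
tests = [((1, 2), 3), ((-1, 1), 0), ((10, 5), 15)]


def good_attempt(a, b):
    return a + b


def bad_attempt(a, b):
    return a - b


reward_good = verifiable_reward(good_attempt, tests)
reward_bad = verifiable_reward(bad_attempt, tests)
print(reward_good, reward_bad)  # 1.0 0.0
```

The reward here is a plain yes/no that a program can compute with no human in the loop. There is no equally cheap check for "is this essay good", which is roughly why writing is harder to score this way.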
This is an insane Anthropic tweet.
And it’s a *buried reply* to one of their other tweets.
I am reminded of a talk I gave ~2-3 months ago where a senior developer at a Fortune 500 company asked me “why would I use AI to code if I can just code myself.”
I answered.
He said, “But sometimes it messes up.”
I told him this was coming. Even if it's not perfect today (it makes weird product feature decisions sometimes, not gonna lie), the scaling laws seem to be holding up this year and the next iteration will be even more capable.
I wish I could send him this tweet.
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea: let's ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol
The LLMs may offer an opinion when asked but are extremely competent at arguing in almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask it to argue in different directions and be careful with the sycophancy.
I don’t think you all understand how powerful Claude Code is from a phone.
Telegram, iMessage, remote control, mobile command, dispatch, whatever you have to do.
Just get this thing on your phone.
My team is going to hold a no-laptop hack day soon, where everyone is forced to leave their desk and just be on a walk or at a restaurant or in a park with their phone (and dictation, I’m sure). Whoever is most productive wins.
“I am in this perpetual state of AI psychosis because there is a huge unlock to what you can achieve as an individual, trying to figure out what is possible, trying to push it to the limits”
The ability of the Claude team to learn from things like OpenClaw and implement features like this on a daily basis is a very strong argument that, for AI-powered coding teams, a very different software development process is possible, with large strategic implications.