Many people think any given ML project is 99% training.
In reality, it’s 50% evaluation, 40% data cleaning, 8% integration, and 2% training.
The first two set the noise floor for learning. No ML magic changes that: the model cannot go below the noise floor, because that floor is the information-theoretic (Shannon) limit of your data.
Thus, not a single day goes by without me thinking about ontology. Even the old labels have to be constantly reviewed.
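A minimal sketch of the noise-floor point, under one loud assumption that is not in the post: the "noise" is modeled as binary labels flipped at random with a fixed probability. Even an oracle that always predicts the true label then scores about 1 minus the flip rate against the recorded labels, so no amount of training can beat that number. The function name and parameters are illustrative.

```python
import random

def oracle_accuracy(noise_rate, n=100_000, seed=0):
    """Accuracy of a perfect predictor when scored against noisy labels.

    Assumption (not from the post): labels are binary and each recorded
    label is flipped independently with probability `noise_rate`.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        true_label = rng.random() < 0.5
        flipped = rng.random() < noise_rate
        recorded_label = (not true_label) if flipped else true_label
        prediction = true_label  # the oracle always knows the truth
        correct += prediction == recorded_label
    return correct / n

clean = oracle_accuracy(0.0)   # no label noise
noisy = oracle_accuracy(0.1)   # 10% of labels are wrong
print(clean, noisy)
```

Under this toy model, cleaning labels (lowering `noise_rate`) is the only way to raise the ceiling, which is one way to read the emphasis on evaluation and data cleaning.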
this is actually insane
> be tech guy in australia
> adopt cancer riddled rescue dog, months to live
> not_going_to_give_you_up.mp4
> pay $3,000 to sequence her tumor DNA
> feed it to ChatGPT and AlphaFold
> zero background in biology
> identify mutated proteins, match them to drug targets
> design a custom mRNA cancer vaccine from scratch
> genomics professor is “gobsmacked” that some puppy lover did this on his own
> need ethics approval to administer it
> red tape takes longer than designing the vaccine
> 3 months, finally approved
> drive 10 hours to get rosie her first injection
> tumor halves
> coat gets glossy again
> dog is alive and happy
> professor: “if we can do this for a dog, why aren’t we rolling this out to humans?”
one man with a chatbot and $3,000 just outperformed the entire pharmaceutical discovery pipeline.
we are going to cure so many diseases.
I don't think people realize how good things are going to get
i can't believe nobody caught this.
Anthropic's entire growth marketing team was just ONE PERSON
(for 10 months, confirmed)
a single non-technical person ran paid search, paid social, app stores, email marketing, and SEO for the $380B company behind claude
here's exactly how one human is doing the job of a full marketing team:
it starts with a CSV.
1. he exports all his existing ads from his ad platforms along with their performance metrics (click-through rates, conversions, spend, etc)
2. feeds the whole file into claude code
3. and tells it to find what's underperforming.
claude analyzes the data, flags the weak ads, and generates new copy variations on the spot
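a rough sketch of what "find what's underperforming" could look like on an exported CSV. everything specific here is an assumption for illustration: the column names, the sample rows, and the rule (flag ads whose click-through rate is below 75% of the median). the post doesn't say what criteria claude actually uses.

```python
import csv
import io
from statistics import median

# Hypothetical export: column names and values are made up for illustration.
SAMPLE_CSV = """ad_id,headline,impressions,clicks,conversions,spend
a1,Try it free today,10000,300,30,150.00
a2,Smarter work starts here,12000,180,6,180.00
a3,Build faster with AI,8000,240,20,100.00
a4,Your new coworker,9000,45,1,135.00
"""

def load_ads(text):
    """Parse the CSV and derive click-through and conversion rates."""
    ads = []
    for row in csv.DictReader(io.StringIO(text)):
        impressions = int(row["impressions"])
        clicks = int(row["clicks"])
        conversions = int(row["conversions"])
        ads.append({
            "ad_id": row["ad_id"],
            "headline": row["headline"],
            "ctr": clicks / impressions if impressions else 0.0,
            "cvr": conversions / clicks if clicks else 0.0,
            "spend": float(row["spend"]),
        })
    return ads

def flag_underperformers(ads, ratio=0.75):
    """Return ids of ads whose CTR is below `ratio` times the median CTR."""
    cutoff = ratio * median(ad["ctr"] for ad in ads)
    return [ad["ad_id"] for ad in ads if ad["ctr"] < cutoff]

ads = load_ads(SAMPLE_CSV)
weak = flag_underperformers(ads)
print(weak)
```

a median-relative cutoff is just one simple choice; a real setup would probably also weigh spend and conversions.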
this is where he gets clever:
he then splits the work into 2 specialized sub-agents:
1. one that only writes headlines (capped at 30 characters)
2. and one that only writes descriptions (capped at 90 characters).
each agent is tuned to its specific constraint so the quality is way higher than cramming both into a single prompt
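the character caps are easy to enforce in code after the agents respond. a minimal sketch, assuming the generated copy arrives as plain strings; the function names and sample copy are hypothetical, only the 30 and 90 character limits come from the post.

```python
HEADLINE_LIMIT = 30      # character cap stated in the post
DESCRIPTION_LIMIT = 90   # character cap stated in the post

def within_limit(text, limit):
    """True if the trimmed text fits within the character cap."""
    return len(text.strip()) <= limit

def filter_copy(candidates, limit):
    """Split candidate strings into (accepted, rejected) by character cap."""
    accepted, rejected = [], []
    for candidate in candidates:
        text = candidate.strip()
        if within_limit(text, limit):
            accepted.append(text)
        else:
            rejected.append(text)
    return accepted, rejected

# Hypothetical agent output, for illustration only.
headlines = [
    "Ship code faster with AI",
    "The assistant that reads your whole codebase for you",
]
ok_headlines, too_long = filter_copy(headlines, HEADLINE_LIMIT)
print(ok_headlines, too_long)
```

rejected strings could be sent back to the same sub-agent with a "shorten this" instruction instead of being dropped.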
so now he's got hundreds of fresh headlines and descriptions.
but that's just the text.
he still needs the actual visual ad creative, the images and banners that go on facebook, google, etc.
so he built a figma plugin that:
1. takes all those new headlines and descriptions
2. finds the ad templates in his figma files
3. and automatically swaps the copy into each one.
up to 100 ready-to-publish ad variations generated at half a second per batch.
that's what used to take hours of duplicating frames and copy-pasting text by hand.
so now the ads are live.
the next question is which ones are actually working.
for that he built an MCP server (basically a custom integration that lets claude talk directly to external tools) connected to the meta ads API.
so he can ask claude things like:
• "which ads had the best conversion rate this week"
• or "where am i wasting spend"
and get real answers from live campaign data without ever opening the meta ads dashboard
and the part that ties it all together and closes the loop:
he set up a memory system that logs every hypothesis and experiment result across ad iterations.
so when he goes back to step one and generates the next batch of variations...
claude automatically pulls in what worked and what didn't from all previous rounds.
the system literally gets smarter every cycle.
that kind of systematic experimentation across hundreds of ads would normally need a dedicated analytics person just to track
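the post doesn't describe how the memory system is built, so this is only a sketch of one plausible shape: an append-only JSON Lines log of hypotheses and results, summarized into text that can be pasted into the next round's prompt. the file name, field names, and the `lift` metric are all assumptions.

```python
import json
import os
import tempfile

def log_experiment(path, hypothesis, variant, lift):
    """Append one experiment record as a JSON line (hypothetical schema)."""
    record = {"hypothesis": hypothesis, "variant": variant, "lift": lift}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def load_history(path):
    """Read all past experiment records; empty list if no log exists yet."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def summarize(history):
    """Render past results as plain text to include in the next prompt."""
    lines = []
    for entry in history:
        verdict = "worked" if entry["lift"] > 0 else "did not work"
        lines.append(f'- {entry["hypothesis"]} ({entry["variant"]}): {verdict}')
    return "\n".join(lines)

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "experiments.jsonl")
    log_experiment(log_path, "Shorter headlines lift CTR", "headline_v2", 0.12)
    log_experiment(log_path, "Question-style descriptions convert better", "desc_v5", -0.03)
    history = load_history(log_path)

context = summarize(history)
print(context)
```

an append-only log keeps every round auditable, which is the part that would otherwise need someone tracking results by hand.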
the numbers from the doc:
ad creation went from 2 hours to 15 minutes. 10x more creative output.
and he's now testing more variations across more channels than most full marketing teams
a $380 billion company.
and their entire growth marketing operation (not GTM) = just one person and claude code lol
truly unbelievable
Even the best developer tools mostly still don't let you sign up for an account via API. This is a big miss in the Claude Code age because it means that Claude can't sign up on its own.
Putting all your account management functions in your API should be table stakes now.
Last quarter I rolled out Microsoft Copilot to 4,000 employees.
$30 per seat per month.
$1.4 million annually.
I called it "digital transformation."
The board loved that phrase.
They approved it in eleven minutes.
No one asked what it would actually do.
Including me.
I told everyone it would "10x productivity."
That's not a real number.
But it sounds like one.
HR asked how we'd measure the 10x.
I said we'd "leverage analytics dashboards."
They stopped asking.
Three months later I checked the usage reports.
47 people had opened it.
12 had used it more than once.
One of them was me.
I used it to summarize an email I could have read in 30 seconds.
It took 45 seconds.
Plus the time it took to fix the hallucinations.
But I called it a "pilot success."
Success means the pilot didn't visibly fail.
The CFO asked about ROI.
I showed him a graph.
The graph went up and to the right.
It measured "AI enablement."
I made that metric up.
He nodded approvingly.
We're "AI-enabled" now.
I don't know what that means.
But it's in our investor deck.
A senior developer asked why we didn't use Claude or ChatGPT.
I said we needed "enterprise-grade security."
He asked what that meant.
I said "compliance."
He asked which compliance.
I said "all of them."
He looked skeptical.
I scheduled him for a "career development conversation."
He stopped asking questions.
Microsoft sent a case study team.
They wanted to feature us as a success story.
I told them we "saved 40,000 hours."
I calculated that number by multiplying employees by a number I made up.
They didn't verify it.
They never do.
Now we're on Microsoft's website.
"Global enterprise achieves 40,000 hours of productivity gains with Copilot."
The CEO shared it on LinkedIn.
He got 3,000 likes.
He's never used Copilot.
None of the executives have.
We have an exemption.
"Strategic focus requires minimal digital distraction."
I wrote that policy.
The licenses renew next month.
I'm requesting an expansion.
5,000 more seats.
We haven't used the first 4,000.
But this time we'll "drive adoption."
Adoption means mandatory training.
Training means a 45-minute webinar no one watches.
But completion will be tracked.
Completion is a metric.
Metrics go in dashboards.
Dashboards go in board presentations.
Board presentations get me promoted.
I'll be SVP by Q3.
I still don't know what Copilot does.
But I know what it's for.
It's for showing we're "investing in AI."
Investment means spending.
Spending means commitment.
Commitment means we're serious about the future.
The future is whatever I say it is.
As long as the graph goes up and to the right.
A new CMU/Stanford study observed humans and AI agents completing the same real tasks across data analysis, writing, engineering, computation, and design.
They found that agents don’t work like humans. Humans solve tasks visually and interactively: open files, scan, verify, adjust, compare, iterate. Agents, on the other hand, convert work into code. Across domains, they took this programmatic path ~94% of the time.
That difference matters: Agents excel at highly programmable, deterministic steps, while humans remain more accurate on less programmable work.
This is why the future of work centers on human/AI “teaming” rather than agents replacing humans:
- Agents handle the programmable steps.
- Humans handle perception, interpretation, and judgment.
The researchers found that teaming humans and agents based on their specific strengths delivered ~68% higher efficiency than humans working alone, while still maintaining task accuracy.
This is where enterprise AI is heading: re-architecting work so humans and agents amplify each other. Not automation replacing people, but workflows redesigned so each does what they’re uniquely good at.
Agent evals are a completely new ball game.
Right now most AI evals operate within a self-contained world of the model. This will remain critical for ensuring we get continued improvements in domain-specific skills and general reasoning capabilities.
But agents necessarily will have access to a world that is thousands or millions of times larger than the model parameters. Knowledge sources, software tools, the web, and so on. Thus, knowing how to interact with that world will be the new thing we need to train and eval against.
You can already feel this today by giving an agent many MCP tools, and watching it and going “oh god no don’t use that tool”, and seeing it go down an entirely different route than you wanted.
This is another reason why coding agents have worked so well out of the gate. The data is generally highly accessible and the tools are relatively standardized. This is not true for most fields of knowledge work.
We are only in the earliest innings of how we test and train models for these highly complex environments. But this is where the action will be in the coming years.
Not talked about enough: products live or die depending on the *feeling* of power they give to the user.
Too complex and users don’t feel powerful, they feel stupid and overwhelmed.
Too simple and users don’t feel powerful, they feel frustrated and boxed in.
The biggest project of my life, ‘The Run Around The World’, is just 14 months away now. My team has already been working on this, and here are some crazy and mind-boggling numbers they shared with me.
The Numbers that make this project unthinkable
➡️ 40,000 km - that’s 1 full loop of Earth (equal to the length of Earth’s equator)
➡️ 57 km per day, every single day, for 740 days or fewer (The Guinness Timeline)
➡️ 5 continents
➡️ 30+ International Borders
➡️ -20°C to +45°C temperature range (Alaska and Saudi Arabia)
➡️ Every weather
➡️ Every terrain
➡️ Almost every Time Zone
And for the perspective that blew me away:
🏔️ 7,200+ people have summited Mount Everest
🌕 12 humans have walked on the Moon
👨‍🚀 700+ people have been to space
But no one has ever run the circumference of 🌍
So let’s Run the World!
#RunAroundTheWorld
We explain everything by what is visible, sophisticated, and recent, while the truth is found in what’s invisible, inscrutable, and predetermined. This may be our most fundamental thinking bias.
we were right to expect AI to be a massive productivity booster
we were wrong to expect AI to be low friction
un-incentivised users (which is most of us for most use cases) will never persist through the friction to get to the productivity
As a BluSmart consumer, I don't care wtf is wrong with management. Service is unmatchable. Uber comes nowhere close.
Hope the brand and operating model survive through this.
This is Richard Morgan.
He's a 93-year-old four-time world rowing champion who has the fitness levels of a 40-year-old.
Scientists studied him to find out his secrets to delaying the aging process.
Here's what they found:
One key lesson I've learned from seasoned founders, who are 20-25 yrs my senior, is the power of simply saying yes to those you share a strong bond with—without overanalyzing or calculating too much when they ask for something. Relationships matter.