Andrej Karpathy spent 2h showing how he actually uses AI day to day
he's a co-founder of OpenAI and led AI at Tesla, so when he shows how he works, it’s worth watching
and the whole session is just him telling the machine what he wants in simple terms, like he's briefing a coworker
watch what's actually happening the entire time:
> he describes the task in normal words
> it goes off and does the work
> he glances at the result and nudges it with one more sentence
that's the whole skill, and you've had it since you learned to talk
the only gap between that and a worker that runs on its own is handing that sentence a schedule and the tools to act
check his work, then build the version that keeps working when you stop
Excited to share how Anthropic's data team has automated 95% of business analytics queries with Claude. Blog post covers how we approach evals, ablations, and online validation!
The more enterprises I talk to about AI agent transformation, the clearer it is that most enterprises will have a new type of role going forward: the agent deployer and manager within teams. Here’s the rough JD:
This person will need to figure out which workflows on a team (existing or new) offer the highest leverage, where agents can actually drive significantly more value for the team and company.
In general, these will be areas where throwing compute (in the form of agents) at a task lets you either execute it 100X faster or do it 100X more times than before. Examples include processing orders of magnitude more leads and handing them off to reps with extra customer signal, automating a contract review and intake process, streamlining client onboarding to remove as many steps as possible, setting up knowledge bases that the whole company taps into, and so on.
This person’s job is to figure out what the future-state workflow needs to look like to drive this new form of automation, and how to connect the various existing or new systems so it can be fulfilled. The gnarly part of the work is mapping structured and unstructured data flows, designing the ideal workflow, getting the agent the context it needs to do the work properly, deciding where and at which steps the human interfaces with the agent, managing evals and reviews after any major model or data change, running the agents on an ongoing basis while tracking KPIs, and so on.
The person must be good at mapping processes and spotting where value could be unlocked, be relatively technical, and have full autonomy to connect business systems and drive automation. This means they’re comfortable with skills, MCP, CLIs, and so on, and the company trusts that it’s safe for them to do so. But they also need to be great operationally and on the business side.
It may be an existing person repositioned or a net-new hire. There will likely need to be one or more of these people on every team, so it’s not a centralized role per se. It may roll up into IT or an AI team, or live in the function and just have checkpoints with a central function.
This would also be a fantastic job for next-gen hires who are technical and leaning into AI. And for anyone worried about the future of engineers, this will be an obvious home for those skills as well.
Stop telling kids to “make eye contact” or “stand up straight.”
Vanessa Van Edwards has three much smarter body language hacks that actually work:
1. Ask them to notice the other person’s eye color — it gives a real reason to look up and connect.
2. Hands first — always approach with your hand out so you clearly signal how you want to be greeted (handshake, high-five, fist bump, or wave).
3. Superhero cape — roll your shoulders back and maximize the space between your ear and shoulder. It instantly makes kids look more confident and credible.
She uses these herself on Zoom, in photos, and in real life.
Small changes. Big difference.
What’s one tiny body language trick you wish someone had taught you earlier?
The ultimate rate limiter on productivity gains from agents will be critical areas like security, compliance, governance, the ability to review the agent’s work, ensuring that work complies with regulations, and so on.
We’ve been living in a bit of la-la land about how much software enterprises will ultimately want to vibe code themselves. The last 48 hours are a good example of why you won’t take on every risk of every piece of technology in your enterprise.
There’s no free lunch with AI productivity. Companies will have to build up the systems, processes, and controls to ensure that agents can’t run around doing anything they want on any data at any time.
New article, possibly my last for a while.
I've spent two years figuring out how to make a two-person law firm compete with teams twenty times its size using AI. This is the closest I'll come to explaining how.
It also explains why I can type “plz fix” and get back work product that reads like I spent three hours on it, when really I spent three hundred hours building the system that did.
Evals are the new PRD.
The companies building AI products that actually work are running 12.8 eval experiments per day. Here is the playbook with @ankrgyl, Founder and CEO of @braintrust ($800M valuation, behind Vercel, Replit, Ramp, Zapier, Notion, Airtable):
⏱ 1:43 Why vibe checks stop scaling
⏱ 6:35 Evals are the new PRD
⏱ 8:45 The Claude Code evals controversy
⏱ 18:48 Building an eval live from zero
⏱ 29:51 Connecting Linear MCP and iterating
⏱ 39:12 Why you need evals that fail
⏱ 43:36 Offline vs online evals
⏱ 47:40 Three mistakes killing eval culture
The core framework: every eval is exactly three things. A set of inputs your product needs to handle. A task that takes those inputs and generates outputs. A scoring function that produces a number between 0 and 1.
We built one from scratch on camera. Score went from 0 to 0.75 in under 20 minutes.
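A minimal sketch of that three-part structure (inputs, a task, a scorer returning a number between 0 and 1), assuming a toy lookup table stands in for the model call. The names `EvalCase`, `run_eval`, `exact_match`, and `toy_task` are hypothetical illustrations, not Braintrust's API, and the scorer is plain exact match.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class EvalCase:
    # One input the product needs to handle, plus the answer we expect.
    input: str
    expected: str


def run_eval(
    cases: Sequence[EvalCase],
    task: Callable[[str], str],
    score: Callable[[str, str], float],
) -> float:
    """Run the task on every input and return the mean score in [0, 1]."""
    scores = []
    for case in cases:
        output = task(case.input)
        s = score(output, case.expected)
        # Enforce the contract: every score must land between 0 and 1.
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {s}")
        scores.append(s)
    return mean(scores) if scores else 0.0


def exact_match(output: str, expected: str) -> float:
    # Case- and whitespace-insensitive exact match: 1.0 or 0.0.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def toy_task(question: str) -> str:
    # Hypothetical stand-in for a model call. Italy is deliberately wrong
    # and Spain is missing, so the eval has something to fail on.
    answers = {
        "capital of france?": "Paris",
        "capital of japan?": "Tokyo",
        "capital of italy?": "Milan",
    }
    return answers.get(question.lower(), "unknown")


cases = [
    EvalCase("Capital of France?", "Paris"),
    EvalCase("Capital of Japan?", "Tokyo"),
    EvalCase("Capital of Italy?", "Rome"),
    EvalCase("Capital of Spain?", "Madrid"),
]

print(run_eval(cases, toy_task, exact_match))  # 0.5
```

Two of the four cases pass, so the score is 0.5. Improving the task (a real model call, better prompting) or the scorer (partial credit instead of exact match) and re-running is the loop that moves the number.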
The PM playbook was built on the assumption that the technology underneath your product is roughly stable.
With the current pace of model progress, this is no longer true. Here's how we've evolved the PM role:
We invited Claude users to share how they use AI, what they dream it could make possible, and what they fear it might do.
Nearly 81,000 people responded in one week—the largest qualitative study of its kind.
Read more: https://t.co/tmp2RnZxRm
> be niantic
> launch Pokémon Go
> 500M players scan real-world places while playing
> scans turn into 30B geo-tagged images
> niantic builds a 3D map of the world
> robots and AR apps use it to navigate within centimeters without GPS
this is actually insane.