Last week, a Head of AI Transformation at a large enterprise told me:
"We're holding off on buying observability/performance eval tooling. Models are improving so fast. Eventually they'll catch their own errors."
That mindset is incredibly dangerous.
Here's why 👇
Computer use agents is a temporary solution. Guardrails for blocking malicious bots will always need to exist (and even more important with AI). The goal for computer-use should be making it easy for websites to authenticate agents and give them scoped access.
@triathenum@jdegoes@lifeof_jer It's entirely on the vibers to understand the ceiling and risks here, disclaimers and cautionary tales are everywhere if they aren't too lazy to check and invest in learning the basics
I taught Claude to talk like a caveman to use 75% less tokens.
normal claude: ~180 tokens for a web search task
caveman claude: ~45 tokens for the same task
"I executed the web search tool" = 8 tokens
caveman version: "Tool work" = 2 tokens
every single grunt swap saves 6-10 tokens. across a FULL task that's 50-100 tokens saved
why does it work? caveman claude doesn't explain itself. it does its task first. gives the result. then stops.
no "I'd be happy to help you with that." no "Let me search the web for you" no more unnecessary filler words
"result. done. me stop."
50-75% burn reduction
with usage limits getting tighter every week this might be the most practical hack out there right now
I've been using Obsidian for a week. Lesson for beginners: if you're just starting with Obsidian, build first, organize later
There's no right or wrong structure, and you'll find your favorite one by spending time with it yourself
Wow, this tweet went very viral!
I wanted share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs.
So here's the idea in a gist format: https://t.co/NlAfEJjtJV
You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.
We built the world's best deepfake detection model, per @huggingface. Then priced it at $0.25/hour.
The competition charges up to $150/hour for worse accuracy.
Turns out "enterprise security" was just a pricing strategy.
We blew it up. 🧵↓
Tighter scope -> lower variance -> better calibration -> better performance on these specialized tasks
Specialized agents are also inevitable economically, because cheaper compute + easier to finetune/debug
The orchestration layer does not need to be an LLM to work extremely well
I really like the idea of having multiple specialized agents instead of a "general purpose" agent that tries to do it all.
A few days ago, I read (sorry, I don't remember where) a study claiming that specialized agents, even when they are all using the same model, beat general agents by a mile.
These guys are doing precisely that with an army of hyper-specific agents.
And of course, they are following the ---Claw theme.
the single internal AI strategy companies should have is
let people build
instead of "unifying AI access/infrastructure"
regulating too early kills innovation and revenue opportunities
5. Causal effectiveness layer
Observability measures behavior. We need to also measure outcome.
A/B tests run on single lines of marketing copy detect meaningful impact all the time. That's why the experimentation segment is worth $3B.
So what about agents?
Last week, a Head of AI Transformation at a large enterprise told me:
"We're holding off on buying observability/performance eval tooling. Models are improving so fast. Eventually they'll catch their own errors."
That mindset is incredibly dangerous.
Here's why 👇
4. Runtime lawfulness
Evaluation ≠ enforcement.
Behind a 100% eval score on benchmarks, it's still just probabilities.
We need circuit breakers, policy engines, and continuous runtime governance and steering.
AI will only accelerate whatever condition you’re already in. That’s why grit and discipline have never been more important in history. The pretty good will become extremely good, the mediocre won’t even realize they’re still mediocre.
as a first-time manager, i’m learning that reducing scope to focus on 2-3 max value initiatives >> being in every room
let your only testimony be one that people can’t forget