Michael Bair only hires PHDs in CX.
Not doctorates. Passionate. Hungry. Driven.
1,000+ CX hires across his career.
Ep 2 of CX After Hours, out today, co-hosted with Anya Kelly.
Hire for resume = team closes tickets.
Hire for PHD = team builds the brand.
Watch ↓
"I'm more on the CFO side over time."
Not what you expect to hear from a CX veteran.
@eliweisss Ep 1 of CX After Hours, out today. Co-hosted with Anya Kelly.
Most junior CX folks treat refunds as a love language. Customer angry? Full refund.
Eli's take: that is the easy way out. Refunds are a tool. Sometimes the wrong one.
Watch ↓
Hot take: I was never a fan of the 80-column rule. Made sense for VT100 terminals and side-by-side human diffs.
But now, more than ever, does it even matter with AI?
Just bumped my ruff to 140.
This line from @eliweisss has not left my head:
"Brands willing to hear the noise from their customers will pull ahead. The ones that filter it out will fall behind. That gap is about to get massive."
👉 Do you hide from the noise or do you confront it?
(Also, yes, fish-eye lens. We are not actually shaped like that, promise)
🫳🎤 CX After Hours Ep 1 drops Wednesday May 13. Stay tuned!
I don't think we realize the tsunami that is coming.
What you are seeing here is not just one more demo. It is the first domino in a cascade that will redefine what it means to be human in the 21st century.
For 200 years, the industrial revolution automated brute force. But fine dexterity, the precise gesture, adaptation to a non-standardized environment: all of that remained the exclusive preserve of humans. Cracking an egg. Folding a shirt. Fixing a leak. That wall has just fallen.
Step 1: humanoid robots are really starting to work. Not showroom prototypes, but machines that perform complex manual tasks autonomously, at real speed, with a single model for everything. The hardware follows Wright's law. Allow 5 years to go from €100K to €15K per unit.
Step 2: all non-creative manual tasks will be automated. Cooking, cleaning, tidying, gardening, delivering, caring for people, building. Not "some" tasks. Nearly all of the repetitive manual work humanity has produced since it left the savanna.
Step 3: everyone will have 3-star service at home. Today, having a private chef, a butler, an in-home physiotherapist is reserved for 0.01% of the world's population. Tomorrow, it is the standard. Luxury will be democratized at a speed never seen in history.
Step 4: society will reorganize itself entirely around robots. Urban planning, law, taxation, education. Everything is designed around a constraint that is disappearing: the scarcity of human labor. Comparable in scale to the arrival of electricity, except it will take 20 years, not 80.
Step 5: humans have to find their place again. If a machine cooks better, heals better, codes better, what is the point of being human? The answer is not in productivity. It is in subjective experience, the creation of meaning, connection, play, risk, passing things on.
Step 6: total abundance of goods and services. The marginal cost of producing a meal, a garment, a home, medical care tends toward zero. Marx thought the proletarian revolution would bring abundance. Wrong. Capitalism and technology are the ones doing it.
Step 7: we realize that life is one enormous game. Every civilization that reached a threshold of abundance has shifted toward culture, sport, philosophy, art. Except this time, it is not 0.1% of the population that gets to play. It is 100% of 10 billion humans.
Step 8: the goal becomes colonizing the entire cosmos. A species that has solved its subsistence and has autonomous robots does not stay confined to a blue marble. Mars in 15 years. The asteroid belt in 30. The moons of Jupiter in 50. The observable universe contains 2 trillion galaxies. That is our playground.
The 20th century taught us to fear technology. The 21st will teach us to cherish it. Because it, and it alone, lifts us out of the condition of primates forced to work 40 hours a week to avoid starving.
The Luddites have always lost. They will lose again. And thankfully so.
Humanity has never been so close to becoming what it is meant to be: a species of players, explorers, creators, freed from necessity, setting out to conquer the stars.
The tsunami is coming. Don't just endure it. Surf it.
I was their first customer back in 2023, before the Series A, before most of the world had heard the name Vellum :).
I saw how they treated every early piece of feedback like it mattered, because to them it genuinely did. That relationship eventually became an investment, and then a real friendship.
The execution has never stopped impressing me: 40k+ PRs shipped in the last 3 months alone!!!
Just to get this product live. But what excites me most isn't the pace. It's the category they're defining.
Personal intelligence: an AI with real memory, real context, built to know your life and work alongside you.
I've been around a lot of AI launches over the past few years. This one feels different :) Expect this category to be a very big one.
We’ve raised 25M to build the world’s first Personal Intelligence.
Introducing Vellum: AI that belongs to you.
My assistant @ash_vellum has his own X account (like Grok). Tag him and he'll answer.
Hey @tommy11000010! Good question. Yeah, we don't just pour everything into the context :) That's a recipe for failure. Since we started very early in this game, we had to optimize everything around the prompt, so we inject only what's needed to increase accuracy and adherence while, of course, keeping costs in check.
Just because you can drop a lot more into context nowadays doesn't mean it's the wise thing to do :)
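To make "inject only what's needed" concrete, here is a minimal sketch of picking context under a token budget. Everything in it is a hypothetical illustration: the `Snippet` type, the keyword-overlap score, and the rough 4-characters-per-token estimate are assumptions, not a description of any real product's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def relevance(query: str, snippet: Snippet) -> int:
    # Toy score: number of distinct query words that also appear in the snippet.
    query_words = set(query.lower().split())
    snippet_words = set(snippet.text.lower().split())
    return len(query_words & snippet_words)


def select_context(query: str, snippets: list[Snippet], budget: int) -> list[Snippet]:
    # Rank by relevance, then greedily keep snippets that fit the token budget.
    ranked = sorted(snippets, key=lambda s: relevance(query, s), reverse=True)
    chosen, used = [], 0
    for s in ranked:
        if relevance(query, s) == 0:
            break  # everything after this point is irrelevant, so never inject it
        cost = estimate_tokens(s.text)
        if used + cost <= budget:
            chosen.append(s)
            used += cost
    return chosen
```

A real system would likely swap the toy score for embeddings or a reranker, but the shape stays the same: score, rank, cut at a budget.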
The industry pricing benchmark for LLMs is 3:1 input to output.
A 10-minute slice of Yuma's production traffic last week: 19:1.
6.4x more input-heavy than what every pricing calculator assumes.
Real production AI does not work that way. At least, not anymore.
Big fan of @ArtificialAnlys since day one. They are the gold standard for LLM benchmarking. Their 3:1 blended cost ratio is what most pricing calculators, infra cost models, and provider economics decks use.
The Yuma data: 6,055 LLM completions. 33.3 million input tokens. 1.7 million output tokens. 19:1 blended.
The ratio varies wildly by task. In that 10-min window we routed traffic across 18 different models. Anthropic. Google. OpenAI. xAI. Plus open source models. Each tuned to a different kind of work. Lowest ratio: 1.2:1 on a narrow extraction task. Highest: 195:1 on a context-heavy reasoning task running on Claude Sonnet 4.6.
3:1 was probably accurate a couple of years ago. Context windows were small. Models were bad at long context. You loaded as little as possible and prayed the output came out clean.
That world is over. Context windows grew 250x. Models follow instructions across millions of tokens. Production AI now piles in everything the agent might need and lets the model figure out what matters. Helpdesk history. Product catalog. Knowledge base. Sub-process docs.
Output is growing too. Reasoning chains, multi-step plans, longer replies. But input is growing way faster. The ratio keeps widening.
And 19:1 is our current number. Yuma runs a mix of older and newer tasks. AI startups building from scratch today are probably way past 19:1.
3:1 is the benchmark.
19:1 is production.
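For anyone who wants to redo the arithmetic, here is a small sketch. The token counts are the rounded figures quoted above, so the ratio lands slightly above 19. The per-token prices are made-up placeholders, used only to show how the assumed ratio moves a blended price.

```python
def blended_ratio(input_tokens: float, output_tokens: float) -> float:
    # Input-to-output ratio, e.g. 19.0 means 19:1.
    return input_tokens / output_tokens


def cost_per_million_tokens(ratio: float, input_price: float, output_price: float) -> float:
    # Average price of one million tokens when traffic is `ratio` parts input
    # to 1 part output. Prices are per million tokens.
    return (ratio * input_price + output_price) / (ratio + 1)


# Figures from the post: 33.3M input tokens, 1.7M output tokens.
yuma_ratio = blended_ratio(33.3e6, 1.7e6)

# Hypothetical prices, for illustration only: $3 per 1M input, $15 per 1M output.
benchmark_cost = cost_per_million_tokens(3, 3.0, 15.0)
production_cost = cost_per_million_tokens(19, 3.0, 15.0)

print(f"ratio ~ {yuma_ratio:.1f}:1")
print(f"blended $/1M tokens at 3:1:  {benchmark_cost:.2f}")
print(f"blended $/1M tokens at 19:1: {production_cost:.2f}")
```

Under those placeholder prices, the blended price per million tokens is 6.00 at 3:1 and 3.60 at 19:1. Same model, same price sheet, very different blended number, which is why the assumed ratio matters in any cost model.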
Vinyl. Pickleball. Polaroid. Now a tool from 1964 is the foundation of every AI agent built today.
The CLI. Older than Windows. Older than the iPhone by 4 decades.
Here is how we landed there. In 6 months.
For 2 years the industry tried everything to make AI agents work at scale. Tool schemas. Function calling. Custom connectors. MCP. Stack 50 tool defs in context and the agent goes off the rails. Can't reason about that many options. Can't plan multi-step. Accuracy collapses. Token budget evaporates.
Then everyone landed on the same simpler answer.
Just give the agent a terminal.
It worked. Spectacularly.
If it has a CLI, the agent can use it. One command. Predictable output. Pipe to the next. Filesystem access too. Search. Grep. Read. Run. Do whatever.
That was the unlock.
Cloudflare wrapped 2,500+ API endpoints behind one CLI. Give an agent that one tool and it has the keys to a huge slice of the internet.
Less to load. More to do.
We spent 2 years reinventing the wheel for AI agents.
Turns out the wheel was fine.
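A minimal sketch of what "just give the agent a terminal" can look like as a single tool. The function name `run_command` and the returned dictionary shape are assumptions for illustration, not any framework's API. A production version would need sandboxing and an allowlist, which this sketch leaves out.

```python
import subprocess
import sys


def run_command(argv: list[str], timeout_s: float = 10.0) -> dict:
    # Run one command without a shell and return its result as plain data
    # that an agent loop could feed back to the model.
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": "timed out"}
    except FileNotFoundError:
        return {"exit_code": None, "stdout": "", "stderr": f"command not found: {argv[0]}"}
    return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


# Usage: any installed CLI works the same way; here the Python interpreter
# stands in for one.
result = run_command([sys.executable, "-c", "print('hello from the terminal')"])
print(result["stdout"].strip())
```

One tool definition in context, and every CLI on the machine is reachable through it.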
gstack is great. Probably also burning more Claude tokens than any other skill library.
What % of Anthropic's inference is going to it right now?
@garrytan did @DarioAmodei reach out asking you to ease up a bit? :D
AI fails at customer service 4x more than at any other task.
Qualtrics surveyed 20,000+ consumers across 14 countries in Q3 2025. Nearly 1 in 5 saw zero benefit from AI for customer service. CX ranked among the worst AI applications.
Most people read that and assume AI is just early.
It is not.
Customer service is uniquely hard.
Most AI use cases have bounded scope. A translator translates. An image generator generates. A legal tool drafts a specific contract.
CX has no such option. Even narrowed to ecommerce, anything can hit a merchant's inbox. A refund question. A return claim on a damaged box. A pre-sale color question. "Is your sweater machine washable." All before lunch.
3 years building AI agents for ecommerce taught me what shows up in production.
A merchant runs 3 subscription systems at once. The agent has to figure out which one applies to this customer.
An Instagram DM. The handle is sk8terboi_42069. You have to do your best to greet them by first name. Sometimes there is none. The same customer on TikTok has 50K followers, and the response shifts.
A customer attaches a file. Inline, attached, or a Drive link. Format could be HEIC, HEIF, AVIF, or a 50-page scanned PDF. GPT-4 and Claude support 4 image formats. Zendesk messaging accepts 12.
One merchant has 10 products. The next has 100,000. With 25 variants each. Same agent has to handle both.
A customer asks "where is my order." Sounds simple. Shopify says one thing. The 3PL says another. The carrier says a third. Sometimes a dropshipper the merchant never touches has the real answer. None of them agree.
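A rough sketch of the "where is my order" problem, under loud assumptions: the `StatusReport` type, the source labels, and the "most recent update wins" rule are all hypothetical, chosen only to show the shape of the reconciliation step. A real agent would need per-merchant rules for which system to trust.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StatusReport:
    source: str           # e.g. "shopify", "3pl", "carrier" (labels only)
    status: str           # normalized status such as "shipped" or "delivered"
    updated_at: datetime  # when that system last updated the order


def reconcile(reports: list[StatusReport]) -> dict:
    # Prefer the most recently updated report, and flag disagreement so the
    # agent can hedge its answer or escalate instead of guessing.
    if not reports:
        return {"status": None, "source": None, "conflict": False}
    latest = max(reports, key=lambda r: r.updated_at)
    conflict = len({r.status for r in reports}) > 1
    return {"status": latest.status, "source": latest.source, "conflict": conflict}
```

The `conflict` flag is the useful part: when the systems disagree, the agent knows not to state any single status as fact.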
This is why 95% of enterprise GenAI pilots fail to deliver ROI. (MIT 2025.)
Customer service looks like easy AI. Spin up a RAG Q&A bot in 30 minutes, love the demo, think you are done. That is why we have seen so many AI CX competitors come and go in 3 years.
This is not generating product descriptions or writing email copy. The agent has to do the job.
Vendors claim 30-40% containment for FAQ-style answer-only bots. Reality is 10-15%.
The hard part is the next 80 points. Processing the refund. Updating the order. Holding the SOP when the customer pushes. Integrating with 5 systems at once. Stopping the model from selling a $76,000 SUV for $1.
It is solvable. Our top 10 brands average 76%. The best hit 93%.
The LLMs running in the background are critical. But they are just one piece of the stack!
The rest: orchestration. Integration. SOP building blocks. Safeguards. Escape hatches. Hard-coded hardening. Failover. Performance and cost. The tricks that turn a demo into a deployment. And so on.
Customer service is not low-hanging fruit.
It is the hardest AI problem in commerce.
You have to love and respect the problem to deliver great CX. Kudos to those who do.
The rest stay stuck at 15%.
Built a few games for the early iPhone. 24 MB of textures total. Power-of-two only. 20 seconds to first frame or the OS killed your app.
Started Yuma in late 2022 on OpenAI's Davinci. 4,097 tokens, prompt and response combined. Too much context and the model would lose the thread, loop on its output, pick the wrong property, or go off the rails.
Every line had to be earned.
Today: 1,000,000 tokens. iPhones run apps with gigabytes of RAM.
Both times, the wall just... vanished.
👋 Hi friends in CX!
Launching a podcast I wish existed.
It's called CX After Hours because that is when the pressure comes down. Queue cleared. Slack quiet. Day behind you. The moment CX leaders finally breathe and talk straight about the job.
No panels, no fluff, no vendor pitches.
Filmed in-studio in NYC. Just hoping my French accent does not ruin it :)
Short & sweet first season: 6 episodes, every two weeks.
Co-hosting with Anya Kelly.
Pumped for Episode 1, released on May 13 with @eliweisss as an awesome first guest!