Very grateful to have been named in the Forbes 30 Under 30 Europe AI list!
It was a surprise seeing this come in right as I was stepping back from Dex, but I'll admit I was very happy to see my achievements marked - building a great team and the best voice AI experience in recruitment, and raising $8.4m in funding.
Really though, it was a great reminder of just how early in my career I am. So stay tuned, the best is yet to come.
And yes, I promise this is the only time I'll post about it!
As someone who has been on Twitter for 15+ years, it is clear to me that the site is the best it’s ever been.
The early Elon days were quite rough but I’m very glad it’s bounced back and we aren’t all on Bluesky.
Pulled the trigger today and switched 100% of Lindy traffic to DeepSeek v4, churning from Anthropic models.
Saves us millions of $ and we're actually seeing an *increase* in performance on many core use cases. Transformative for the business.
Lots of excitement around Factory's model router and its 20% cost saving, and rightly so.
Many are saying this is just the start. After digging into the numbers, I think it's close to the ceiling. Here's why 👇
Routing only saves money on work a cheaper model can handle without dropping the ball. Factory's router holds ~99% of Opus 4.7's pass rate while cutting cost by 20%. They also published a Pareto curve of other experiments, showing that when they pushed harder, performance suffered. Getting down to ~56% of Opus 4.7 cost dragged the pass rate to 81%. In their research, 20% was the elbow of the curve, i.e. about the most you can save before quality starts to go.
It's also important not to mistake this cost saving for "only 20% of tasks could be handed to smaller models". The real share was likely far higher. Firstly, smaller models aren't free - Claude Sonnet is only ~50% cheaper. But most importantly, the hardest tasks are often the long, token-hungry, multi-step ones. So a handful of hard sessions still eat the lion's share of the bill, even if the majority of tasks get routed.
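To make that arithmetic concrete, here's a rough sketch. The workload is entirely made up for illustration (80 easy tasks at 1 cost unit each, 20 hard tasks at 6 units each), and it assumes the cheaper model burns the same number of tokens as the big one. The only figure taken from above is the ~50% price gap. The function name and task tuples are hypothetical, not anything Factory published.

```python
def routing_saving(tasks, cheap_price_ratio):
    """Return (share of tasks routed, fractional cost saving).

    tasks: list of (baseline_cost, is_routed) tuples, where baseline_cost is
    what the task would cost on the large model.
    cheap_price_ratio: price of the cheaper model relative to the large one.
    Assumption: a routed task uses the same number of tokens on either model.
    """
    baseline = sum(cost for cost, _ in tasks)
    with_routing = sum(
        cost * cheap_price_ratio if is_routed else cost
        for cost, is_routed in tasks
    )
    share_routed = sum(1 for _, is_routed in tasks if is_routed) / len(tasks)
    return share_routed, 1 - with_routing / baseline


# Hypothetical workload: many short easy tasks, a few long hard sessions.
easy = [(1.0, True)] * 80   # routed to the cheaper model
hard = [(6.0, False)] * 20  # kept on the large model

share, saving = routing_saving(easy + hard, cheap_price_ratio=0.5)
print(f"{share:.0%} of tasks routed, {saving:.0%} saved")
```

With these invented numbers, routing 80% of tasks trims the bill by only 20%, because the 20 hard sessions account for 120 of the 200 baseline cost units.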
Furthermore, any benchmark on which Opus 4.7 scores 99% is forgiving. The tasks we throw at AI in reality are often harder. If you're pushing AI to its limits you'll naturally need to send a higher share of tasks to the smartest models. Hence why I think Factory's numbers form something of a ceiling for cost saving. At least for now...
So why does this matter?
Plenty of startups are now running in-house agents and watching usage spiral. The results are awe-inspiring, but cost is a creeping concern. I've seen teams attempt their own model routing, and if you are, Factory's research should give you pause for thought. What it shows is that you're unlikely to beat ~20% cost reduction. Worse, if you think you have, you've probably traded away performance without realising it.
For Factory's enterprise clients, 20% off a vast bill is real money. For a startup building it yourself, if you ask me the juice isn't worth the squeeze.
My advice: worry about cost far less than you're tempted to. Put that energy into the product and anything that helps you ship faster. The AI landscape is going to keep shifting and many cost reductions are going to come for free.
Where I'd bet the genuinely dramatic cost reductions will come from (most to least likely):
→ Hardware acceleration letting the frontier labs cut prices
→ Better open-source models and a shift toward local compute
→ Specialised models giving routers cheaper options with best-in-class performance (see Harvey's announcement from the last 24 hours!)
I'm watching the first two very closely this year, and will keep sharing what I find.
Lot of ground covered today! 📍
Regent’s Park - Bike ride (failed)
Southwark - Trip to bike shop (unplanned)
Home - Pitch call
South Ken - Meeting
Marylebone - Coffee
Fitzrovia - Coffee, then work + 2 more calls from a friendly VC office
Now off to Poland 🇵🇱
Back next week 🫡
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more, including about our other new models, in our latest blog: https://t.co/v65eop5Ixq
I think Claude has had competing products memory-holed from it. Go and directly ask it to compare the difference between OpenClaw and Hermes and watch as it has no idea what they are until it uses the web search tool.
@j4ppleby Better yet, why don’t we just go through all possible questions anyone could ask and hard-code all the answers? Who needs AI? Pretty sure the world’s largest switch statement would use less water too /s
@garrytan @contextconor as you are the only person who saw this I want you to know this has lived rent free in my head since you posted it last year. Accurate af
Congratulations to everyone who applied this batch, and best of luck if you're still interviewing 🤞 follow for more, and do reach out if you're applying or raising your first round!
I met with 50+ founders and reviewed dozens more decks for this cohort of a16z @speedrun. Here are the 3 pitching mistakes I saw founders make most often 🧵
3. Failing to connect traction to the story
Storytelling is one of the most important skills any founder can have. Whatever you're building, the ability to convince people it's not just a smart idea but an important one really matters. Sales, fundraising, hiring - if you can make people care early, everything else gets easier.
Yet I've seen founders with otherwise excellent pitches let their traction slide sit as cold pipeline or revenue numbers. Don't forget the WHY. Why did your first pilots convert? Why are your users coming back week on week? Why is your pipeline full? Add customer quotes if you can, or metrics that show product success beyond revenue.
This lets you pitch beyond the numbers - showing not only that you have early traction, but that your users are happy and there's more coming.