Nate's Substack today cited Andon Labs' Vending-Bench: max-effort Opus 4.8 scored lower than high-effort Opus 4.8 on long-horizon tasks. Both scored lower than Opus 4.7.
Uber reportedly now caps coding agents at $1,500/month per employee per tool. Seems sensible to me, but it's also an interesting hint at the value Uber thinks these tools are providing.
https://t.co/6YT0lCzPml
We're experimenting with a PDF companion for each episode — tables, charts, timelines, etc.
Here's our first one for Vanguard. Let us know what you think!
https://t.co/09HtFXc3iS
AI collapsed the cost of generating engineering work. It did not collapse the cost of reviewing, owning, or understanding it. The engineers winning right now aren't the ones running the most agents. They're the ones who know exactly what they shipped and why.
Got into all of it in today's issue, plus Ford telling 4,653 Bronco Sport and Maverick owners to stop driving over a steering failure: https://t.co/KCZps3MobN
Toyota just gave the save-the-manuals crowd exactly what it wanted: a two-seat, manual-only GRMN Corolla. Rear seats gone, no automatic, suspension tuned at the Nürburgring. Then it killed its Lexus EV flagship the same week.
@nateberkopec I don't know why this is surprising. I'm neither an AI maximalist nor anti-AI. I use it in many places, writing included. The allure is too great to expect most people not to use it for writing. But I hope we can keep exercising and improving editorial control.
@RaulJuncoV Within-digest dedup (same item matches multiple searches, show once) is clear. The harder case: an item matched last week's digest and is still available. Do you send it again? How do you track "user has already been notified about this specific item" across sends?
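One possible way to handle the cross-send case, as a rough sketch rather than a recommendation: keep a small table keyed on (user, item) and filter each new digest against it. Everything here is hypothetical (the `notified` table, `build_digest`, `mark_sent`, string item IDs), and it assumes the policy is "never resend an item the user was already notified about", which may not be what you want for items that changed since the last send.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical 'notified' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notified ("
        " user_id TEXT NOT NULL,"
        " item_id TEXT NOT NULL,"
        " sent_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
        " PRIMARY KEY (user_id, item_id))"
    )
    return conn


def build_digest(conn, user_id, matches_by_search):
    """Return item IDs to send: deduped within this digest and across past sends."""
    seen_this_digest = set()
    fresh = []
    for item_ids in matches_by_search.values():
        for item_id in item_ids:
            # Within-digest dedup: same item matched by several searches.
            if item_id in seen_this_digest:
                continue
            seen_this_digest.add(item_id)
            # Cross-send dedup: skip items this user was already notified about.
            already_sent = conn.execute(
                "SELECT 1 FROM notified WHERE user_id = ? AND item_id = ?",
                (user_id, item_id),
            ).fetchone()
            if already_sent is None:
                fresh.append(item_id)
    return fresh


def mark_sent(conn, user_id, item_ids):
    """Record a successful send; call this only after delivery succeeds."""
    conn.executemany(
        "INSERT OR IGNORE INTO notified (user_id, item_id) VALUES (?, ?)",
        [(user_id, item_id) for item_id in item_ids],
    )
    conn.commit()
```

Recording the send only after delivery succeeds means a failed send gets retried next time instead of being silently dropped. The `sent_at` column is there in case a "resend after N weeks" policy turns out to be preferable to never resending.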