Want to give a big shoutout to @baseten. They have been an amazing inference partner during our launch this week. Top-notch technical skills + kind humans + deep care for their work.
it's crazy out here in sf
there's literally @baseten ice cream at humphry slocombe, @fal padel courts, and @TrustVanta coasters at bars.
chat is this the top?
Yesterday, models could speak. Today, models can see. The "optic nerve" of AI is here via Marble. Simply epic. Massive h/t to the @theworldlabs team for the biggest breakthrough of 2026.
The companies we support are giving doctors superpowers, enabling a next generation of builders, and unblocking creatives from sharing their vision with the world. Personally, I'm pumped we're able to support generational ideas through inference. Holler at us!
Generational AI companies are powered by Baseten.
Why? We obsess over the milliseconds, so they can ship the future.
Focus on what actually differentiates you. Leave the inference to us.
Baseten’s day 0 bet was that inference was the technology that would enable the best user experiences AI could deliver: fast, smart, reliable, secure. We also bet that those experiences would rely not only on a handful of giant general intelligence models, but also on millions of specialized models built by companies for their specific customers and use cases.
Whether you’re a doctor, developer, lawyer, mechanic, researcher, construction worker, marketer, etc., you’re accelerated by specialized tools worthy of your craft. To me, this is one of the most meaningful promises AI can deliver on.
We’re starting to see it now. Many of the main-character AI companies on the application layer (Abridge, Clay, Cursor, OpenEvidence, Hebbia, Mercor, Notion) are built on highly specialized models for highly specialized workflows. These businesses are booming because customers love specialized tools.
There are probably hundreds of custom models in production today. Soon, there will be thousands and then millions. All enabled by a high-performing inference layer.
Inference has emerged as one of the hardest problems in modern AI systems. Delivering reliable, low-latency experiences requires deep coordination across distributed infrastructure, kernel-level performance, and software ergonomics, and even world-class teams struggle to do this well. As a result, as consumers and developers, we’ve grown to accept sluggish performance, frequent downtime, and inconsistent quality across both application companies and model providers.
Meanwhile, the demands on inference are accelerating: AI adoption is trending toward ubiquity, and reasoning models are orders of magnitude more compute-intensive. Demand will only increase as more companies catch on to the virtues of owning their end-to-end IP rather than relying on black-box model APIs on shared infrastructure. Whether we can realize the impact of this generational shift will depend on our ability to serve these models reliably at scale.
We knew we could make the technology work, but the biggest delight of it all has been seeing what our customers do with it. The (many-model) future is bright.