turbopuffer crossed $100M run-rate in March. 19mo after $1M. Profitable & <$1M raised.
Cursor・Anthropic・Notion・Cognition・Harvey・Bridgewater・Ramp・Linear・Legora・Superhuman・Atlassian・Granola
We’d be nowhere without them. We work like hell to exceed their expectations.
Google Cloud has blocked our account, making some Railway services unavailable. We have escalated this directly with Google. The Railway Platform team has since confirmed access to Google Cloud and is working on restoring access to all workloads.
We have access to some of our Google Cloud–hosted infrastructure and are working to restore the rest of the service. We apologize for the disruption.
It's been a lot of fun revisiting this idea with the help of Claude Code. It's generated the scaffolding that would have put me off starting, and it's been helpful as a sounding board preventing me from wasting time implementing half-baked ideas.
https://t.co/VKnEYGBp9R
Come, if for no other reason than to see a launch unlike others (BTS attached, seriously in awe of the team). But also to see one of the most sophisticated agent platforms out there. It hits all the hottest topics (agent orchestration, data integration, agent tooling, CLI).
https://t.co/YbqE7hXysJ
It is time for the United States Postal Service to ban junk mail.
Unsolicited spam calls are already prohibited by the FCC. Emails are heavily regulated by the CAN-SPAM Act of 2003. Junk mail is the majority of mail and consumes 100 million trees per year. Enough!
Introducing SubQ - a major breakthrough in LLM intelligence.
It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA),
and the first frontier model with a 12-million-token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus
Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention).
Only a small fraction actually matter.
@subquadratic finds and focuses only on the ones that do.
That's nearly 1,000x less compute and a new way for LLMs to scale.
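A rough back-of-the-envelope sketch of why skipping most token pairs saves compute. This is not SubQ's actual method: it assumes a toy model in which each token attends to at most a fixed budget `k` of other tokens, and both `k` and the function names are hypothetical.

```python
def dense_pairs(n: int) -> int:
    """Query-key pairs scored by standard (dense) attention over n tokens."""
    return n * n


def sparse_pairs(n: int, k: int) -> int:
    """Pairs scored if each token attends to at most k others.

    k is an assumed per-token budget, not a figure from the announcement.
    """
    return n * min(k, n)


def reduction(n: int, k: int) -> float:
    """How many times fewer pairs the sparse scheme scores than the dense one."""
    return dense_pairs(n) / sparse_pairs(n, k)


if __name__ == "__main__":
    n, k = 1_000_000, 1_000  # illustrative values only
    print(f"dense:  {dense_pairs(n):,} pairs")
    print(f"sparse: {sparse_pairs(n, k):,} pairs")
    print(f"reduction: {reduction(n, k):,.0f}x")
```

Under this toy assumption the dense cost grows with n² while the sparse cost grows with n·k, so the gap widens as the context gets longer. The post does not say how its own figure was derived.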
Xbox needs to move faster, deepen our connection with the community, and address friction for both players and developers.
Today, we promoted leaders who helped build Xbox, while also bringing in new voices to help push us forward. This balance is important as we get the business back on track.
As part of this shift, you’ll see us begin to retire features that don’t align with where we’re headed. We will begin winding down Copilot on mobile and will stop development of Copilot on console.
DeepSeek v4 works fine, but it’s not the frontier-pressing moment we saw with Kimi 2.6. On Notion eval data, it’s similar performance to GPT 5.2, with understandable failings.
Most interesting — it doesn’t scale well. It’s ridiculously slow. On multiple major, trusted, and performant US inference providers we see it 15x slower than GPT 5.2 and 2x slower than Opus 4.7, a problem Kimi never had.
Curious if it’s a fundamental issue in architecture, or a matter of time til inference providers make it work. Doesn’t seem urgent either way, if Kimi can outperform. Cheaper maybe, but not groundbreaking.
We are entering an extremely exciting era for open-weight models.
Kimi K2.6 now feels like a top agentic model.
I took it for a spin via @FireworksAI_HQ fast inference APIs.
Kimi K2.6 has impressive agentic capabilities, design skills, and the ability to synthesize large amounts of information.
I built a little Skill that produces survey papers on any AI research topic you want. (see example in the clip)
You can use the skill to tell your agent to generate a survey on whatever topic and watch it go to work.
The artifact was fully generated by @Kimi_Moonshot's Kimi K2.6. It's cheap and fast.
The next step for me is to keep integrating these models' capabilities into use cases like automating my LLM knowledge bases and augmenting my agents' memory.
Stay tuned for more.
When I saw our team's evals of Kimi 2.6, I thought "ok, things are gonna get interesting now".
This is the first open-weight model that plays like a top-class agentic model. Watching it work through ambiguous, meticulous chained tool calls successfully puts it squarely in the wheelhouse of Opus 4.6. We're looking at an open-weight model, but with much cheaper direct inference provider pricing. For a subclass of our eval set, it's outperforming GPT 5.2. We're about to undergo a gigantic industry shift.
Open weight is no longer just for those who fine-tune or those who want on-prem. It's an actual, reliable option, given its quality/price/latency profile, for difficult agentic work.
It's not perfect. It's token hungry, relatively slow, and can get stuck in "thinking loops". But those are things we can engineer around. For the value it offers, and how it positions itself against major labs, this is a dramatic day for open-weight models.
We sprinted as a team and worked closely with @FireworksAI_HQ to get this to our customers on day 0. No one should wait to try out a change like this. Try it yourself and tell me where it's working for you.