Welcoming @solidwillity to SCALE! 🚀
His startup, Touchdown Labs, makes AI inference optimization simple and accessible for resource-constrained organizations.
Can't wait to see what you build!
#SCALEatGMI #GMICloud #Startup
I got to spend all day today with Jensen in Taiwan: talking with thousands of engineers and eating street food at a night market. Jensen is received as a rockstar in Taiwan, like the Beatles in the '60s. It's mind-blowing and fun to watch. But most importantly, through all the interactions and all my conversations with him, he remained the same humble, kind, thoughtful, funny guy he always was, even as a kid who went to these same night markets many years ago.
Btw, we tried a crazy amount of different street food. It's legit some of the most delicious food I've ever had. I can't wait to share video of it, including a ton of our conversations and hangout. When I can pause for a moment from all the travel to edit the video, I'll post it.
Can't wait to continue talking to Jensen and engineers at Computex this week, and exploring more of Taiwan, and of course roaming the night markets for some more delicious street food.
Days like these, even more than usual, I feel like the luckiest kid in the world.
Love you all! ❤️
Today we're announcing that hybrid agentic inference is coming to Perplexity Computer.
Computer can split tasks between a local model running on your machine and frontier models in the cloud. This keeps private data on your device and maximizes token efficiency.
Coming soon.
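One way to picture the split: a minimal sketch, assuming the simplest possible rule, where any task flagged as touching private data stays on the local model and everything else goes to the cloud. The `Task` type, the `touches_private_data` flag, and the `route` and `plan` functions are hypothetical names for illustration and do not describe the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    touches_private_data: bool  # hypothetical flag, set by the caller


def route(task: Task) -> str:
    """Return the backend label for a task: 'local' or 'cloud'."""
    # Assumption: privacy is the only routing signal in this sketch.
    return "local" if task.touches_private_data else "cloud"


def plan(tasks: list[Task]) -> dict[str, list[str]]:
    """Group prompts by the backend that would run them."""
    buckets: dict[str, list[str]] = {"local": [], "cloud": []}
    for task in tasks:
        buckets[route(task)].append(task.prompt)
    return buckets
```

A real router would likely also weigh task difficulty and token cost; this sketch only shows the privacy half of the idea.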
We have worked with @nvidia to integrate their official Agent Skills catalog into the Hermes Skills Hub.
These skills teach your agent how to use CUDA-X libraries, Omniverse and Physical AI workflows, NeMo training and inference tools, and other platform components.
🦔Fortune published a piece this afternoon connecting Microsoft and Uber's AI cost overruns to token economics, with a headline that lands hard: "Microsoft reports are exposing AI's real cost problem: Using the tech is more expensive than paying human employees." Underneath those headlines, the unit economics tell the story. OpenAI is projected to lose $14 billion in 2026, spending roughly $2 for every dollar of revenue it brings in. Anthropic is in a similar position with break-even not projected until 2028. GPU rental prices for Nvidia's newest Blackwell chips jumped 48% in just two months. OpenAI's response was to close a $122 billion private funding round at an $852 billion valuation, the largest in history.
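A quick back-of-envelope check on those two OpenAI figures, as a sketch only: it assumes the "$2 per dollar of revenue" ratio and the $14 billion loss refer to the same year, and that loss is simply spending minus revenue. The function name is made up for illustration, and the outputs are rough implications of the quoted numbers, not reported figures.

```python
def implied_revenue_and_spend(loss_bn: float, spend_per_revenue_dollar: float) -> tuple[float, float]:
    """Back out revenue and spending (in $bn) from a loss and a spend-to-revenue ratio."""
    # loss = spend - revenue = (ratio - 1) * revenue
    revenue_bn = loss_bn / (spend_per_revenue_dollar - 1.0)
    spend_bn = revenue_bn * spend_per_revenue_dollar
    return revenue_bn, spend_bn


revenue_bn, spend_bn = implied_revenue_and_spend(14.0, 2.0)
print(f"implied revenue ~${revenue_bn:.0f}bn, implied spend ~${spend_bn:.0f}bn")
```

Under those assumptions, a 2:1 ratio means the loss roughly equals revenue, so the quoted numbers imply revenue near $14 billion against spending near $28 billion.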
My Take
The token pricing story is really an IPO timing story. OpenAI, Anthropic, and xAI all need to go public in the next 18 to 24 months because the private market cannot keep absorbing burn rates like these indefinitely. Public markets do not accept "we will figure it out" as a line item on an S-1; they require disclosed unit economics with a credible path to profitability and a date attached. That deadline is why the price increases are happening now rather than next year. The labs need to show declining loss curves before the filings hit, and that means enterprise customers have to start covering more of the actual cost regardless of whether the productivity math holds on their end.
Every token bought over the last two years was effectively subsidized below cost by venture capital and hyperscaler cross-subsidies, and that subsidy has a hard deadline. Uber publicly admitted burning through its entire 2026 AI budget in four months, and CFOs at major enterprises are starting to flag the same pressure. The labs cannot keep losing $2 per dollar of revenue once they file public statements, so the cost transfer to customers accelerates from here. For investors, the question is not whether these companies are valuable. They clearly are. The question is who absorbs the difference between what enterprises can budget and what the models actually consume between now and 2028, and right now the answer is the hyperscalers funding the buildout. That is why I have been watching Microsoft and Amazon capex commentary more closely than the lab announcements themselves.
Hedgie🤗
Link: https://t.co/S2oIgUSijV
💥Today we release InferenceBench, our next benchmark after PostTrainBench that measures progress on AI R&D automation.
AI R&D automation will very likely unfold gradually, starting from “boring” tasks like inference speed optimization that are very easily verifiable (accuracy + inference time). We show a rather negative result for current frontier agents. They are not good at system-level engineering and managing complex dependencies. They do show non-trivial performance, but they fail compared to a simple baseline: tuning vLLM/SGLang hyperparameters.
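To make the baseline concrete, here is a minimal sketch of what a hyperparameter sweep with an accuracy gate could look like. It is not the benchmark's code: the knob names (`max_batch`, `quantized`), the `tune` function, and the `fake_measure` stand-in are all hypothetical, and a real sweep would launch a vLLM or SGLang server per config and time a fixed workload.

```python
from itertools import product
from typing import Any, Callable, Optional

Config = dict[str, Any]


def tune(
    grid: dict[str, list],
    measure: Callable[[Config], tuple[float, float]],
    min_accuracy: float,
) -> Optional[Config]:
    """Grid search: among configs meeting the accuracy floor, return the fastest."""
    best_cfg, best_seconds = None, float("inf")
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        accuracy, seconds = measure(cfg)
        # Verification is just the two numbers: accuracy and inference time.
        if accuracy >= min_accuracy and seconds < best_seconds:
            best_cfg, best_seconds = cfg, seconds
    return best_cfg


def fake_measure(cfg: Config) -> tuple[float, float]:
    """Stand-in for a real benchmark run; the numbers are invented."""
    seconds = 100.0 / cfg["max_batch"]
    accuracy = 0.95
    if cfg["quantized"]:
        seconds *= 0.5   # pretend quantization halves latency...
        accuracy = 0.90  # ...at some accuracy cost
    return accuracy, seconds


grid = {"max_batch": [8, 32], "quantized": [False, True]}
best = tune(grid, fake_measure, min_accuracy=0.93)
```

With the invented numbers above, the quantized configs are faster but fall under the accuracy floor, so the sweep settles on the larger batch without quantization.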
Importantly, InferenceBench tests *open-ended* inference optimization capabilities. This is different from narrower benchmarks like KernelBench that only let agents optimize kernels (which is a very valuable task, too!). The benchmark is intentionally open-ended, so the poor performance of the agents is not an under-elicitation issue. The agents have everything needed to succeed, but they still fail because they are not yet reliable enough for this task.
Our results suggest an inverse scaling phenomenon: Claude Sonnet 4.6 and GLM-5 rank highly because they more often preserve simple, valid, high-performing final servers, while several larger models show stronger peak runs but lose utility through brittle final-state choices. This contrasts with benchmarks where rankings track raw capability (e.g., SWE-Bench, Terminal-Bench, PostTrainBench, FrontierSWE).
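A toy illustration of why peak runs and final-state choices can diverge, assuming (this is our simplification here, not the exact scoring rule) that only the server an agent leaves behind counts and that an invalid final server scores zero. The function names and all numbers are made up.

```python
def peak_score(run_speedups: list[float]) -> float:
    """Best speedup seen at any point during the run."""
    return max(run_speedups, default=0.0)


def final_state_score(run_speedups: list[float], final_valid: bool) -> float:
    """Speedup of the last server, or 0.0 if that server is not valid."""
    if not run_speedups or not final_valid:
        return 0.0
    return run_speedups[-1]


# Hypothetical agents: one steady, one with a strong peak but a brittle ending.
steady = ([1.2, 1.3, 1.3], True)
brittle = ([1.1, 2.0, 0.9], False)

for name, (speedups, valid) in {"steady": steady, "brittle": brittle}.items():
    print(name, peak_score(speedups), final_state_score(speedups, valid))
```

Ranked by peak, the brittle agent wins; ranked by what it actually ships, it loses to the steady one.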
One of the primary bottlenecks we have clearly observed is the lack of diversity of strategies: nearly all agents just use vLLM, without exploring alternatives. Overall, proper exploration is lacking: the current agents are not ready to tackle broad enough goals and get stuck on the first solution they find (such as vLLM). I’m sure future agents will do much better, but here is where we are now.
This benchmark is our 2nd one in a suite of benchmarks that will track the progress on AI R&D automation. We will develop many more benchmarks that will cover different aspects of AI R&D automation, culminating in recursive self-improvement. Stay tuned!