In their day, meta-search engines were a bust. In fact, service routers generally were. But in model routers, there is real money to be saved (at least for the near future).
Model routers are becoming a favored way for companies to cut AI costs by sending simpler tasks to cheaper models.
The rise of routing could pressure frontier model providers as customers get more disciplined about token spending.
Read more: https://t.co/EbSuYfgkcm
Anthropic Mythos’ advancements, which were “on a totally different level,” gave DeepSeek’s CEO the “epiphany” that he needed to raise $7.4B.
"If DeepSeek were to stay in the game, I mean to remain competitive in the long run, he really needed to build a massive war chest, at least in the realms of like tens of billions of dollars to begin with.” — @jingyanghk, Asia Bureau Chief
BREAKING: FAA officially announced the rulemaking to legalize supersonic flight, including the Boomless Cruise ("Mach cutoff") approach we demonstrated on XB-1.
This is a major step toward the supersonic renaissance.
CATL trying to 'Nvidia Brand' itself is a sign of both strength (current NMC battery monopoly with strong brand identity) and weakness (Blade 2 from BYD could inspire Geely and others)
Robin Zeng, exacting and detail obsessed, keeps a stranglehold over a market that touches everything from AI data centers to electric cars.
Even if Silicon Valley wanted to, it couldn’t live without him.
Full story: https://t.co/6tF3rJuk6h
At @coinbase our AI spend is down nearly half this quarter while token usage keeps climbing. My team built the infrastructure behind it: routing, caching, cheaper defaults, and the spend services that track it.
We route everything through our own gateway: a single endpoint and format for dozens of models, with cross-provider failover, redaction, logging, and cost controls all applied before anything reaches a vendor.
We started with cheaper defaults and caching. 91% of employees weren't hitting their usage caps. Instead of lowering caps, we set cheaper model defaults to cut spend. Caching took more work to get consistent across every tool and model family. A cache hit needs the prefix to match exactly, so we keep building a long, stable prefix across turns. Each request only pays full rate on the new tokens and reads the rest from cache.
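To make the prefix point concrete, here is a small sketch under stated assumptions. The rates (10 units per fresh token, 1 unit per cached token) are made up for illustration, and `shared_prefix_len` and `request_cost` are hypothetical helpers, not any vendor's billing API.

```python
def shared_prefix_len(cached: list[str], request: list[str]) -> int:
    """Count leading tokens that match exactly; the first mismatch ends the hit."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        n += 1
    return n


def request_cost(cached: list[str], request: list[str],
                 full_rate: int = 10, cached_rate: int = 1) -> int:
    """Cached prefix tokens are billed at cached_rate, the rest at full_rate."""
    hit = shared_prefix_len(cached, request)
    return hit * cached_rate + (len(request) - hit) * full_rate


turn1 = ["system", "tools", "user-1"]
turn2 = turn1 + ["assistant-1", "user-2"]  # stable prefix, new tokens appended

stable = request_cost(turn1, turn2)                           # 3 cached + 2 new
changed = request_cost(turn1, ["system-v2"] + turn2[1:])      # prefix broken at token 0
```

With these made-up rates, the stable-prefix request costs 23 units and the one with a changed first token costs 50, since a mismatch at the start forfeits the whole cache hit.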
Our routing accounts for caching too. The naive approach scores each turn on its own and sends it to whichever model fits, which seems reasonable but would run up spend. The cache is per-model, so switching mid-conversation invalidates it. Our router weighs cache state alongside how hard the task is: a conversation keeps its model while the cache is warm, and the chance to re-route comes only when it goes quiet long enough for the TTL to lapse. Once it does, the router is free again to pick the best model for the task.
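A simplified sketch of that routing rule, not the production router. The TTL value, the model names, the 0.7 difficulty threshold, and the assumption that each request refreshes the cache timer are all placeholders. The real router weighs cache state against difficulty, while this version treats a warm cache as a hard pin.

```python
from dataclasses import dataclass
from typing import Optional

CACHE_TTL_S = 300.0  # assumed TTL in seconds; the thread does not give a value


@dataclass
class Conversation:
    model: Optional[str] = None   # model currently holding the warm cache
    last_request_at: float = 0.0  # timestamp of the most recent request


def pick_by_difficulty(difficulty: float) -> str:
    """Hypothetical scorer: hard tasks go to the larger model."""
    return "large-model" if difficulty >= 0.7 else "small-model"


def route(conv: Conversation, difficulty: float, now: float) -> str:
    """Keep the current model while its cache is warm; re-route once the TTL lapses."""
    cache_warm = (
        conv.model is not None
        and (now - conv.last_request_at) < CACHE_TTL_S
    )
    if not cache_warm:
        # Cache is cold (or this is the first turn), so switching costs nothing extra.
        conv.model = pick_by_difficulty(difficulty)
    conv.last_request_at = now  # assumption: each request refreshes the TTL
    return conv.model
```

The naive per-turn router would be `pick_by_difficulty` alone. The only change here is the `cache_warm` guard, which defers re-routing until the conversation has been idle longer than the TTL.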
These improvements happened at the gateway, so they apply across every team and tool. Next we're going deeper on the coding harness, where we have the most signal and flexibility, tuning how subagents and context get managed.
The number of generational companies has not increased since the '80s as a proportion of the population of VC-backed companies (most existing on the backs of talented immigrants). So there is a viable argument that the inefficiency needs to be optimized beyond 'invisible hands'.
Nearly all those who say the US should only admit the most talented immigrants would not themselves clear the bar they're proposing. They're effectively saying "Immigration is ok so long as you keep out people like me."
Expect the pricing of output tokens to be binary. Those that are exact equivalents of digital humans will be priced like human labor. Others will asymptote to the price of bits. Cost of human level output tokens may be achieved by OSS models - but only if funded by sovereign
A situation where both Elon and Masa are correct. One forgets that Masa comes later to the party and bets bigger (a bit like Druckenmiller). Elon has vertically integrated knowledge. Masa has an 'interpolation manifold' that others don't.
Even SoftBank’s Masayoshi Son—no shrinking violet when it comes to wild ideas—has reservations about data centers in outer space, @timkhiggins writes https://t.co/ajD1Bpb4JP
If a chatbot hallucinates, you get a bad answer. If a physical robot hallucinates, it breaks itself or hurts someone nearby.
@rocketalignment explains: "Robotics doesn't have access to the same kind of data that language models use, right? There's no internet worth of training data, just waiting there for robotics to take and to train on."
"As a robot if you hallucinate something that isn't there. You trip, you fall over, you break, your parts break, maybe you hurt someone nearby. And the stakes are just much higher."
The government is allowing trusted partners to access the Mythos 5 model following a two-week restriction that rattled the tech industry. https://t.co/jkNL2Ui2qU
Add 1% to the capital costs of Data Centers. Dole it out as an annuity to the local community over 15 years to keep politics under control. Jobs are temporary, this has duration.
🧵 The AI data center boom has a new bottleneck: local opposition.
The Information identified 300+ temporary and permanent bans on new data center development passed by state and local governments across the U.S. since 2023. More than 75 additional measures are under consideration.
https://t.co/ictHGSc1J4
@dwarkesh_sp My conspiracy theory (given that computer use is the most trivial 'world model') is that it is a business reason, not technical. It is a net negative for the entire internet industry based on eyeballs. It'll magically become possible once agentic commerce gets going
JD Vance: "I think Nixon's historical legacy is enjoying a bit of a renaissance, and deservedly so. I joked that if Watergate happened tomorrow, it would be like a 12 hours news story. The idea that it took down a presidency is crazy."
This is indeed an interesting way to use Claude Code and Slack to create dynamic (human-human-agent) teams. Might give $CRM something to hang its hat on.
This is a new paradigm for interacting with Claude that is significantly more "inline" with all the other human activity org-wide. Once you do all of the under the hood engineering work to make this "just work" (e.g. across tools, integrations, compute environments, memory, security, etc.), Claude basically joins the team in a seamless way - you can talk to it as you would talk to a person and it can help with a very large variety of workloads.
Imo this is the 3rd major redesign of LLM UIUX. The first paradigm was that the LLM is a website you go to, the second was that it is an app you download to your computer. This third one is that it is a self-contained, persistent, asynchronous entity with org-wide tools and context, working alongside teams of humans. It really takes a while to wrap your head around it, but it works and it is awesome.
This is what’s causing Anthropic to aggressively beg for govt protection (see below). Customers are finding cheaper alternatives. Keeping employees requires continuing ultra-rich secondaries ($$$) that are dependent on revenue growth. When you can’t win on the field go to DC.
one of the boldest predictions (Reggie's Knicks victory prediction) in sports history goes unheralded. one thing to predict one game (superbowl). another the contrarian take on an entire series - https://t.co/d2luYcXhz7 #NBA2026