Intelligence allocation is going to be extremely important, and each co is going to have the best evals for doing this optimally (both performance and cost) for their use cases
ModelCos are in an obviously strong position at the moment but the threat from the layers above (apps) and below (hardware) trying to commoditize them is deeply underestimated today
Token costs are becoming one of the hottest topics for any enterprise I talk with right now. It’s very bullish for AI in general because it means these systems are being used at a scale that wasn’t contemplated before.
It also opens up another form of differentiation for the applied AI layer: model routing.
As tokens take on a significant share of the cost of any given workflow, companies will inevitably want to ensure that their dollars go toward the most efficient use of tokens for the particular job at hand.
Frontier intelligence will always be relevant at the high end of tasks, like coding, legal and financial analysis, healthcare, and more. And dollars spent here will only go up over time. But, equally, you can peel off individual tasks to lower cost models (whether they’re from open weights vendors or the major labs) and deliver a more efficient end outcome.
To do this effectively, the applied AI layer needs to understand the workflows in its domain better than anyone else, and be able to mix and match models to different jobs. If you’re doing document extraction, you need to know which models perform better or worse for any given document type. If you’re doing legal analysis, you want to know which models perform each type of task best. And so on.
This will become one of the bigger differentiation points over time. The companies with the best evals, the best ability to route workloads, and business models directly aligned to customers’ financial goals will be in a great position.
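As a rough illustration of the routing idea in this thread, here is a minimal sketch that picks the cheapest model clearing a quality bar for each task type. The model names, prices, eval scores, and the `route` helper are all hypothetical placeholders, not real vendor data or anyone's actual router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price


# Hypothetical catalogue; names and prices are made up for illustration.
MODELS = [
    Model("frontier-large", 15.00),
    Model("mid-tier", 3.00),
    Model("small-open-weights", 0.40),
]

# Hypothetical in-house eval scores in [0, 1], keyed by (task type, model name).
EVAL_SCORES = {
    ("invoice_extraction", "frontier-large"): 0.98,
    ("invoice_extraction", "mid-tier"): 0.97,
    ("invoice_extraction", "small-open-weights"): 0.95,
    ("contract_review", "frontier-large"): 0.93,
    ("contract_review", "mid-tier"): 0.84,
    ("contract_review", "small-open-weights"): 0.61,
}


def score(task_type: str, model: Model) -> float:
    """Look up the eval score, treating unevaluated pairs as 0."""
    return EVAL_SCORES.get((task_type, model.name), 0.0)


def route(task_type: str, min_score: float) -> Model:
    """Return the cheapest model whose eval score meets the quality bar.

    If no model clears the bar, fall back to the best-scoring model.
    """
    eligible = [m for m in MODELS if score(task_type, m) >= min_score]
    if not eligible:
        return max(MODELS, key=lambda m: score(task_type, m))
    return min(eligible, key=lambda m: m.usd_per_million_tokens)


if __name__ == "__main__":
    for task, bar in [("invoice_extraction", 0.95), ("contract_review", 0.90)]:
        chosen = route(task, bar)
        print(f"{task}: {chosen.name} at ${chosen.usd_per_million_tokens}/M tokens")
```

With these made-up numbers, the easy extraction task goes to the cheapest model while contract review stays on the frontier model. The evals table is where the domain knowledge lives; the routing logic itself is trivial.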
“If SpaceX fails to deliver the access to Nvidia Inc. chips as part of the deal by Sept. 30, Google has the right to terminate the contract, with a one-month grace period, the filing shows.”
Google has agreed to pay SpaceX $920 million a month for computing power as part of a cloud-services deal that runs through mid-2029 https://t.co/NNcexacQwz
The framework that perfectly describes what's happening with token-maxxing is Goodhart's Law. The idea, named for economist Charles Goodhart, who formulated it in 1975, is usually stated as "when a measure becomes a target, it ceases to be a good measure."
Applied to AI, two things are true:
1) early adopters of technology typically create an edge that sustains over time! It's good to be ahead of the curve and to lean in when the frontier is expanding so fast and a lot is still illegible. In AI, this manifests as simply trying a lot of things, especially because of how non-deterministic and emergent AI is: there is no document listing out all the things LLMs can do. There is a massive capability overhang, and the only way you traverse the space of what's possible is by experimentation. Shots on goal, shots on goal. Cultures of high experimentation, bias towards action, and learning quickly tend to always outperform (this is the precondition to truly right-tail outcomes), and the nature of AI accentuates that.
2) but once you make "use of AI" a target, it ceases to be a good measure. Tokens are one dimension. It's a measure too simple to encapsulate what really matters: squeezing the most lemonade out of the lemon that is AI so that customers can be happier, the product can be better, the possibilities can expand, and the business can unlock new levels of what "being the best version of themselves" means to them. It's not lemonade for lemonade's sake. It's AI in service of increasing ambition along multiple dimensions. AI in service of redefining what a right-tail outcome even is and increasing the odds of getting there.
No standardized metric will ever get to that because each business has a different definition of greatness. "Tokens" are the best approximation we've found so far, but simply increasing spend on a specific type of software (which is not AI) does not say anything about the ROI on that spend. Yes, the teams that buy into AI's promise today are likely to be better off than the teams that don't. But I don't think tokenmaxxing will be the right end-state metric to summarize that.
I’ve joined the🦞@openclaw Foundation as Chief Architect! Excited to propel the future of agentic computing with @steipete and a world-class team.
In the post-claw era, AI is moving beyond coding into our personal lives. Big announcements at @nvidia Computex & @Microsoft Build!
@henrytdowling actually i mean that applications should be the layer that specializes in reducing the token costs associated with using frontier models
Seems basically impossible to be a public company right now.
You’re simultaneously getting the feedback that you need to be an AI leader AND that you need predictable high margins, but:
1. You’re spending tons of tokens, and increasingly, more than the tons you had planned on spending (“we blew our whole budget in 3 months”)
2. All the spend is highly experimental and you don’t know if the results will be there (good or bad) or how they will materialize (more growth, more profits, etc.)
The future is multi-model.
AI teams will not choose multiple models just for optionality. They will do it because inference at scale forces tradeoffs, and lock-in gets expensive.
No single model will be best across quality, latency, cost, and modality. The best AI companies will route each task to the model on the Pareto frontier.
AI natives will do this first, driven by the fact that their usage is exploding and inference demand will continue to ramp. Digital natives and enterprises will follow as they scale up token consumption as well.
As labs move up the stack into applications, companies will also want more control over their product, margins, and data loops. It is somewhat awkward to depend on model providers that may eventually compete with you.
It feels like we are moving towards:
- Use the right model for the right task.
- Continuously optimize for quality, speed, and cost.
- Tightly couple inference and optimization loops.
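To make "the model on the Pareto frontier" concrete, here is a minimal sketch that filters a candidate list down to the models no other candidate beats on quality, latency, and cost at once. The candidates and their numbers are hypothetical, and a real setup would measure these per task rather than globally.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    quality: float       # eval score, higher is better
    latency_s: float     # seconds per request, lower is better
    usd_per_mtok: float  # price per million tokens, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    no_worse = (
        a.quality >= b.quality
        and a.latency_s <= b.latency_s
        and a.usd_per_mtok <= b.usd_per_mtok
    )
    strictly_better = (
        a.quality > b.quality
        or a.latency_s < b.latency_s
        or a.usd_per_mtok < b.usd_per_mtok
    )
    return no_worse and strictly_better


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical measurements for one task type.
CANDIDATES = [
    Candidate("model-a", quality=0.92, latency_s=4.0, usd_per_mtok=15.0),
    Candidate("model-b", quality=0.85, latency_s=1.5, usd_per_mtok=3.0),
    Candidate("model-c", quality=0.80, latency_s=2.0, usd_per_mtok=5.0),
    Candidate("model-d", quality=0.70, latency_s=0.4, usd_per_mtok=0.3),
]

if __name__ == "__main__":
    for c in pareto_frontier(CANDIDATES):
        print(c.name)
```

In this toy data, `model-c` drops out because `model-b` is better on all three axes. Which of the remaining frontier models to use for a given task is still a product decision about how much quality is worth in latency and dollars.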
Today I'm excited to share that Hark has raised $700M at a $6B valuation
When I use these AI models today, they feel basic. They should be able to listen and talk naturally, understand vision, retain persistent memory, and become deeply personalized over time. They should be able to see the world, interact with it, and take action
To build that future, the capital we raised today will be used to:
→ scale our GPU infrastructure
→ accelerate future AI model development
→ grow the Hark team from ~70 to 200 engineers
→ design and build the next generation of AI hardware
The Series A round was led by Parkway Venture Capital with participation from NVIDIA, Align Ventures, AMD Ventures, ARK Invest, Brookfield, Greycroft, Intel Capital, Prime Movers Lab, Qualcomm Ventures, Salesforce Ventures, and Tamarack Global
At Hark, we are building the most advanced personal intelligence in the world. Intelligence that begins to think like you, and sometimes ahead of you, to offload your mental workload
🧠We introduce "Generative Recursive Reasoning"!
Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic — same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.
Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width — parallel trajectory sampling.
And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).
With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+
📄 Paper: https://t.co/JC7EyXYc9Y
🌐 Project page: https://t.co/LRT1dQiWLZ
w/
Junyeob Baek @JunyeobB (KAIST),
Mingyu Jo @pyross0000 (KAIST),
Minsu Kim @minsuuukim (KAIST & Mila),
Mengye Ren @mengyer (NYU),
Yoshua Bengio @Yoshua_Bengio (Mila),
Sungjin Ahn @SungjinAhn_ (KAIST)
Marketing holding company Publicis is acquiring LiveRamp for $2.2B in cash. Some notes:
- LiveRamp did $813M revenue (+9% y/y) and $168M operating cash flow in FY26
- Price represents a ~30% premium to where LiveRamp was trading and a 2.7x revenue multiple
- Publicis already owns Epsilon (identity data). LiveRamp adds clean rooms + data collaboration on top
- Publicis has now spent ~$10B over the last decade on data related acquisitions
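For anyone who wants to sanity-check the multiples, here is a small back-of-the-envelope sketch using only the figures in the notes above. It assumes the $2.2B headline price is the number the multiple is computed against (the notes don't say whether that is equity or enterprise value) and treats the ~30% premium and +9% growth as exact, so the outputs are approximations.

```python
# Figures taken from the notes above.
price_usd = 2.2e9          # headline acquisition price
revenue_usd = 813e6        # FY26 revenue
op_cash_flow_usd = 168e6   # FY26 operating cash flow
premium = 0.30             # approximate premium to the prior trading price
revenue_growth = 0.09      # year-over-year revenue growth

# Price relative to revenue and to operating cash flow.
revenue_multiple = price_usd / revenue_usd
cash_flow_multiple = price_usd / op_cash_flow_usd

# Implied value before the premium, assuming it applies to the whole price.
implied_pre_deal_value_usd = price_usd / (1 + premium)

# Implied prior-year revenue from the growth rate.
implied_prior_revenue_usd = revenue_usd / (1 + revenue_growth)

if __name__ == "__main__":
    print(f"Revenue multiple: {revenue_multiple:.1f}x")
    print(f"Operating cash flow multiple: {cash_flow_multiple:.1f}x")
    print(f"Implied pre-deal value: ${implied_pre_deal_value_usd / 1e9:.2f}B")
    print(f"Implied prior-year revenue: ${implied_prior_revenue_usd / 1e6:.0f}M")
```

The revenue multiple lands at roughly 2.7x, consistent with the note, and the same price works out to about 13x operating cash flow.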