JEFF BEZOS JUST EMERGED FROM STEALTH WITH A $41 BILLION AI STARTUP CALLED PROMETHEUS
$12 billion raised. Valued at $41 billion. Coming out of stealth today.
The backers: Bezos personally, JPMorgan, BlackRock, Goldman Sachs, DST Global, and Arch Venture Partners.
The mission: do for engineering and manufacturing what large language models did for text.
Bezos is calling it an "artificial general engineer." Instead of training on words from the internet, Prometheus ingests data from the physical world to accelerate the manufacturing of skyscrapers, smartphones, jet engines, and everything in between.
In Bezos' own words: "Something that today was going to take 100 engineers 10 years to build, if you can change that to taking 10 engineers one year to build, you're just going to get way more things built."
This is Bezos' first CEO role since stepping down from Amazon in 2021. He's co-leading it with Vik Bajaj, a former Google X executive.
(Source: Semafor)
Citadel Securities just put institutional weight behind what the AI bulls won't say out loud.
In a new macro note titled "Tokenomics," Citadel makes the argument plainly: even the most powerful technology on earth still has to pass through the boring discipline of cost curves, capacity limits, and marginal returns.
The evidence is piling up:
– Amazon removed its token usage leaderboard
– Microsoft cancelled Claude Code subscriptions
– Multiple companies reporting unexpectedly massive token bills
Their conclusion is the part that matters.
Adoption is no longer about what AI can do in principle. It's becoming about the price and scarcity of the inputs needed to run it at scale. Compute. Power. Cooling. Memory bandwidth. Inference budgets. All real, all binding constraints.
And here's the kicker from the chart.
The Silicon Data LLM Token Expenditure Index, a benchmark for how much the market is actually spending on AI tokens, has started rolling over. Citadel reads it as a shift toward cheaper models. Companies substituting away from expensive frontier AI toward "good enough" alternatives.
That's economics 101 doing what it always does. When the price of something rises, people use less of it, or find a cheaper version.
Citadel sees a bifurcation forming. Frontier AI concentrated among a few firms with the balance sheets to absorb the cost. Everyone else quietly downgrading to simpler, cheaper models.
This is the part of every technology revolution the early narrative ignores.
The technology being real was never the question.
The question was always whether the economics could carry the valuations.
When one of the most sophisticated trading firms on earth starts writing about AI in the language of cost curves and rationing instead of limitless demand, the conversation has quietly changed.
The hype was about what AI could do.
The reckoning is about what it costs.
NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.
“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community
also the fact that this is on purpose not visible to the user is crazy
1. Echoes what Anthropic and OAI researchers have been saying informally, which is that our benches ought to be three dimensional optimization problems between performance, cost, and latency for highly specific applications
2. Further evidence that the "good enough" level of intelligence units have been reached for 99% of white collar labor tasks and small model implementations are ever valuable; bull self serve post training infra
3. Benchmarks are ever outdated, because scaffolds/harnessing largely fails to account for test-time standardization, amidst the litany of other things that aren't standardized that make top line metrics near unreportable
new frontier eval from the cognition team. interesting that simple test time scaling is pretty noisy here instead of a clean line
lots of care in crafting a good scoring process
https://t.co/hRqAocB7z6
Good take
My guess is
- demand for intelligence is near infinite
- but 80% of workloads will be running on 99% cheaper models within 12-18 months
- 20% of workloads will still run on latest gen models where IQ maxing is important (scientific breakthroughs, higher level orchestrator agents?)
- rough analogy might be what % of macbooks or gaming PCs sold have the maxed out specs for CPU/GPU, prices are falling much faster than Moore's law here though
- this leads me to think the limiting factor will be energy and compute, not better models
At Coinbase we're working hard on routing prompts to cheaper models where appropriate, and in some cases have been able to keep costs roughly flat, while token usage continues to grow exponentially.
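A rough sketch of the arithmetic behind that kind of routing, in Python. Everything here is an illustrative assumption: the two model names, the prices, and the `needs_frontier` flag are hypothetical, and the 20% frontier share is taken from the guess above rather than from any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_million_tokens: float  # hypothetical price in dollars


# Hypothetical models: the small one costs 1% of the frontier one ("99% cheaper").
FRONTIER = Model("frontier", 10.0)
CHEAP = Model("small", 0.10)


def route(needs_frontier: bool) -> Model:
    """Send a workload to the frontier model only when it is flagged as needing it."""
    return FRONTIER if needs_frontier else CHEAP


def blended_cost(frontier_share: float) -> float:
    """Average cost per million tokens when frontier_share of tokens go to FRONTIER."""
    return (frontier_share * FRONTIER.cost_per_million_tokens
            + (1.0 - frontier_share) * CHEAP.cost_per_million_tokens)


all_frontier = blended_cost(1.0)       # every token on the frontier model
mixed = blended_cost(0.2)              # 20% frontier, 80% small model
savings = 1.0 - mixed / all_frontier   # fraction of spend avoided by routing
headroom = all_frontier / mixed        # token growth affordable at flat spend
```

Under these made-up prices the blended cost falls by roughly 79%, so token volume could grow to about 4.8 times its old level before total spend returns to where it started. The real numbers depend entirely on actual prices and on how many workloads can tolerate the cheaper model.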
This is exactly right.
People are starting to look for cheaper model alternatives and realizing two things at once: open-source models are already very good, and the ability to train and serve them efficiently at scale can change the economics pretty meaningfully.
Tokens are still being subsidized, demand is ramping quickly, and the compute crunch is likely to persist. That will push companies toward using the right model for each task instead of defaulting to the most expensive one.
We’re still early, but I expect open-weight adoption to accelerate much faster than most people think.
.@tylercowen on why AI creates more jobs than it destroys:
"One of the neatest properties of current AI models is they allow a small number of individuals working with AI to really do a lot more work than was possible previously."
"This will mean more companies, more projects, more nonprofits, just more ventures."
"One area is generally energy, electricity, the grid... It's completely screwed up. It will take twenty years, thirty years, forty years to fix... The AIs cannot do that on their own."
"The biomedical sector and medical trials, there will be many, many, many more ideas to test. AIs will help with the testing, but I don't think pure testing by simulation will be possible anytime soon."
"Simply care for the elderly. There will be robots, personal companions. We have this already. But the elderly also will want human care. It wouldn't surprise me if in the future, fifteen, twenty percent of all jobs were elderly care."
"Luis Garicano had an excellent online essay. He referred to what he called 'messy jobs': jobs where it's hard to explain exactly what the job is, but on a given day you're doing eleven different things, and it requires coordination and figuring out what you ought to do next and getting other people to help you... There's a real future in messy jobs."
Tyler Cowen with @dataWyatt
This chart is more important
Token usage (blue bars) is exploding higher. It started in January when Agentic AI went mainstream with Claude Cowork and Moltbook (OpenClaw).
AI users are creating agents and code, leading to exponential growth in AI usage.
It's just starting.
A new and possibly controversial perspective:
In this video, I explain the sense in which generative AI trained by supervised learning is incapable of making novel discoveries.
https://t.co/zin5QbbT9N
The text of the speech:
AI Creativity and Discovery
Good day ladies and gentlemen. I regret that I am unable to be with you all today to engage in a back-and-forth discussion, but I am nevertheless pleased to be able to share with you, via this recording, some high-level thoughts about the current and future state of artificial intelligence, and in particular about AI’s relationship to science and mathematics, which is, as I understand it, the central focus of this meeting and of the SAIR Foundation.
I would like to start with an old joke; I am sure you have heard it before. It is the one about the researcher whose work is being evaluated, and the review comes back, and says “This work is both novel and good. Unfortunately, the parts that are good are not novel, and the parts that are novel are not good.”
My first point about AI is that this assessment applies exactly to large parts of AI as we know it today. Not all of today’s AI, but a large part of it. It applies to pretty much all of what we mean by “Generative AI”, which includes large language models, image and video models, and even the new methods for learning world models. All of these AIs take large numbers of examples and produce a “model” which behaves similarly to the examples, that is, which generates text like people, or images like artists or nature, and videos like we find on the internet. Don’t get me wrong, Generative AI can be extremely useful. No doubt about that. But the assessment of the joke still applies. These systems can produce output that is both novel and good, but not at the same time.
In many ways this is just absolutely not a problem. When we ask an AI for an answer from the internet, or to summarize a document, we don’t want it to be novel. We are happy if the quality of the answer, the goodness, comes from the source material—from the people who wrote the document or the articles on the internet. If the AI’s answer is novel it means it is going beyond the source material, adding something beyond it. This is what we call “hallucinations”. In most cases, we don’t like it when the AI makes something up, when it adds something novel.
One exception, of course, is when we are looking not for facts or reality, but for fiction and entertainment. We might ask for a bedtime story for a child, or an image based on existing images on the internet but which is nevertheless different and distinct from them. In these cases, it is never easy for us to know how creative the AI is actually being, as we do not know how close the AI’s story, poem, or image is to the source material. In a real practical sense we can not know this because the internet is too big, the possible sources that the AI may draw upon are too numerous.
When we ask for a fiction or novelty, the AI can give it to us because its processing is in part stochastic. Every decision can go multiple ways and will go different ways and produce a different trajectory every time. The trajectory can be random—and thus novel—or it can be based on the training data—and thus “good” because the training data is good, sourced from people or reality. Thus, the trajectory is either novel or good—based on randomness or based on data—but never both at the same time.
Really, I think it is okay if the output of Generative AI is never good and novel at the same time. For the researcher in the joke this is a devastating criticism, but for most things it is not, and for Generative AI it is not. Generative AI is meant to be a mimic. This is what supervised learning is for. Generative AI can be extremely useful, even when it just mimics, if it is faster, or cheaper, or smaller, or more customizable, or more copy-able, than the thing being mimicked. It is okay if Generative AI cannot be both novel and good at the same time. It is still a transformative technology.
But it is a limitation. And remember we are here to use AI for science and mathematics, and for these areas the assessment of the reviewer in the joke is devastating. For these areas we need true creativity and discovery. Generative AI, or Mimicking AI, will never get us there. For these we need something more, and indeed we have something more in other parts of AI. We have many AI systems which can give us more. We have AlphaGo with its world-changing move 37, or AlphaZero with its brilliant original chess-playing style. We have GT-Sophy that drives simulated racecars better than any human. We have AlphaFold and AlphaProof and Claude-Code, which have brought true advances in science, mathematics, and programming. We have RL-Lyft which optimizes the assignment of cars to passengers in the ride-hailing business. All these systems have found things that are both novel and good. And, truth be told, some language models have been augmented in ways that make them more than Generative AI based on supervised learning.
All these systems have some additional features that make them capable of true creativity and true discovery. It is important for us to recognize what this is—and that it is not present in ordinary, garden-variety Generative AI. It is something that can not come from just supervised learning, from learning from examples. What is it? Well, it is a simple thing, a commonsense thing. It is not new. We have many names for it, but unfortunately none of them are very good names. I will call it Discovery. Basically, Discovery is just the idea of trying many things and seeing which of them work, then keeping those that worked the best. Evolution by natural selection works this way. The scientific method works this way. And just ordinary life and learning works this way. We try things and remember what works. What could be more obvious? In this behavioral case, psychology has two names for it— “instrumental learning” and “operant conditioning”—and in machine learning it is what we mean by “reinforcement learning”. We also see the idea of Discovery in planning and combinatorial search—anything that involves the idea of “generate and test”.
The essence of Discovery is to combine three steps:
1. Variation,
2. Evaluation, and
3. Selective retention.
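The three steps can be illustrated with a minimal generate-and-test loop in Python. This is only a sketch under stated assumptions: the names `discover`, `objective`, and `perturb` are hypothetical, the toy objective (a single peak at 3.0) is chosen purely for illustration, and simple hill climbing stands in for the much richer systems mentioned in the speech.

```python
import random


def discover(evaluate, initial, vary, steps=2000, seed=0):
    """Generate-and-test: keep the best candidate seen so far."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(steps):
        candidate = vary(best, rng)       # 1. Variation: partly informed, partly blind
        score = evaluate(candidate)       # 2. Evaluation: score against an explicit goal
        if score > best_score:            # 3. Selective retention: keep only improvements
            best, best_score = candidate, score
    return best, best_score


def objective(x):
    """Toy goal (an assumption for this sketch): higher is better, peak at x = 3."""
    return -(x - 3.0) ** 2


def perturb(x, rng):
    """Informed by the current best, blind in which direction it steps."""
    return x + rng.gauss(0.0, 0.5)


best, score = discover(objective, initial=0.0, vary=perturb)
```

Removing the `evaluate` call leaves only a random walk: variation still produces novelty, but nothing recognizes or retains the candidates that happen to be good.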
Of course, I am not the first to say this. I am not the first to point out that this combination of steps is key to science, to evolution by natural selection, and to animal behavior. I think particularly of papers by Donald Campbell, by Daniel Dennett, and by Gary Cziko. What is new in my remarks is to directly relate the idea of Discovery to modern AI to help us see that it is not present in supervised learning or Generative AI—in particular, that Discovery is not present in backpropagation or gradient descent.
Let me say explicitly what is missing from Generative AI. As we have remarked, these systems do have a stochastic aspect, so they do generate a variety of trajectories and behavior. What is missing is the Evaluation step. The generator was pre-trained by supervised learning, leaving no way at runtime to Evaluate what it generates. And of course without Evaluation there can be no Selective retention, and thus no Discovery. The variation can bring novelty, but without evaluation there is no Discovery, and arguably, no creativity. That is, I would say that creativity requires that the new things generated be Evaluated. Without evaluation, and retention of the best, there is nothing created. The novelty flickers into existence but, if its value is unrecognized, it flickers away and is lost.
In many cases, Evaluation is done by people to make a discovery. As when we have Generative AI make many pictures for us, and then we pick the one that we like the best. The human+AI system completes the discovery.
In many other cases, the Evaluation comes from a clear objective. Some moves lead to checkmate, some steps lead to a proof, some actions result in high reward, some genotypes make more copies, some theories explain the data better.
Some prefer the Variation step to be called Blind variation, where “blind” here means that it is uninformed, a shot in the dark. It does not need to be completely uninformed; a good scientist does not select theories to test at random. But neither can it be completely informed and determined. There must be some uncertainty about where the answer lies in order for there to be a discovery. In practice, the variation is partly informed and partly blind, but it is the blind part that corresponds to the discovery.
Now let us briefly go all the way to modern deep learning, to the backpropagation algorithm. At first it might seem that backpropagation is incapable of discovery because it is deterministic and thus incapable of variation. But this is not correct. The weight updates of backprop are deterministic, but the weights are initialized to small random values. The random initialization is often downplayed, but in fact it is a necessary form of variation; it must be done properly to get good performance. In backprop this Variation is done once, at network initialization, so its effect is temporary, and later the network may lose its ability to learn. This is the weakness of deep learning that is alleviated with a new algorithm that my group presented in Nature a couple of years ago. Our “continual backpropagation” made one small change: every so often a less-used neuron would be re-initialized to small random weights. This allows the variation to continue and plasticity to be retained.
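The re-initialization step described here might be sketched as follows in Python. This is a simplified illustration, not the published algorithm: the `utility` scores are assumed to be tracked elsewhere, and the `maturity` age threshold and the zeroing of outgoing weights are assumptions added so the sketch behaves sensibly. All names are hypothetical.

```python
import random


def reinit_least_used(w_in, w_out, utility, age, rng, scale=0.1, maturity=100):
    """Reset the least-used mature hidden unit; return its index, or None."""
    # Assumption: only units older than `maturity` steps are eligible,
    # so a freshly reset unit gets time to become useful.
    eligible = [i for i in range(len(utility)) if age[i] >= maturity]
    if not eligible:
        return None
    j = min(eligible, key=lambda i: utility[i])
    # Fresh small random incoming weights restore variation for this unit.
    w_in[j] = [rng.uniform(-scale, scale) for _ in w_in[j]]
    # Assumption: zero the outgoing weights so the reset does not disturb the output.
    w_out[j] = [0.0] * len(w_out[j])
    age[j] = 0
    return j


rng = random.Random(0)
w_in = [[0.5, -0.4], [0.0, 0.0], [0.3, 0.9]]  # 3 hidden units, 2 inputs each
w_out = [[1.0], [0.2], [-0.7]]                # 1 output weight per hidden unit
utility = [0.8, 0.01, 0.6]                    # hypothetical usage scores
age = [500, 500, 500]

first = reinit_least_used(w_in, w_out, utility, age, rng)
second = reinit_least_used(w_in, w_out, utility, age, rng)
```

In a training loop this function would be called every so often between ordinary backpropagation updates, so that variation keeps being injected after initialization instead of only once.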
Although there is much more to be said about Creativity and Discovery, this is the key point: they are more than supervised learning, more than pattern recognition, more than prediction, and more than world modeling. Those things are important, but they alone will not bring us to discovery. Discovery requires Evaluation from a person or from an explicit goal, and only in the latter case will we attain full autonomy.
So that is my call to arms. If we want the full power of AI scientists, then we should share the goals with them so they can create, evaluate, discover, and in these ways fully participate in achieving the goals. Let’s be bold! Let’s fully automate Creativity and Discovery!
Stephen Wolfram: the subspace of images for which we have words is exponentially tiny
Our brains work very hard to serialize our highly parallel experience into a single thread