The Navy is buying $250k disposable boats from a Baltimore startup that is producing 32 per month with $200M in revenue a year.
These are BlackSea's GARCs: Global Autonomous Reconnaissance Craft.
AI handles hands-free autonomy: navigation, collision avoidance, adaptive mission planning, and vessel intercepts. Starlink-enabled remote ops.
What's interesting is that the Navy stood up USVRON-3, aka the "Hell Hounds," to run these. 400 personnel overseeing hundreds of autonomous craft. A small team orchestrates a large fleet of these as drone swarms.
An entirely new job classification: the robotics warfare (RW) specialist. But BlackSea is the primary certified trainer of sUSVs through its sUSV Operator Certification Program.
The future is moving towards machine and AI orchestrators and the Navy is already putting it into practice.
5 names I’m watching closely right now, all riding the same wave IMO...
$PENG. Penguin Solutions is the AI factory builder. They take a billion dollars worth of $NVDA GPUs and turn them into a working cluster… they’ve already deployed over 85,000 GPUs for customers like Shell, Sandia, and one of Korea’s largest Blackwell clusters. Just got picked by Deepgram and Dell to architect their voice AI.
$SLNH. Soluna is the behind the meter power play. The US grid is breaking under the weight of AI data centers and PJM already said it publicly. Soluna co-locates green data centers directly with wind, solar, and hydro… they just signed a Siemens MOU for a 2MW behind the meter pilot and own the Briscoe Wind Farm.
$DGXX. Digi Power X just signed a 10 year deal with Cerebras worth up to $2.5B for a 40MW AI data center in Alabama. Cerebras IPOs on the Nasdaq this Thursday, which puts them in the spotlight as the public proxy.
$WYFI. WhiteFiber spun out of $BTBT, IPO’d at $17 in August, and is now sitting near $27. Q3 revenue was up 65% YoY and they just signed an $865M, 10 year colocation deal with Nscale at their NC-1 campus. BTIG just initiated coverage. Same playbook as $CRWV at a fraction of the size.
$OUST. Ouster is the lidar pure play and the robotics wave is the next leg of AI nobody is positioned for yet. Every humanoid robot, every autonomous truck, every smart warehouse needs eyes…
I added to my SLNH position and plan to hold until earnings... NFA!
Common thread on all five… AI infrastructure, real revenue, and the institutions haven’t really shown up yet.
SpaceX now 3D prints 40% of their rocket engines.
Musk has said SpaceX has "the most advanced 3D metal printing technology in the world."
There is only one U.S.-headquartered company that designs and builds laser powder bed fusion printers domestically. That's Velo3D $VELO
It's up 31% today.
They nearly went bankrupt in 2024 before being saved by the defense boom. Revenue is now up 40% YoY.
Might be the 3D printer for the domestic defense manufacturing boom.
@NobletStrength I played club, high school then in college. Nothing beats HS ball with your boys. But.. the best players need to find the best competition and it will not be in High School. Same for AA bball and little league.
Datadog $DDOG is up 60% over the past 30 days.
It recorded its first $1B revenue quarter. It's clear: agents can't operate in a black box.
AI, which was supposed to make observability obsolete, just sent demand for it skyrocketing.
Agents are making work cheap. But accountability is very expensive. The demand for observability is exploding.
Even hyperscalers are signing deals to instrument their own labs. They need observability into their training runs: failures, latency, GPU usage, etc.
Frankly, this is bullish for tech and growth. Agents are less likely to displace humanity but rather lead to a massive acceleration in demand.
@BravoKiloActual My 3rd real dive after certification was to 30 meters. Deeper than I was certified for.
It is intense at 30 meters. We spent just 8 minutes at this depth given our full dive duration.
To enter a cave at 60M is insane.
This is a symptom of living in a system of continued currency debasement.
If the price of a good life was falling every year, these people wouldn’t live such lives of financial anxiety. Even these affluent tech types.
Instead, every year the price of securing a good life drifts away at 8% a year and they must scramble to hold on to it.
Now imagine the mindset of someone who saw everything decreasing at 3% a year.
Inference got a hundred times cheaper this year. The compute bill went up anyway.
If you understand why those two sentences are both true at the same time, you understand the most important thing happening in AI right now.
I work on inference for a living, at @nebiustf, where we run open-source managed inference at scale. Most of what follows is what I'm seeing from inside the bill.
12 months ago, the cost of 1M tokens of frontier-class reasoning was somewhere on the order of $60.
Today, an equivalent quality of output costs roughly $0.50.
Price per token of o1-level intelligence has dropped about 128x in a year.
Price of GPT-4-level output has dropped roughly 100x since the original GPT-4 shipped.
By any normal reading of a technology cost curve, this should be deflationary. It should be saving customers money.
The opposite has happened. The total compute bill at every hyperscaler is going up, not down. Anthropic just signed multi-year capacity deals with both XAI and Amazon. Microsoft's Azure capex guide for 2026 starts with an eight. OpenAI is reportedly spending more on compute every quarter than it did in all of 2023. Nvidia paid roughly twenty billion dollars to acquire Groq, an inference-specialist company that did not exist as a serious commercial entity three years ago.
The cost curve and the demand curve crossed, and then the demand curve lapped the cost curve.
Here is what happened underneath.
A reasoning model burns roughly 10x the output tokens of a non-reasoning model on the same task, because it spends most of its tokens thinking out loud before answering. An agentic workflow chains roughly twenty times the requests of a single-shot completion, because it loops, calls tools, plans, retries, and synthesizes. A modern deep-research query (the kind a research analyst can fire off in fifteen seconds and then walk away from for ten minutes) costs more compute than 10 original GPT-4 queries combined. We made every individual token a hundred times cheaper, and then we built a generation of products that consume ten thousand times more tokens.
This is the Jevons paradox playing out at trillion-dollar scale, in compressed time, in front of everyone. Jevons noticed in 1865 that making coal-burning more efficient did not reduce coal consumption. It increased it, because efficiency unlocked uses that were previously uneconomic. Steam engines became more practical at smaller scales. Whole industries that could not afford coal at the old price suddenly could. Britain's coal consumption rose sharply, not despite the efficiency gains, but because of them.
The same thing is happening to AI compute right now and it is happening faster than any analogous historical cycle. Falling token prices did not contract demand. They unlocked agents, deep research, code-writing systems, multi-step reasoning, persistent memory, the entire next layer of AI products. Every product in that next layer consumes orders of magnitude more compute than the chat interfaces it is replacing.
The math at the aggregate level is brutal: 100x cheaper tokens times 10,000x more tokens equals a 100x larger total bill.
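The arithmetic above can be checked in a few lines. This is a minimal sketch, not a billing model: `relative_bill` is a hypothetical helper name, and the only inputs are the two round multipliers quoted in the text.

```python
def relative_bill(price_drop: float, token_growth: float) -> float:
    """Total spend relative to the old baseline.

    price_drop:   how many times cheaper each token got (100 means 1/100th the price)
    token_growth: how many times more tokens are consumed
    """
    # New bill = (old price / price_drop) * (old tokens * token_growth)
    return token_growth / price_drop


# Round figures from the text: tokens ~100x cheaper, usage ~10,000x higher.
print(relative_bill(price_drop=100, token_growth=10_000))  # 100.0

# Break-even: usage would have to grow no faster than the price fell.
print(relative_bill(price_drop=100, token_growth=100))  # 1.0
```

Anything above the break-even line means the bill grows even though every token is cheaper.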
The implications stack quickly.
If you are running a hyperscaler, your 2026 capex guide is not a peak. It is a step on a curve. Inference is structurally always-on, twenty-four hours a day, in a way that training never was. Training is bursty. You spin up a cluster, run for weeks or months, and stop. Inference runs continuously, scales with usage, and the usage curve is exponential. Your power bill, your cooling bill, your transceiver count, your storage footprint, all of these were sized for a workload mix that no longer exists.
If you are running an AI software company built on top of someone else's closed API, you have a problem that did not exist a year ago. Your gross margins get worse as your customers get more value out of your product, because the more they use it, the more compute you pay for. The companies that win this are the ones that figured out vertical integration before the math caught them.
If you are watching this from a distance and trying to understand where the next bottlenecks form, the answer is everywhere downstream of "more inference compute, always-on, with massive memory state per session." The KV cache, the running memory state of a long conversation or an agent loop, is the silent monster of the inference era. It does not scale linearly with parameters. It scales linearly with context length and number of agent steps. A long agent session can hold tens of gigabytes of state per user, per session.
Multiply that by every concurrent user of every product, and you understand why $MU, $SNDK, $TOWCF, and the entire memory and packaging layer have re-rated the way they have.
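To put a rough number on "tens of gigabytes," here is a back-of-the-envelope sketch using the standard KV-cache sizing formula for a transformer: keys plus values, per layer, per KV head, per token. The model dimensions below are assumptions chosen for illustration (in the range of a 70B-class model with grouped-query attention, 16-bit cache), not figures from any specific deployment, and `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size for one session.

    The leading factor of 2 covers keys and values. Size is linear in
    `tokens`, so every extra agent step that appends context adds to it.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical dimensions, assumed for illustration only.
dims = dict(layers=80, kv_heads=8, head_dim=128)

per_token = kv_cache_bytes(1, **dims)
long_session = kv_cache_bytes(128 * 1024, **dims)  # a 128K-token context

print(per_token / 1024, "KiB per token")         # 320.0 KiB per token
print(long_session / 1024**3, "GiB per session")  # 40.0 GiB per session
```

Under these assumptions a single 128K-token session holds about 40 GiB of cache, before any batching, paging, or compression tricks are applied.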
The CPU-to-GPU ratio is evolving. Training is 1:8. Basic chat inference is 1:4. Agentic inference is 1:1, sometimes CPU-heavy. Google has split its TPU line in two, with a dedicated inference chip carrying tripled SRAM for KV cache. $INTC and $AMD just spent two earnings calls explaining that this shift is structural, not cyclical. The hardware map is redrawing in real time and the financial press is mostly still writing about training clusters.
The right framing of where we are right now is not that AI is hitting a wall. The framing a year ago that scaling was hitting a wall was the most expensive bad take of the cycle. The right framing is that AI got dramatically cheaper, dramatically more capable, and dramatically more useful, and the cost of running it at the new equilibrium of demand is much higher than the cost at the old equilibrium of demand, because the new equilibrium is enormous.
A meaningful share of what we actually do at Token Factory, day to day, is help customers stop their bills from running away from them. KV-cache management. Speculative decoding. Quantization. Routing. The kind of vertical integration that, eighteen months ago, every product team was happy to leave abstracted away behind a closed API. The reason this stack matters now is the same reason this whole essay matters: at the new equilibrium of inference demand, the cost of treating compute as a commodity is no longer survivable. The companies that figure out the layer beneath the API are the ones who keep their margins.
Cheaper tokens. More tokens.
Same coal as 1865.
Incredible response by @marcorubio when asked his hope for America
“My hope for America is what it’s always been. It’s the hope I hope we all share. We want it to continue to be the place where anyone from anywhere can achieve anything, where you’re not limited by the circumstances of your birth, by the color of your skin, by your ethnicity, but frankly, it’s a place where you are able to overcome challenges and achieve your full potential.
I think that should be the goal of every country in the world, frankly, but I think in the U.S. – we’re not perfect. Our history is not one of perfection, but it’s still better than anybody else’s history. And ours is a story of perpetual improvement. Each generation has left the next generation of Americans freer, more prosperous, safer, and that is our goal as well.
But it is a unique and exceptional country, and as we come upon this 250-year anniversary I think we have a lot to learn and be proud of in our history. It is one of perpetual and continuous improvement where each generation has done its part to bring us closer to fulfilling the vision that the founders of this country had upon its founding.”
@CKCapitalxx The market is still pricing these as cyclical commodities, which typically get a P/E of around 5. But this can be re-rated far higher once that structural change plays out.
Going from a 5 to a 6 P/E alone is 20% higher.
Not only that. Dysprosium, terbium and finished magnets are in F-35s, submarines, wind turbines, and EV factories.
If 2M robots are being delivered by 2035, that's 1,800 tonnes of new annual demand. Collectively it's 97,000 metric tons by 2030. The US is just ramping up to ~10,000 metric tons a year.
There's so much more to go.