a reminder to my generation: this life is fucking electric, don’t let any doomer shit bring you down. we will live in the most interesting timeline in history and work on some of the hardest, most rewarding problems of all time.
https://t.co/PE0uACOlrc
it will be interesting to see how a free market economy survives this shift.
right now a lot of core business logic is increasingly routed through a handful of foundation models. those models don’t just process the data - they effectively own it by proxy and can train on it. that concentration feels at odds with our hopes for a free market economy.
at scale i expect foundation models to keep getting commoditized and open-sourced. the real advantage will shift to the companies willing to do serious post-training and reinforcement learning on top of them - customizing the models to their actual workflows, protecting their IP, and compounding efficiency in ways generic models can’t touch.
that’s the hill-climbing machine Satya was talking about. human capital keeps finding the next hills worth climbing, and the post-trained systems turn those discoveries into durable, compounding advantage.
@restorefast shares these views
A few months ago I wrote a research paper, "Beyond Transformers," framing this as a paradigm shift and arguing that the autoregressive paradigm is fundamentally backwards: we spend ~99% of compute on deterministic matrix math to approximate a probability distribution, then a trivial PRNG samples from it. That always seemed inverted to me - defining the distribution should be cheap, and sampling should be the native operation.
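A minimal sketch of the autoregressive pattern described above. It assumes a toy output head: the sizes `HIDDEN`/`VOCAB` and the random projection `W_out` are hypothetical. Deterministic matrix math produces the distribution, and then a single PRNG draw picks the token. The sketch models only the final projection. In a real transformer, the layers before it do most of the deterministic work, and the ~99% figure is not derived here.

```python
import numpy as np

# Toy sizes, chosen for illustration only (real models are far larger).
HIDDEN = 64
VOCAB = 1000

rng = np.random.default_rng(0)
# Hypothetical output projection standing in for a trained LM head.
W_out = rng.standard_normal((HIDDEN, VOCAB)) / np.sqrt(HIDDEN)

def next_token(hidden_state, temperature=1.0):
    """One autoregressive step: deterministic math defines p, one PRNG draw samples it."""
    # 1) Deterministic part: HIDDEN * VOCAB multiply-adds to get the logits.
    logits = hidden_state @ W_out / temperature
    # 2) Softmax turns logits into a probability distribution (max-subtracted for stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # 3) Stochastic part: a single draw from that distribution.
    token = rng.choice(VOCAB, p=probs)
    return int(token), probs

h = rng.standard_normal(HIDDEN)
tok, p = next_token(h)
matmul_macs = HIDDEN * VOCAB
print(f"multiply-adds for the logits: {matmul_macs}, random draws: 1")
```

The point of the sketch is the asymmetry. All the structure lives in the matrix product, and the sampling step is a single cheap call at the end.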
Instead of generating one token at a time, diffusion models start with noise and iteratively refine entire blocks in parallel until they converge. This is closer to how physical systems actually behave: a state evolving toward low-energy configurations through an energy landscape. Their "self-correction" is a digital approximation of attractor dynamics. Their parallel refinement is a digital approximation of a system settling into an energy basin.
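The energy-landscape analogy can be made concrete with a Hopfield network, a classic model of attractor dynamics. This illustrates the analogy only. It is not how DiffusionGemma or any diffusion model works, and the sizes and noise level are arbitrary. A corrupted state is updated unit by unit, its energy never goes up, and it settles into the stored pattern it started near.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100  # number of +/-1 units (toy size)

# Store two random patterns with the Hebbian outer-product rule.
patterns = rng.choice([-1, 1], size=(2, N))
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0)  # symmetric weights, no self-connections

def energy(s):
    return -0.5 * s @ W @ s

def settle(s, sweeps=10):
    s = s.copy()
    for _ in range(sweeps):
        changed = False
        # Asynchronous updates: each single-unit update can only lower (or keep) the energy.
        for i in rng.permutation(N):
            new = 1 if W[i] @ s >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # fixed point reached: a local energy minimum
            break
    return s

# Corrupt 20% of the first pattern ("noise"), then let the dynamics repair it.
noisy = patterns[0].copy()
flip = rng.choice(N, size=20, replace=False)
noisy[flip] *= -1
recovered = settle(noisy)
print("energy:", energy(noisy), "->", energy(recovered))
print("overlap with stored pattern:", (recovered @ patterns[0]) / N)
```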
I wrote in my paper that "diffusion-based language models, energy-based text generation, and continuous-space language models all represent steps in this direction - and may offer a smoother transition path to future stochastic hardware." I think models like this lend some support to my thesis.
But the next step isn’t a bigger GPU. The core workload that models like DiffusionGemma run on H100s - iterative denoising over probability distributions - is exactly the type of probabilistic sampling work that would be native on stochastic hardware.
A good example of modern stochastic hardware (still early and under active R&D) is @extropic’s Thermodynamic Sampling Units (TSUs), which use physical thermal fluctuations as the computational mechanism. With further development, the need for heavy matrix math to approximate distributions could be reduced or eliminated, as the physics could handle sampling directly (hint: I am also bullish on TSUs because they could be great in space).
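For intuition on what "the physics could handle sampling directly" means, here is the kind of workload a sampling chip targets, emulated in software. It uses Gibbs sampling from a pairwise energy model, p(s) proportional to exp(-beta * E(s)). This is a generic textbook sampler, not Extropic's design or API. On a GPU, every update below needs a PRNG call. The pitch for TSUs is that thermal fluctuations would supply that randomness physically.

```python
import numpy as np

def gibbs_sample(J, b, steps, beta=1.0, rng=None):
    """Sample s in {-1,+1}^n from p(s) ~ exp(-beta * E(s)) with
    E(s) = -0.5 * s^T J s - b^T s, using single-site Gibbs sweeps.
    J must be symmetric with a zero diagonal."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(b)
    s = rng.choice([-1, 1], size=n)
    samples = []
    for _ in range(steps):
        for i in range(n):
            field = J[i] @ s + b[i]  # local field on unit i
            # Exact conditional: P(s_i = +1 | all other units)
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
            s[i] = 1 if rng.random() < p_up else -1
        samples.append(s.copy())
    return np.array(samples)

# Two units with a strong positive coupling should mostly agree.
J = np.array([[0.0, 2.0], [2.0, 0.0]])
b = np.zeros(2)
samples = gibbs_sample(J, b, steps=2000, rng=np.random.default_rng(0))
agree = np.mean(samples[:, 0] == samples[:, 1])
print(f"fraction of samples where the coupled units agree: {agree:.2f}")
```

Defining the distribution here is just writing down `J` and `b`. All the work is in the sampling loop, which is the part the post argues should be native to the hardware.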
I've been building a diffusion LLM that runs inference through thermodynamic sampling primitives, with learned pairwise coupling between token positions. It's trained with a gradient-free algorithm that replaces backpropagation with local correlation statistics, designed for hardware where stochasticity is free. Still a work in progress lol, but it already shows promise.
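One well-known way to learn pairwise couplings from local correlation statistics is the classic Boltzmann machine learning rule: dJ_ij = lr * (<s_i s_j>_data - <s_i s_j>_model). The sketch below uses that rule as a stand-in. It is not necessarily the algorithm described above. No error signal is backpropagated. Each coupling update looks only at the co-activation of the two units it connects, and the model statistics come from a Gibbs sampler, the part stochastic hardware would provide. The 4-position toy data and all hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs(J, b, steps, s):
    """Run `steps` Gibbs sweeps, updating s in place; return the visited states."""
    out = []
    for _ in range(steps):
        for i in range(len(b)):
            field = J[i] @ s + b[i]
            s[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * field)) else -1
        out.append(s.copy())
    return np.array(out)

# Hypothetical toy "data": positions 0 and 1 always match,
# positions 2 and 3 always disagree (a stand-in for token co-occurrence).
n_data = 500
a = rng.choice([-1, 1], size=n_data)
c = rng.choice([-1, 1], size=n_data)
data = np.stack([a, a, c, -c], axis=1)

n = data.shape[1]
J = np.zeros((n, n))
b = np.zeros(n)
lr = 0.05
s = rng.choice([-1, 1], size=n)  # persistent chain state, kept across epochs

data_corr = data.T @ data / len(data)  # <s_i s_j>_data
data_mean = data.mean(axis=0)          # <s_i>_data

for epoch in range(300):
    samples = gibbs(J, b, steps=20, s=s)
    model_corr = samples.T @ samples / len(samples)  # <s_i s_j>_model
    model_mean = samples.mean(axis=0)                # <s_i>_model
    # Local rule: J_ij depends only on statistics of units i and j.
    J += lr * (data_corr - model_corr)
    np.fill_diagonal(J, 0.0)
    b += lr * (data_mean - model_mean)

print(np.round(J, 2))
```

The learned `J[0, 1]` should come out positive and `J[2, 3]` negative, matching the correlations in the toy data.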
DiffusionGemma proves the paradigm is becoming more viable at scale - I think we'll continue to see dLLMs grow in popularity. The question now is what happens when the hardware matches the math.
@marcusyul smells like survivorship bias
you are making assumptions based on complaints from companies that are “losing”
the winners are not posting on X about it lol they are just stacking wins and laying people off, but trust that some companies are winning
We looked at the jobs pages of 910 early-stage startups from the top accelerator programs, analyzing who they want to hire and for how much.
Explore at our latest drop:
https://t.co/aDHLO22wtR
The fastest way to ruin a beautiful feeling is to make it unlimited.
One feeling I try to protect is being genuinely impressed.
So I surround myself with people who are hungry, sharp, and operating at a level that makes me uncomfortable.
But the trick is: to keep being impressed, I have to keep getting better too.
Your environment raises your standards, then your standards force you to grow.
There is nothing more powerful than well-informed optimism. It has to be well-informed though. The "everything will be fine" type of optimism may also be somewhat useful, but it's not as useful as the "Hmm, what if we tried x?" kind.
If you're a naturally anxious person, I recommend pursuing a high-stress career path where at least you'll be compensated for the anxiety you're going to have anyway.
@Jason isn't the main point of contention that data center water competes directly with the municipal water supply (tap), whereas almond water comes from agricultural sources (raw, untreated, from underground aquifers, etc.)?
still crazy chart though lol