As the family IT guy, it's so disappointing how bad an experience technology is for non-technical people.
I had the distinct pleasure of building educational software for kids full time for a summer while in college (s/o to @WilliamsonMark), and I remember they did weekly/biweekly user testing where a group of toddlers would come in and we'd record them using the software in various states and then adjust accordingly.
Every single session was SHOCKINGLY illuminating. Like, I expected after a number of these I'd empathize more and build better toddler software one-shot right? Hell fucking no. Every user study was so educational. I learned I simply can't enter the mind of a toddler.
Do TV companies (Netflix, Roku, etc.) do user studies with elderly people? Do they realize how dogshit and impossible to navigate their interfaces are?
Asking some elderly family members to "sign up and schedule an Uber to pick you up for the airport" is like Mission Impossible. I thought they were exaggerating, then I tried the experience and holy shit man. Try cold finding, installing, signing up, and scheduling an Uber on a 5-year-old iPhone with max font size. It's insane.
we can all make pininfarina slop.
we can all take the model trained on a million ferrari photos and make an image that looks like a ferrari.
we can all one-shot a better looking car than what jony ive just came up with.
i get the logic. look here, with close to zero effort i can conjure a thing of objective beauty.
but with even less effort than that, you can simply pull up an old photo of a 250 GTO and say, "this is beautiful."
there's something deeply insidious about LLM "creation":
the danger the luce portends, which i don't see abating anytime soon, is that in the domains in which we cannot create, have no notion of what it takes to create, creation now seems within our grasp.
we have been granted a tool whose output is by definition maximally derivative, yet convinces us we're each an inventor
Sam Altman recently said "we have to become an AI inference company now"
Feels like that sentence is the cleanest re-org of the year, and kinda went under the radar.
The frame the public still uses is training. Who has the biggest cluster, the most data, the best post-training pipeline, the boldest scaling bet. That story is still real, but it is not where the marginal dollar goes in 2026.
The marginal dollar goes to serving a reasoning model that has to think for ten seconds before it answers, hold a million-token context without falling over, fan out to a tool, come back, verify itself, and bill you for every token in the trajectory. The training run is amortized. The serving run repeats every time a user opens the app.
Anthropic is not buying compute the way a research lab buys compute. The recent disclosures point at 300 megawatts on SpaceX's Colossus, close to a gigawatt by end of 2026 through Amazon, and multi-gigawatt commitments beyond that. Those are not data points about training capacity. They are forward contracts on time-and-place tokens.
That is the shift. Capacity is being priced as a delivery promise, not as a science project.
For people building on top of these labs, the implication is concrete. The premium has moved off the model and onto the system that serves it. Cache hit rate. Speculative decoding acceptance rate. Prefix reuse. Reasoning budget per request class. KV layout. Routing across a fast cheap model, a long-thinking one, and a tool sandbox.
@nebiustf runs the bottom of that stack for a living. What I see from this seat is a customer base that no longer asks "which model." They ask which inference profile, at which latency, at which cache hit rate, at which dollar per resolved task. The unit changed. The bill changed. The vendor relationships are changing with them.
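To make "dollar per resolved task" concrete, here is a minimal sketch of how that unit could be computed from a cache hit rate and a resolve rate. Everything in it is an assumption for illustration: the `InferenceProfile` fields, the per-million-token prices, and the idea that cached input tokens are billed at a lower rate are hypothetical, not any vendor's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class InferenceProfile:
    """Hypothetical serving profile; all prices are illustrative, per 1M tokens."""
    name: str
    input_price: float         # $ per 1M uncached input tokens
    cached_input_price: float  # $ per 1M input tokens served from the prefix cache
    output_price: float        # $ per 1M output tokens (reasoning tokens included)
    cache_hit_rate: float      # fraction of input tokens that hit the cache
    resolve_rate: float        # fraction of requests that actually resolve the task


def cost_per_request(p: InferenceProfile, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the profile's cache hit rate."""
    cached = input_tokens * p.cache_hit_rate
    uncached = input_tokens - cached
    total = (
        uncached * p.input_price
        + cached * p.cached_input_price
        + output_tokens * p.output_price
    )
    return total / 1_000_000


def dollars_per_resolved_task(p: InferenceProfile, input_tokens: int, output_tokens: int) -> float:
    """Spread the cost of every attempt over the attempts that succeed."""
    return cost_per_request(p, input_tokens, output_tokens) / p.resolve_rate


if __name__ == "__main__":
    profile = InferenceProfile(
        name="long-thinking",
        input_price=2.0,
        cached_input_price=0.5,
        output_price=10.0,
        cache_hit_rate=0.5,
        resolve_rate=0.5,
    )
    print(dollars_per_resolved_task(profile, 1_000_000, 100_000))
```

Under these made-up numbers, raising the cache hit rate or the resolve rate moves the bill more than a small change in list price would, which is the point about the premium shifting onto the serving system.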
maybe @sama was just pointing towards the new unit of work
honestly, Huawei's announcement yesterday in Shanghai just put in black and white what I've been trying to get across to you here for years
to put it simply: at the IEEE ISCAS symposium, Huawei just announced a new physical law to replace Moore's law. they call it the tau scaling law, and it literally changes the paradigm of the global semiconductor industry. basically, instead of continuing to shrink transistors, which runs into insurmountable quantum physical limits, they now optimize the time constant tau at 4 levels simultaneously and get performance gains equivalent to what the Americans achieve with their EUV lithography at 200 million dollars a machine, except that they haven't had access to that lithography since the 2019 sanctions
so China is overtaking silicon valley on its own turf and making it obsolete, and what is really playing out is the exact opposite of what Washington imagined when it decided on the 2019 sanctions
in that sense, I think very few people took the trouble to really look at the slides of the presentation, because the heart of the break lies somewhere other than the marketing concept of the tau scaling law. it's in a technical detail that only a few specialized engineers noticed (and that I went digging into haha): there is apparently a bonding process between silicon layers with a spacing smaller than 2 micrometers, which turns the vertical wires connecting the different layers of a single chip into full-fledged compute paths. that is 3-dimensional integration in the strong sense, and they have mastered it while the rest of the world is still thinking in a single horizontal plane
for me the best image is an architect building a tower while his competitors keep spreading single-family houses out horizontally. intel and tsmc are fighting to etch ever tinier transistors because their EUV lithography locks them into that logic. huawei, cut off from that lithography since 2019, chose a different fight: shortening as much as possible the time an electrical signal takes to cross the whole system, a duration they call tau and minimize simultaneously at the level of the device, the circuit, the chip, and the complete machine. this is real physics, presented at the most serious IEEE conference in the world on the subject
besides, the numbers give you pause. go take a look and you'll see that transistor density rises from 126 to more than 400 million per square millimeter between 2024 and 2031, core frequency climbs from 2.6 to 5 gigahertz, and even full-system performance goes x125 in 4 years between 2026 and 2030. above all, 381 chips have already been mass-produced according to these principles since 2020, which is to say they started changing paradigm with the first wave of American sanctions: 6 years of quiet work while western analysts thought they were in survival mode lol. what everyone took for resistance was in reality a deliberate strategic pivot, carried out with the patience of a people that looks 50 years ahead (China's very long-term vision that I often tell you about)
I've been repeating it here for years: western sanctions accelerate China's industrial and tech policy instead of slowing it down. they force it to invent the world to come while the west stays stuck in the old one. for the record, BYD created the LFP battery in response to the nickel blockade and now dominates the global electric car market, deepseek designed its multi-head latent attention architecture in response to the H100 chip blockade and cut the cost of large language models by 10x, and Huawei has just laid down logicfolding and tau scaling in response to the EUV blockade and is already redrawing the global semiconductor trajectory through 2031
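For scale, the roadmap figures quoted above (density from 126 to 400+ million transistors per mm² between 2024 and 2031, system performance x125 between 2026 and 2030) can be turned into per-year multipliers. This is only a back-of-the-envelope sketch: it assumes constant compound growth over each window, and `annual_factor` is a helper name made up here. Since the density target is stated as "more than 400", its result is a lower bound.

```python
def annual_factor(start: float, end: float, years: int) -> float:
    """Constant yearly multiplier that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years)


# Transistor density, millions per mm^2, 2024 -> 2031 (7 years).
density_factor = annual_factor(126, 400, 2031 - 2024)

# Full-system performance, x125 over 2026 -> 2030 (4 years).
system_factor = annual_factor(1, 125, 2030 - 2026)

print(f"density: x{density_factor:.2f} per year")
print(f"system performance: x{system_factor:.2f} per year")
```

That works out to roughly 1.18x per year for density and roughly 3.34x per year at the system level, so under these assumptions most of the claimed gain would have to come from the system side rather than from packing in more transistors.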
RL has almost always meant trying to maximize a scalar reward.
Very expressive in theory, but do you have only ONE scalar reward? Preferences & tradeoffs are complex & high-dimensional!
Vector Policy Optimization (VPO) trains LLMs to anticipate diverse environments and goals!
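A toy illustration of why a single scalar can hide the tradeoff. This is not the VPO algorithm, whose details aren't given here. It is just a hypothetical two-objective example showing that the "best" response flips depending on which preference weights collapse the reward vector into a number. The candidate names, objectives, and scores are all made up.

```python
# Each candidate response has a reward vector: (brevity, completeness).
# Scores are invented for illustration.
candidates = {
    "terse": (0.9, 0.3),
    "thorough": (0.4, 0.95),
}


def scalarize(reward: tuple, weights: tuple) -> float:
    """Collapse a reward vector into one scalar with a fixed linear weighting."""
    return sum(r * w for r, w in zip(reward, weights))


def best(weights: tuple) -> str:
    """Pick the candidate that maximizes the scalarized reward."""
    return max(candidates, key=lambda name: scalarize(candidates[name], weights))


if __name__ == "__main__":
    print(best((0.8, 0.2)))  # a user who mostly wants brevity
    print(best((0.2, 0.8)))  # a user who mostly wants completeness
```

Training against one fixed weighting bakes in one of those answers; keeping the vector around is what lets a policy serve both users.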
Have you ever thought about the Pokémon definition of a “poacher”? It's basically just a dystopian corporate monopoly.
In the lore, you are branded a poacher simply for using nets, traps, or cages instead of a Poké Ball.
The entire system is a massive "Big Brother" compliance checkpoint:
• The Tech Monopoly: The law basically mandates that you must use patented corporate technology to legally interact with wildlife.
• The Global Registry: The real crime isn't the catch itself—it is dodging the database. Poké Balls automatically tag every capture to a centralized Trainer ID grid. Catching a Pokémon with a net keeps it completely untraceable and off the books.
• The Safari Loophole: People claim poachers are criminals because they over-hunt, but Ash caught an entire herd of 30 Tauros in a single afternoon. Because he paid a corporate entrance fee and used officially licensed Safari Balls, it was perfectly legal.
The Pokémon League just wants absolute regulatory control over a heavily monetized digital capture grid.
I strongly believe there are entire companies right now under heavy AI psychosis and it's impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out.
I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now it's... the whole software development industry (maybe the whole world, really).
It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "it's fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely.
The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediate dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture.
We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happen so fast that nobody notices the underlying architecture decaying.
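A quick sketch of why "MTTR is all you need" can look fine on a dashboard. It uses the standard steady-state availability formula, MTBF / (MTBF + MTTR); the two systems and their numbers are hypothetical, picked so the availability comes out identical.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def failures_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Number of fail/recover cycles that fit in a year."""
    return HOURS_PER_YEAR / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical systems: one rarely fails, one fails constantly but recovers fast.
    resilient = (1000.0, 1.0)    # MTBF 1000 h, MTTR 1 h
    fast_fixer = (10.0, 0.01)    # MTBF 10 h, MTTR 36 s

    for label, (mtbf, mttr) in (("resilient", resilient), ("fast fixer", fast_fixer)):
        print(
            f"{label}: availability={availability(mtbf, mttr):.4%}, "
            f"failures/year={failures_per_year(mtbf, mttr):.0f}"
        )
```

Both systems report the same availability, but the second one goes through about 100 times as many incidents a year. Availability alone says nothing about what each of those failures did to data, users, or the architecture while it was happening.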
I worry.
Fork your dependencies, trim them to only your use case, never update unless it breaks for your users. I’ve been vocal about this for 10+ years. I’ve always said that updating is way riskier than latent bugs (which can be tracked and CVEs monitored).
If you are updating a dependency, it’s on you to analyze every single commit in the full transitive set of dependencies. If you don’t see anything compelling, don’t update!
I remember at HashiCorp once in a while an engineer would try to update a dep or replace a DIY lib with an external one and I’d always ask “show me the commit we need.” Don’t update for the sake of it.
Feeling pretty swell about this mentality with all the supply chain attacks happening.
The report spam UX needs to be simpler. It used to be 1 click; now it’s 1 click plus a slow page load, a multiple choice selection, then confirm. But also, why am I still getting notifications for crypto scams in 2026? One quick glance by grok would filter them out, no? @nikitabier
Gemini Omni is a major leap in world understanding & multimodal editing! It can take photos, video & audio and build entirely new scenes. Over time it’ll be able to handle any input & any output - starting w/ video
You can even give it your own videos & iterate on your ideas:
The bitter lesson in 26 words:
Don’t be distracted by human knowledge, as AI has been historically.
Instead focus on methods for creating knowledge that scale with computation, like search and learning.