@sh_reya I've sounded like a broken record for the past 3 years (and the 9 before that, spent running and exiting a data consultancy, and the 10 before that in 'ETL' and data warehouse work): at some point, everyone will care about the data side here. Not to mention the modeling side.
For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall.
We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal.
This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://t.co/PK5h0mqQSo), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.
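To make "train one isolated block at a time" concrete, here is a toy sketch that assumes the recipe resembles standard diffusion training: each block is assigned its own noise level and learns to denoise inputs at that level, so fitting it never touches any other block. Everything here is a hypothetical stand-in (`SIGMAS`, `DATA_STD`, `train_block`, `sample`, scalar "blocks", a DDIM-style sampler); the paper's actual objective, schedule, and block structure may differ, and a scalar toy cannot show the memory savings, only the independence of the training calls.

```python
import math
import random

# Hypothetical noise schedule: one noise level per block, ordered noisy -> clean.
SIGMAS = [2.0, 1.0, 0.5, 0.25]
DATA_STD = 1.0  # toy data distribution: x0 ~ N(0, DATA_STD**2)


def train_block(sigma, n_samples=20000, seed=0):
    """Train one block in isolation as a denoiser for noise level `sigma`.

    The toy 'block' is a single scalar weight w that predicts x0 from
    x_t = x0 + sigma * eps. It is fit by least squares on freshly noised
    samples, so no other block is loaded or differentiated through.
    """
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n_samples):
        x0 = rng.gauss(0.0, DATA_STD)
        xt = x0 + sigma * rng.gauss(0.0, 1.0)
        sxy += xt * x0
        sxx += xt * xt
    return sxy / sxx  # argmin over w of sum((w * x_t - x0)**2)


def sample(weights, sigmas, seed=1):
    """Chain the independently trained blocks, noisiest first.

    Each step is a deterministic DDIM-style update: estimate x0 with the
    current block, then re-noise that estimate to the next (smaller) level.
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, math.sqrt(DATA_STD**2 + sigmas[0] ** 2))
    x0_hat = x
    for i, (w, sigma) in enumerate(zip(weights, sigmas)):
        x0_hat = w * x
        if i + 1 < len(sigmas):
            eps_hat = (x - x0_hat) / sigma
            x = x0_hat + sigmas[i + 1] * eps_hat
    return x0_hat


# Each call is independent: blocks could be trained one at a time or on separate devices.
weights = [train_block(s, seed=i) for i, s in enumerate(SIGMAS)]
print(weights)
print(sample(weights, SIGMAS))
```

For this Gaussian toy the best linear denoiser is DATA_STD² / (DATA_STD² + sigma²), so the fitted weights should land close to 0.2, 0.5, 0.8 and about 0.94. The blocks only meet at sampling time, which is the property the thread is pointing at.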
@AndrewCurran_ @emollick Inevitably everyone will retreat to their own private communities until we centralize again, whenever that happens. Feels like it will go like early BBS systems, then IRC channels, then public niche forums, then bam: MySpace and Facebook.
@karpathy @pankajmathur_ @shreyansj Titles like this are my favorite. Especially these days, when a title can narrow how you're perceived and what you do. Even more so when you're doing a lot of things on a small team. When you cross business and technical, most titles would imply non-technical.
@levie Every single thing you deploy in AI means more people to iterate on and maintain it. It's never 'done', like any data or software project. It just means different skills.
@joannejang The harness is the model(s) and a whole lot of system behind it. Excited for generative UI. Google is doing it well. Claude is getting there slowly. But I would love to see more than chat and canvas.
I think a lot of people are misreading the Gemini Omni model by comparing it to Seedance 2.0, when conceptually they are different things.
This is a model for editing videos (à la Nano Banana) like nothing we've had before!
@pvncher And decomposing queries is the way. As granular as you need, plus verifiers/subagents (whatever we want to call a clean context containing only what the parent provides for it to succeed) to evaluate that granular decomposition at each layer.
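A toy sketch of the pattern in that reply: split a task recursively until the pieces are granular enough, hand each piece to a "subagent" that sees only the slice its parent passes down, and run a verifier on the result at every layer. All names (`solve`, `leaf`, `parity_check`) are hypothetical; summing a list stands in for an LLM answering a subquery, halving the list stands in for query decomposition, and the parity check stands in for a cheaper, independent verifier.

```python
from typing import Callable, List

Leaf = Callable[[List[int]], int]
Verifier = Callable[[List[int], int], bool]


def solve(task: List[int], max_leaf: int, leaf: Leaf, verify: Verifier) -> int:
    """Recursively decompose `task` until each piece has at most `max_leaf` items.

    Each recursive call sees only the slice its parent hands it (a clean context),
    and a verifier checks the result at every layer before it goes back up.
    """
    if len(task) <= max_leaf:
        result = leaf(task)  # stand-in for a subagent answering one granular subquery
    else:
        mid = len(task) // 2
        result = sum(solve(part, max_leaf, leaf, verify) for part in (task[:mid], task[mid:]))
    if not verify(task, result):
        raise ValueError(f"verification failed on {task}")
    return result


def parity_check(task: List[int], result: int) -> bool:
    """A cheap, independent check: the parity of the result must match the sum's parity."""
    return result % 2 == sum(x % 2 for x in task) % 2


print(solve(list(range(1, 11)), max_leaf=2, leaf=sum, verify=parity_check))  # 55
```

Because the verifier runs at every layer, a faulty leaf (for example one that returns `sum(t) + 1`) is caught where it happens instead of surfacing only in the final answer.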
@AndrewCurran_ @inductionheads That would also explain why OpenAI is trying to buy a company that makes diffusion-based language models (or did buy them… I'm a few days out of the loop).
@AndrewCurran_ @inductionheads Diffusion hybrid models are my bet. Crazy fast, in a way that changes how you can use LLMs, which means new product experiences.
@jonasgeiping @guinansu @kyutai_labs Or at least the two approaches remind me of each other. And Moshi was the first time I saw how real voice-to-voice can work.
@jonasgeiping @guinansu This is a really great idea. The prompt injection result, and separating output and input, is sorta like how @kyutai_labs first did Moshi, in a way? Full duplex?
What do we gain? First off, we can reduce latency, because we now overlap thinking, system inputs, tool use, and even auditing calls (and we show this in the paper).
Second, in a clean ablation, we find that models trained with this format have a significantly easier time withstanding prompt injections, because input and output are easier to tell apart when they arrive as separate streams.
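The latency point comes down to overlap: work that a single-stream model performs in turn can run concurrently once it lives on separate streams. This is a minimal asyncio sketch of that general idea, not the paper's stream format; the durations and the `step`, `sequential`, `overlapped`, and `wall_time` helpers are hypothetical, and `asyncio.sleep` stands in for model compute, a tool call, and an auditing call.

```python
import asyncio
import time

# Hypothetical durations (seconds) for work a single-stream model would do in turn.
THINK, TOOL, AUDIT = 0.1, 0.1, 0.1


async def step(duration: float) -> None:
    await asyncio.sleep(duration)  # stand-in for thinking, a tool call, or an audit


async def sequential() -> None:
    # One stream: each piece of work waits for the previous one to finish.
    for d in (THINK, TOOL, AUDIT):
        await step(d)


async def overlapped() -> None:
    # Separate streams: independent work proceeds concurrently.
    await asyncio.gather(step(THINK), step(TOOL), step(AUDIT))


def wall_time(coro_fn) -> float:
    start = time.perf_counter()
    asyncio.run(coro_fn())
    return time.perf_counter() - start


print(f"sequential: {wall_time(sequential):.2f}s, overlapped: {wall_time(overlapped):.2f}s")
```

The gain only applies to work that does not depend on another stream's output; anything that must wait for a tool result still serializes.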