@andrew_n_carr I saw a demo of this in an ML class at the Czech Technical University in Prague back in the early 2010s, I think. Granted, it was definitely a Matlab demo; smartphones were still new back then 😅
@scaling01 Training data pipelines are already agent-ish in principle. First we engineer out the "ish"; in the meantime, a smart enough agent will start delivering results by orchestrating this. I expect a surprisingly smooth transition.
Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
https://t.co/c9AvsRKybj
What if we didn’t have to hold an entire neural network in memory to train it?
Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network.
In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.
With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block.
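To put rough numbers on that, here is a back-of-envelope sketch. It assumes peak activation memory scales with the number of layers being backpropagated at once, and it ignores parameters and optimizer state. The function name and the figures (24 layers, 100 MB per layer, 4 blocks) are hypothetical, not taken from the paper.

```python
def activation_memory(num_layers, per_layer_activation_mb, num_blocks=1):
    """Rough peak activation memory in MB when one block is trained at a time.

    Assumption: each layer stores a fixed amount of activations for backprop,
    and only the layers of the block currently being trained are held.
    """
    layers_per_block = -(-num_layers // num_blocks)  # ceiling division
    return layers_per_block * per_layer_activation_mb


# Hypothetical 24-layer network, 100 MB of activations per layer.
end_to_end = activation_memory(24, 100)                # all layers held at once
block_wise = activation_memory(24, 100, num_blocks=4)  # 6 layers held at a time
print(end_to_end, block_wise)
```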
How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.
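A toy illustration of that idea, not the paper's code: assume each block is assigned its own contiguous range of noise levels and is trained as a denoiser for that range only. Everything here is a stand-in chosen for brevity: the scalar data, the affine "block" `a * z + b`, the noise-range edges, and the name `train_block`.

```python
import random


def train_block(sigma_lo, sigma_hi, data, steps=5000, lr=0.01, seed=0):
    """Fit a scalar affine denoiser x_hat = a * z + b with plain SGD.

    The block only ever sees inputs noised at levels in [sigma_lo, sigma_hi],
    and its loss depends on no other block.
    """
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    for _ in range(steps):
        x = rng.choice(data)                     # clean target
        sigma = rng.uniform(sigma_lo, sigma_hi)  # noise level owned by this block
        z = x + sigma * rng.gauss(0.0, 1.0)      # noised input
        err = (a * z + b) - x                    # gradient of 0.5 * err**2 w.r.t. the prediction
        a -= lr * err * z
        b -= lr * err
    return a, b


data = [-1.0, 1.0]            # hypothetical two-point "dataset"
edges = [0.0, 0.5, 1.0, 2.0]  # assumed split of noise levels into three blocks

# Each call is independent: blocks could be trained in any order or on different machines.
blocks = [
    train_block(lo, hi, data, seed=i)
    for i, (lo, hi) in enumerate(zip(edges, edges[1:]))
]
for (lo, hi), (a, b) in zip(zip(edges, edges[1:]), blocks):
    print(f"sigma in [{lo}, {hi}]: a={a:.2f}, b={b:.2f}")
```

In this toy setup the low-noise block learns to mostly trust its input (slope near 1), while the high-noise block shrinks its input toward the data mean, so each block ends up specialised for its own part of the path from noise to target.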
We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers
In each case, performance is competitive with end-to-end training while using a fraction of the memory.
This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.
Read our paper and code to learn more.
Paper: https://t.co/CRj96VGYQn
GitHub: https://t.co/eNW0K9Xh8E
🐟
Since around 2013, the city of Chengdu has been adding climbing vines to its viaducts to curb heat islands and to take on the difficult job of making this infrastructure more pleasant.
@Testurdla @scaling01 I hope I read this correctly and everyone is fine and happy for her. I perceive this as criticism of the journalist sucking up to her.
@scaling01 dude, if you build a new one, not if you decommission a pre-existing, super-expensive power plant you already spent a fortune to build over the decades