For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall.
We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal.
This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://t.co/PK5h0mqQSo), we matched end-to-end performance across ViTs, DiTs, and LLMs while training just one isolated block at a time.
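To make the idea concrete, here is a toy sketch of blockwise training with a denoising objective. It is not the paper's method. Everything in it is an assumption for illustration: the linear blocks, the closed-form least-squares fit, the noise schedule `sigmas`, and the helper names `fit_block` and `run_blocks`. Each block gets its own noise level and learns, on its own, to recover the target from a noised copy of it, so no gradient ever crosses a block boundary.

```python
import numpy as np

def fit_block(x, y, sigma, rng):
    """Fit one block in isolation: map (x, noised target) back to the clean target."""
    z = y + sigma * rng.standard_normal(y.shape)            # noised target at this block's level
    feats = np.hstack([x, z, np.ones((len(x), 1))])         # block input: features, noisy state, bias
    weights, *_ = np.linalg.lstsq(feats, y, rcond=None)     # local fit, no other block involved
    return weights

def run_blocks(x, blocks, sigmas, rng):
    """Forward pass as iterative denoising: start from noise, refine block by block."""
    out_dim = blocks[0].shape[1]
    z = sigmas[0] * rng.standard_normal((len(x), out_dim))  # initial state is pure noise
    for i, weights in enumerate(blocks):
        feats = np.hstack([x, z, np.ones((len(x), 1))])
        y_hat = feats @ weights                              # this block's estimate of the target
        if i + 1 < len(blocks):
            # re-noise to the level the next block was fitted on
            z = y_hat + sigmas[i + 1] * rng.standard_normal(y_hat.shape)
    return y_hat

rng = np.random.default_rng(0)
x = rng.standard_normal((512, 8))
y = x @ rng.standard_normal((8, 3))     # toy target, linear in x (assumption)
sigmas = [1.0, 0.5, 0.1]                # hypothetical noise schedule, one level per block
blocks = [fit_block(x, y, s, rng) for s in sigmas]   # each block fitted independently
mse = float(np.mean((run_blocks(x, blocks, sigmas, rng) - y) ** 2))
```

The toy target is linear in `x`, so every block can fit it exactly. The point is the loop structure: only one block's parameters and inputs are needed at any moment during fitting.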
Inception's founding team came together at Stanford.
@adityagrover_ was @StefanoErmon's second PhD student. @volokuleshov joined the group as a postdoc. Aditya and Volo shared an office.
Years later, Stefano's lab hit a breakthrough on diffusion for language. Volo's group at Cornell was publishing adjacent work, and Aditya's research at UCLA was heading in the same direction.
Part 3 of our founder story series with @timt at @menloventures ↓