A really interesting phenomenon of diffusion:
Its L2 denoising training objective has a closed-form solution, the conditional mean, which reduces to a Gaussian-kernel-weighted average of the training points.
Sampling with this exact solution would only return training data, yet diffusion models generalize.
In this blog post, we highlight how properties of Euclidean geometry underlying the diffusion objective and the training dataset (the concentration of Gaussians, the L2 spacing of high-dimensional points in R^d) can explain diffusion generalization: https://t.co/aSmBOYmHNB
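To make the closed form concrete, here is a minimal NumPy sketch of the conditional-mean denoiser for a finite training set. It is not taken from the blog post. It assumes a forward process of the form x_t = alpha * x_0 + sigma * eps with standard Gaussian eps, and the names `optimal_denoiser`, `train`, `alpha`, and `sigma` are hypothetical choices for illustration.

```python
import numpy as np

def optimal_denoiser(x_t, train, sigma, alpha=1.0):
    """Conditional mean E[x_0 | x_t] under an empirical data distribution.

    Assumption (not from the post): x_t = alpha * x_0 + sigma * eps,
    with eps ~ N(0, I) and x_0 drawn uniformly from the rows of `train`.
    """
    # Squared L2 distance from x_t to every (scaled) training point.
    d2 = np.sum((x_t[None, :] - alpha * train) ** 2, axis=1)
    # Gaussian kernel weights, computed as a numerically stable softmax.
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max()
    w = np.exp(logits)
    w /= w.sum()
    # Kernel-weighted average of the training points.
    return w @ train, w

rng = np.random.default_rng(0)
train = rng.normal(size=(5, 64))                  # 5 toy "training points" in R^64
x_t = train[2] + 0.05 * rng.normal(size=64)       # lightly noised copy of point 2

x0_small, w_small = optimal_denoiser(x_t, train, sigma=0.05)
x0_large, w_large = optimal_denoiser(x_t, train, sigma=100.0)
```

In this toy setup, at low noise the weights collapse onto the nearest training point, so the exact denoiser hands back a training example. At high noise the weights are close to uniform and the output approaches the dataset mean.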
After coding is solved, the next frontier is computer use. Today, we are launching Use Computer, the infra for evaluating and training models to use all kinds of computers 👇
🧵 Can a small (1.4B parameter) protein language model solve challenging protein scaffold design tasks by scaling inference compute?
Yes, but not simply by scaling the number of samples generated.