Congratulations to Google on open-sourcing Gemma Diffusion!
I want to give a shout-out to a group of really talented Cornell students who developed, in the lab, many of the new ideas that we see in this model:
@mariannearr -- Block diffusion is what enables Gemma Diffusion to generate arbitrary-length sequences and support KV caching.
@mariannearr @SchiffYair -- Efficient encoder-decoder diffusion (E2D2) extends block diffusion and is part of what makes Gemma really fast, speeding up inference by running a smaller decoder model.
@SchiffYair @ssahoo_ @Guanghan__Wang -- Uniform diffusion LMs (UDLMs) are the family of discrete diffusion models that underlie Gemma and define its noise process and training objective. This work builds on our earlier simplified losses in MDLMs.
@ssahoo_ -- Uniform diffusion supports built-in error correction and is especially effective with distilled fast samplers like the ones introduced in Duo.
This is a great overview of Gemma Diffusion: https://t.co/MXLfgPPNc4
Check out the students' papers below:
I think the reason for Andrej's move is likely fairly simple. He just needs unlimited access to the best and most amazing frontier model, plus full academic freedom to go pedal to the metal on his frontier work (research, products, or both), which xAI/Tesla/Grok Build cannot offer him for now. Then he can share it openly like before.
Meet Doris. She lives in California and is registered as a 126-year-old who has voted in 51 elections, and she has NO IDEA.
California's voting system is so corrupt that simply knocking on the door of the "126-year-old" proves election fraud.
EXPOSE IT ALL.
@lateinteraction it was my idea :)
Using GEPA is a very natural workflow for creating LLM programs. The iteration speed is very quick, and it easily allows researchers to bias the optimization with some priors (usually derived from just looking at the data).
Thanks a lot for the great tool!
Tip: If you have a daily workflow you want to automate, ask the agent to write down the steps so it can faithfully execute it every time. It is an extension of its skills.
It may take some iterations to reach deterministic results, but it's worth the effort.
In my private repo, I have more plans than code.
You should, however, make sure to write a unit test against your plans so that when the model changes, it does not catch you off guard.
@PeterDiamandis SpaceX had achieved nothing of note after 3 years and was written off as dead after 6 years with 3 consecutive launch failures.
But you may have noticed that things are different now.
Each one built with love. When @elonmusk said that, it really choked me up. Every day we make our products with our customers in mind. We love all of you more than you know. Thanks for CONSTANTLY lifting us up. ALL THE LOVE!!!!