Meet DiffusionGemma!
An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license.
It moves beyond sequential, token-by-token generation to produce entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇
The AFM Core Advanced on-device model running on A19 Pro is a sparse model.
It has 20B parameters.
It's fully Apple-designed. It is an MoE (mixture of experts), but when it processes the prompt it loads only the parameters it needs and locks them in.
The model has 20B parameters in total, but a given request uses only 1-4B of them. It loads just those 1-4B for inference and selects them at prefill time.
It is a fully Apple-designed architecture; Google had no part in it.
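To make the "pick at prefill, then lock in" idea concrete, here is a rough toy sketch in Python. It is not Apple's implementation: the post does not say how experts are chosen, so the expert count (`NUM_EXPERTS`), the number kept (`k`), the score-summing rule, and every function name below are assumptions made up for illustration. The sketch also ignores shared weights such as attention and embeddings.

```python
import random

TOTAL_PARAMS_B = 20.0   # total parameter count from the post, in billions
NUM_EXPERTS = 40        # hypothetical: chosen so each expert is 0.5B
EXPERT_SIZE_B = TOTAL_PARAMS_B / NUM_EXPERTS


def select_experts_at_prefill(router_scores, k):
    """Sum each expert's router score over all prompt tokens, keep the top k.

    Assumed selection rule; the post only says the choice happens at prefill.
    """
    totals = [0.0] * NUM_EXPERTS
    for token_scores in router_scores:
        for expert, score in enumerate(token_scores):
            totals[expert] += score
    ranked = sorted(range(NUM_EXPERTS), key=lambda e: totals[e], reverse=True)
    # frozenset: the selection is "locked in" and cannot change during decode
    return frozenset(ranked[:k])


def active_params_b(locked):
    """Billions of expert parameters that would be loaded for this request."""
    return len(locked) * EXPERT_SIZE_B


def route_decode_token(token_scores, locked):
    """During decode, route only among the experts locked at prefill."""
    return max(locked, key=lambda e: token_scores[e])


rng = random.Random(0)
# Fake router scores: 16 prompt tokens, one score per expert.
prompt_scores = [[rng.random() for _ in range(NUM_EXPERTS)] for _ in range(16)]

locked = select_experts_at_prefill(prompt_scores, k=4)
loaded = active_params_b(locked)
print(f"loaded {loaded}B of {TOTAL_PARAMS_B}B ({loaded / TOTAL_PARAMS_B:.0%})")

# A later decode step can only land on one of the locked experts.
decode_scores = [rng.random() for _ in range(NUM_EXPERTS)]
chosen = route_decode_token(decode_scores, locked)
```

With these made-up numbers, four 0.5B experts give 2B loaded parameters, inside the 1-4B range the post describes. The trade-off this kind of design implies is that memory stays small for the whole request, but the model cannot reach an expert it did not pick during prefill.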