The AFM Core Advanced on-device model running on the A19 Pro is a sparse model with 20B total parameters.
It is an MoE (mixture of experts), but when it processes the prompt it loads only the parameters it needs and locks them in for that request.
So the model has 20B parameters in total, but a specific request only uses 1-4B of them. Only that 1-4B subset is loaded for inference, and which parameters go into it is decided at prefill time.
The architecture is fully Apple designed; Google had no part in it.
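
To make the "pick at prefill, then lock" idea concrete, here is a toy sketch in Python. It is not Apple's implementation, and nothing above says how the subset is actually chosen. The expert sizes, the per-expert prompt scores, the greedy selection, and the names `select_experts_at_prefill` and `LockedSparseModel` are all assumptions made up for illustration. It only shows the general shape: score the experts once on the prompt, load the ones that fit a parameter budget, and route every later token only among those.

```python
def select_experts_at_prefill(prompt_scores, expert_sizes, budget):
    """Greedily keep the highest-scoring experts that fit in the budget.

    All inputs are hypothetical: prompt_scores[i] is how relevant expert i
    looks for this prompt, expert_sizes[i] is its size in billions of
    parameters, and budget is the most we are willing to load.
    """
    ranked = sorted(range(len(prompt_scores)),
                    key=lambda i: prompt_scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if used + expert_sizes[i] <= budget:
            chosen.append(i)
            used += expert_sizes[i]
    return sorted(chosen), used


class LockedSparseModel:
    """Toy model: the expert subset is fixed once, during prefill."""

    def __init__(self, expert_sizes, budget):
        self.expert_sizes = expert_sizes
        self.budget = budget
        self.active = None        # expert indices locked in for this request
        self.loaded_params = 0    # billions of parameters actually loaded

    def prefill(self, prompt_scores):
        if self.active is not None:
            raise RuntimeError("experts are already locked for this request")
        self.active, self.loaded_params = select_experts_at_prefill(
            prompt_scores, self.expert_sizes, self.budget)

    def decode_step(self, token_scores):
        """Route one token, considering only the locked-in experts."""
        if self.active is None:
            raise RuntimeError("call prefill() before decoding")
        return max(self.active, key=lambda i: token_scores[i])


# Ten hypothetical 2B experts (20B total) with a 4B budget per request.
model = LockedSparseModel(expert_sizes=[2] * 10, budget=4)
model.prefill([0.1, 0.9, 0.3, 0.8, 0.2, 0.0, 0.4, 0.5, 0.6, 0.7])
print(model.active, model.loaded_params)
```

The difference from an ordinary MoE, as described above, is that a regular router can send each token to any expert, so all of them need to be available. Here `decode_step` never looks outside `model.active`, which is why the unused parameters would not need to be loaded at all.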