I enjoyed working on this one. If you're interested in self-attention alternatives, this might interest you. Thanks to all those @ZyphraAI who helped out.
Today @ZyphraAI releases OVQ-attention, an advancement for efficient long-context processing!
Existing LLM layers compress input too much, leading to poor long-context understanding, or too little, leading to expensive memory+compute.
OVQ-attention is an alternative path. 🧵
Computer scientists often seem incredibly confident one way or the other about computational functionalism. What they should say is that the arguments both for and against provide only inconclusive considerations and the right attitude is therefore one of great uncertainty.
@ZyphraAI releases research on a new way to build hybrid models. We introduce a new architecture leveraging the complementary strengths of Transformers and RNNs for greater flexibility and performance than existing approaches.
We call it Hybrid Associative Memory (HAM). 🧵
OVQ shows a practical route to handling distribution shift via online codebook learning. The universal codebook result is the theoretical side: a fixed decoder can be near optimal for any activation covariance with only a tiny rate gap, if we can actually build that codebook.
Zyphra
Online Vector Quantized Attention
OVQ-attention keeps linear time and constant memory but avoids long-context collapse by learning both key and value centroids online, so memory tracks the live KV stream instead of a fixed dictionary. Sparse updates route each token to a single slot, so memory capacity scales without increasing per-token compute. Based on Gaussian Mixture Regression with online EM-style updates, it outperforms VQ and linear baselines, generalizes from ~4k training context to 64k+, and stays competitive with attention using ~10–25% of the state; still early at sub-500M scale and not kernel-optimized.
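To make the mechanism in the summary above concrete, here is a minimal toy sketch of an online, hard-routed key/value codebook. It is not Zyphra's implementation: the class name `OnlineVQMemory`, the running-mean update (a hard-assignment stand-in for the online EM-style updates), and the count-weighted softmax read are all assumptions chosen for illustration.

```python
import numpy as np


class OnlineVQMemory:
    """Toy constant-size memory: key and value centroids learned online.

    Hypothetical illustration only, not the OVQ-attention reference code.
    """

    def __init__(self, num_slots, dim_k, dim_v, seed=0):
        rng = np.random.default_rng(seed)
        self.keys = rng.normal(size=(num_slots, dim_k))  # key centroids
        self.values = np.zeros((num_slots, dim_v))       # value centroids
        self.counts = np.zeros(num_slots)                # tokens routed per slot

    def write(self, k, v):
        # Sparse update: route the token to its single nearest key centroid.
        slot = int(np.argmin(((self.keys - k) ** 2).sum(axis=1)))
        self.counts[slot] += 1.0
        lr = 1.0 / self.counts[slot]  # running-mean step size (assumption)
        self.keys[slot] += lr * (k - self.keys[slot])
        self.values[slot] += lr * (v - self.values[slot])
        return slot

    def read(self, q):
        # Softmax over slots that have received tokens, weighted by counts.
        used = self.counts > 0
        if not used.any():
            return np.zeros(self.values.shape[1])
        logits = self.keys[used] @ q + np.log(self.counts[used])
        w = np.exp(logits - logits.max())
        w /= w.sum()
        return w @ self.values[used]


# Usage: the state stays (num_slots x dim) however many tokens are written.
mem = OnlineVQMemory(num_slots=8, dim_k=4, dim_v=3)
rng = np.random.default_rng(1)
for _ in range(200):
    mem.write(rng.normal(size=4), rng.normal(size=3))
out = mem.read(rng.normal(size=4))
```

Each write touches one slot, so per-token update cost does not grow with the number of tokens seen, and the memory footprint is fixed by `num_slots`. The read here still scans every slot; the real method's routing and cost profile may differ.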
At this point in attention-free architectures, so many people have poisoned the well that it's just a well of poison. A "Transformer Killer™" drops once a month, and then the authors come back and "kill" transformers again like 5 months later.
Love the work, I'm knee-deep in a lot of it, but please for the love of god stop over-hyping. Being grounded and pointing out your own limitations gets people more excited, I promise.
It’s not just about GPUs. It’s about the ecosystem.
@QuentinAnthon15 joined @jtatarchuk on the Beyond CUDA podcast to share how moving to @AMD MI300X cut training costs at @ZyphraAI
📺 Watch the full episode on YouTube (link in comments)
Learning effectively in real time during deployment, i.e. online continual learning, is important for many applications. It's also associated with theories of intelligence that emphasize learning efficiency, and is an ability where the gap between animals and AI is large.
seems big AI labs are hyperfixating on reasoning when they should focus on *memory* instead
normal people won't use models that can think for hours to solve hard math problems
people want models that learn over time, remember details, adapt and interact like a person would
Zyphra is releasing our first reasoning model, ZR1-1.5B. This small but powerful reasoning model excels at both math and code, making it one of the best models in these categories for its size. It also uses 60% fewer reasoning tokens than comparable models.
🆓Apache 2.0 license.
Thought experiment: what should a non-conscious alien scientist conclude about human theories of consciousness? What should humans think of the alien's conclusion? In my blog (link below), I argue this scenario supports Illusionist views of consciousness. @keithfrankish @eschwitz
(6/) The scenario also raises the question of how we could even get a non-conscious scientist to understand what we mean by terms like 'phenomenal character', a point which may support those who argue such terms are not meaningful enough to discuss in the first place. @petemandik
(5/) If we cannot find good reasons to convince a non-conscious scientist that phenomenal consciousness and the hard problem exist, then why should humans ever believe they do?