Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density.
With <1B active params, it outperforms open-weight models many times its size on math and reasoning and, with test-time compute, closes in on DeepSeek-V3.2 and GPT-5-High. 🧵
@ZyphraAI releases research on a new way to build hybrid models. We introduce a new architecture that leverages the complementary strengths of Transformers and RNNs to achieve greater flexibility and performance than existing approaches.
We call it Hybrid Associative Memory (HAM). 🧵
Many thanks to @ermgrant for being a great mentor! Thanks also to @SaxeLab for supporting me with this project during my internship at @GatsbyUCL. (10/10)
We’re excited to share our paper analyzing how data drives the emergence of localized receptive fields in neural networks! w/ @SaxeLab @ermgrant
Come see our #NeurIPS2024 spotlight poster today at 4:30–7:30 in the East Hall!
Paper: https://t.co/U2I285LLAE
In summary: We analytically study the dynamics of localization in a nonlinear neural network without top-down constraints and find that “edges” drive localization. We also identify which forms of non-Gaussianity are necessary for this structure to emerge. (9/10)