Hassan Shapourian @Hasan_Shap - Twitter Profile

Pinned Tweet

about 1 month ago

this one is special. it has been quite a ride. we built the entire training pipeline and data stack from scratch. today we shipped ZAYA1-VL-8B: a strong compact MoE VLM that punches above its weight class. so proud of this team. and we're just getting started🚀

Zyphra

@ZyphraAI

about 1 month ago

Today we're releasing ZAYA1-VL-8B, our first vision-language model. ZAYA1-VL-8B is a 700M active / 8B total MoE built on our ZAYA1-8B base trained on @AMD. We achieve strong performance for our size resulting in leading intelligence density and inference efficiency. https://t.co/31BY6rtKvG

Hasan_Shap retweeted

Zyphra

@ZyphraAI

2 days ago

Zyphra Research is releasing Norm-AGnostic residual networks (NAG) - a new architecture that mitigates the diminishing returns of deeper residual models by controlling the residual stream geometry. NAG makes Mixture-of-Depths practical for pretraining. https://t.co/uGrFpxoViV

Hasan_Shap retweeted

Zyphra

@ZyphraAI

6 days ago

Today we're releasing ZONOS2, our next-generation real-time TTS model with high-fidelity voice cloning. ZONOS2 is the most expressive open-source TTS model, released under Apache 2.0 and available on Zyphra Cloud on @AMD. 🧵

Hassan Shapourian @Hasan_Shap

8 days ago

Vision workloads are notoriously demanding on memory and compute. Exploring SSMs in vision-language models opens up a different part of the design space—one where efficiency and capability can scale together.

Zyphra

@ZyphraAI

8 days ago

Zyphra Research continues to explore architecture innovations beyond standard transformers. Today we’re releasing Zamba2-VL, extending our prior Zamba2 hybrid SSM-Transformer work into vision-language modeling. 🧵 https://t.co/q7R9GdnaIh

Hassan Shapourian @Hasan_Shap

8 days ago

It all started with a basic question: what happens when you bring hybrid SSM-Transformer architectures to multimodal intelligence? Zamba2-VL is an early step toward that future. Huge credit to our vision team for making it happen.

Zyphra

@ZyphraAI

8 days ago

Zamba2-VL is competitive with the leading open Transformer vision-language models of comparable scale, including Qwen3-VL, InternVL3.5, Molmo2, and PerceptionLM, across image understanding, reasoning, OCR, grounding, and counting benchmarks. https://t.co/HBFsnQOX4G

Hasan_Shap retweeted

Zyphra

@ZyphraAI

about 1 month ago

We present ZAYA1-8B-Diffusion-Preview, the first diffusion language model trained on @AMD. Autoregressive LLMs generate one token at a time; diffusion generates a block in parallel, speeding up inference. We show a 4.6-7.7x decoding speedup with minimal quality degradation 🧵 https://t.co/xMXp4sFYkb

Hasan_Shap retweeted

Zyphra

@ZyphraAI

about 1 month ago

Today we’re announcing 15MW of AMD Instinct MI355 GPU capacity through Zyphra Cloud, our full-stack neocloud powered by @AMD. https://t.co/G4WnCoAWJc

Hasan_Shap retweeted

Zyphra

@ZyphraAI

about 1 month ago

Today we're releasing ZAYA1-74B-Preview, a major milestone in scaling pretraining on @AMD. ZAYA1-74B-Preview is a 4B active / 74B total MoE. This preview model is a strong pre-RL base checkpoint. The final post-trained reasoning model is coming soon. 🧵 https://t.co/2zJ3q8jEdV

Hasan_Shap retweeted

rishi @rishiiyer01

about 1 month ago

this model, and our next release, were an insane ad hoc learning experience in scaling and reasoning about pretraining for me. All credit goes to @rawsh0 and team for extracting the most out of the pretraining base. It is insanely strong for its size

Hasan_Shap retweeted

Robert Washbourne

@rawsh0

about 1 month ago

new model! strong <1B active MoE. I led data and post-training for this release. cca goat @rishiiyer01 and the pretraining squad cooked https://t.co/j808U7FxG5

Hasan_Shap retweeted

Beren Millidge

@BerenMillidge

about 1 month ago

Incredible work from the entire Zyphra team for this one! We never expected that our small ZAYA1 would be able to compete (at least in math) with the frontier giants. Our post-training and pre-training stacks are strong. More general thoughts on the ZAYA release, a 🧵

Hasan_Shap retweeted

Zyphra

@ZyphraAI

about 1 month ago

@ZyphraAI is an open superintelligence research and product company based in San Francisco, CA on a mission to build human-aligned AI that helps individuals and organizations reach their fullest potential. Apply to join us! https://t.co/1Eika8rWxz

Hasan_Shap retweeted

Zyphra

@ZyphraAI

about 1 month ago

Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵 https://t.co/URTj1br9tw

Hasan_Shap retweeted

Zyphra

@ZyphraAI

3 months ago

@ZyphraAI releases research on a new way to build hybrid models. We introduce a new architecture leveraging the complementary strengths of Transformers and RNNs for greater flexibility and performance than existing approaches. We call it Hybrid Associative Memory (HAM). 🧵 https://t.co/xcFq0p2VUG

Hassan Shapourian @Hasan_Shap

3 months ago

🔥Exciting times to join us! Many opportunities to contribute to open-source AGI 🧠

Zyphra

@ZyphraAI

3 months ago

Zyphra is hiring out of our new office in San Francisco. We are on the mission to build open superintelligence and have multiple roles open across research, engineering, product, and GTM. Join us: https://t.co/pqARkkw2hK https://t.co/kyTZjnBVTc

Hasan_Shap retweeted

Peyman Milanfar

@docmilanfar

3 months ago

Happy Noruz, the first day of Spring, the Persian new year. May the new year bring peace to the innocent, and wisdom to the powerful who can bring it about. https://t.co/NDUJSRFZG9

Hassan Shapourian @Hasan_Shap

3 months ago

@roozbehp Take a close look at the organizers…👎

Hassan Shapourian @Hasan_Shap

3 months ago

@sirbayes "Iran ~68–70 Million 90–95%": this is very wrong!!! This view of Iranian people is very much outdated...

Hasan_Shap retweeted

Zyphra

@ZyphraAI

4 months ago

Introducing ZUNA, a 380M-parameter BCI foundation model for EEG data, a significant milestone in the development of noninvasive thought-to-text. Fully open source, Apache 2.0. https://t.co/SrVHuMNmWj