Nick Alonso @Nick__Alonso - Twitter Profile

Pinned Tweet

4 months ago

I enjoyed working on this one. If you're interested in self-attention alternatives, this might interest you. Thanks to all those @ZyphraAI who helped out.

Zyphra

@ZyphraAI

4 months ago

Today @ZyphraAI releases OVQ-attention, an advancement for efficient long-context processing! Existing LLM layers compress input too much, leading to poor long-context understanding, or too little, leading to expensive memory+compute. OVQ-attention is an alternative path. 🧵

https://t.co/siTD8HTwR5

5

226

35

165

38K

0

7

0

731

Nick__Alonso retweeted

Jonathan Birch @birchlse

about 2 months ago

Computer scientists often seem incredibly confident one way or the other about computational functionalism. What they should say is that the arguments both for and against provide only inconclusive considerations and the right attitude is therefore one of great uncertainty.

42

227

29

45

55K

Nick__Alonso retweeted

Vasu Shyam @vasud3vshyam

3 months ago

@leonlufkin and @kamesh_ai really cooked with this one! https://t.co/c00NS1Pau7

0

12

1

3

330

Nick__Alonso retweeted

Zyphra

@ZyphraAI

3 months ago

@ZyphraAI releases research on a new way to build hybrid models. We introduce a new architecture leveraging the complementary strengths of Transformers and RNNs for greater flexibility and performance than existing approaches. We call it Hybrid Associative Memory (HAM). 🧵

https://t.co/xcFq0p2VUG

5

42

15

12

8K

Nick__Alonso retweeted

samsja

@samsja19

4 months ago

Zyphra is still under the radar but doing truly innovative architecture work

0

120

7

48

12K

Nick__Alonso retweeted

𝚐𝔪𝟾𝚡𝚡𝟾

@gm8xx8

4 months ago

OVQ shows a practical route to handling distribution shift via online codebook learning. The universal codebook result is the theoretical side: a fixed decoder can be near optimal for any activation covariance with only a tiny rate gap, if we can actually build that codebook.

0

7

2

4

1K

Nick Alonso @Nick__Alonso

4 months ago

Nice summary.👇

𝚐𝔪𝟾𝚡𝚡𝟾

@gm8xx8

4 months ago

Zyphra
Online Vector Quantized Attention

OVQ-attention keeps linear time and constant memory but avoids long-context collapse by learning both key and value centroids online, so memory tracks the live KV stream instead of a fixed dictionary. Sparse updates route each token to a single slot, so memory capacity scales without increasing per-token compute. Based on Gaussian Mixture Regression with online EM-style updates, it outperforms VQ and linear baselines, generalizes from ~4k training context to 64k+, and stays competitive with attention using ~10–25% of the state; still early at sub-500M scale and not kernel-optimized.
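
As a rough illustration of the idea in the summary above, here is a minimal sketch of a fixed-size key/value memory whose centroids are learned online, with each token routed to a single slot. This is not Zyphra's implementation: the class name `OnlineVQMemory`, the nearest-centroid routing rule, the running-mean (hard-assignment, k-means-style) update, and the count-weighted softmax read are all assumptions chosen to keep the example small.

```python
import numpy as np


class OnlineVQMemory:
    """Hypothetical fixed-size key/value memory with online centroid updates."""

    def __init__(self, num_slots, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.keys = rng.normal(size=(num_slots, dim))  # key centroids
        self.values = np.zeros((num_slots, dim))       # value centroids
        self.counts = np.zeros(num_slots)              # tokens routed to each slot

    def write(self, k, v):
        # Sparse update: route the token to its single nearest key centroid.
        slot = int(np.argmin(np.sum((self.keys - k) ** 2, axis=1)))
        self.counts[slot] += 1.0
        lr = 1.0 / self.counts[slot]                   # running-mean step size
        self.keys[slot] += lr * (k - self.keys[slot])
        self.values[slot] += lr * (v - self.values[slot])
        return slot

    def read(self, q):
        # Softmax attention over the slots that have received at least one token,
        # with each slot weighted by how many tokens it summarizes.
        used = self.counts > 0
        if not used.any():
            return np.zeros(self.values.shape[1])
        logits = self.keys[used] @ q + np.log(self.counts[used])
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        return weights @ self.values[used]
```

In this sketch the state stays at `num_slots` centroids no matter how many tokens are written, and each write touches one slot, which is the sense in which memory is constant and per-token compute does not grow with context length.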

1

44

8

29

4K

0

4

0

3

226

Nick__Alonso retweeted

Zyphra

@ZyphraAI

4 months ago

Today @ZyphraAI releases OVQ-attention, an advancement for efficient long-context processing! Existing LLM layers compress input too much, leading to poor long-context understanding, or too little, leading to expensive memory+compute. OVQ-attention is an alternative path. 🧵

5

226

35

165

38K

Nick__Alonso retweeted

Songlin Yang

@SonglinYang4

7 months ago

Hi @JeffDean, what’s the plan for releasing the code for this line of work? None of these papers so far seem to have released any code

https://t.co/bfI3N10bao

21

1K

40

428

250K

Nick__Alonso retweeted

Quentin Anthony

@QuentinAnthon15

7 months ago

At this point in attention-free architectures, so many people have poisoned the well that it's just a well of poison. A "Transformer Killer™" drops once a month, and then the authors come back and "kill" transformers again like 5 months later. Love the work, I'm knee-deep in a lot of it, but please for the love of god stop over-hyping. Being grounded and pointing out your own limitations gets people more excited, I promise.

1

36

1

3

3K

Nick__Alonso retweeted

Zyphra

@ZyphraAI

8 months ago

@ZyphraAI is excited to release Compressed Convolutional Attention (CCA), a novel attention mechanism that:
- Beats MHA, GQA, MLA for dense and MoE models
- Reduces training/prefill flops
- 3x fewer parameters vs MHA
- Matches GQA/MLA KV-cache sizes without quality penalty

https://t.co/oSsRkU80P3

2

35

6

8

14K

Nick__Alonso retweeted

rishi @rishiiyer01

8 months ago

new paper https://t.co/ECv38d0HgA

9

275

31

162

60K

Nick__Alonso retweeted

Zyphra

@ZyphraAI

8 months ago

Read more at the blog post here: https://t.co/VduqOKvSN8

0

11

2

0

1K

Nick__Alonso retweeted

TensorWave @tensorwave

9 months ago

It’s not just about GPUs. It’s about the ecosystem. @QuentinAnthon15 joined @jtatarchuk on the Beyond CUDA podcast to share how moving to @AMD MI300X cut training costs at @ZyphraAI 📺 Watch the full episode on YouTube (link in comments)

2

14

4

3

1K

Nick__Alonso retweeted

rishi @rishiiyer01

11 months ago

reach out if you want to work with me and others on novel architectures for pretraining! dms are open https://t.co/SPxJltBt46

0

15

4

1

1K

Nick Alonso @Nick__Alonso

12 months ago

Learning effectively in real time during deployment, i.e. online continual learning, is important for many applications. It's also associated with theories of intelligence that emphasize learning efficiency, and it is an ability where the gap between animals and AI is large.

Jack Morris

@jxmnop

12 months ago

seems big AI labs are hyperfixating on reasoning when they should focus on *memory* instead

normal people won't use models that can think for hours to solve hard math problems

people want models that learn over time, remember details, adapt and interact like a person would

106

1K

66

223

84K

0

7

0

1

161

Nick__Alonso retweeted

Zyphra

@ZyphraAI

about 1 year ago

Zyphra is releasing our first reasoning model, ZR1-1.5B. This small but powerful reasoning model excels at both math and code, making it one of the best models in these categories for its size. It also uses 60% less reasoning tokens than comparable models. 🆓Apache 2.0 license.

https://t.co/oF34EBJAHB

15

495

63

234

95K

Nick Alonso @Nick__Alonso

over 1 year ago

@petemandik Oh great! Thanks for the reference. I was unaware of this. Will be taking a look.

0

1

0

13

Nick Alonso @Nick__Alonso

over 1 year ago

Thought experiment: what should a non-conscious alien scientist conclude about human theories of consciousness? What should humans think of the alien's conclusion? In my blog (link below), I argue this scenario supports Illusionist views of consciousness. @keithfrankish @eschwitz

1

0

84

Nick Alonso @Nick__Alonso

over 1 year ago

(6/) The scenario also raises the question of how we could even get a non-conscious scientist to understand what we mean by terms like 'phenomenal character', a point which may support those who argue such terms are not meaningful enough to discuss in the first place. @petemandik

1

0

68

Nick Alonso @Nick__Alonso

over 1 year ago

(5/) If we cannot find good reasons to convince a non-conscious scientist that phenomenal consciousness and the hard problem exist, then why should humans ever believe they do?

1

0

54
