Johannes Messner @atomicflndr - Twitter Profile

19 days ago

@Dorialexander Hit who? People in the field already know and I'm yet to see a politician who seems even close to understanding what is going on


Johannes Messner @atomicflndr

2 months ago

@samsja19 huge for us gpu poors


Johannes Messner @atomicflndr

3 months ago

we have kernels, too :)


Johannes Messner @atomicflndr

3 months ago

The last thing I worked on at Aleph Alpha! With @SohirMaskey Constantin Eichenberg @douglasahorr

tl;dr:
- Quantisation-aware training works really well
- If you have a fixed memory budget you should probably go many parameters - few bits
- k-means quant. is better than uniform https://t.co/2xd8k19Una
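To make the last point of the tl;dr concrete, here is a minimal sketch comparing a uniform grid with a k-means codebook at the same bit width. It is an illustration only, not the method from the paper: the weights are synthetic Gaussian samples, the quantisation is applied after the fact rather than during training, and all function names (`uniform_levels`, `kmeans_levels`, `mse`) are made up for this example.

```python
import random


def uniform_levels(values, bits):
    """Evenly spaced quantisation levels between min and max of the data."""
    lo, hi = min(values), max(values)
    n = 2 ** bits
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def nearest(levels, x):
    """Index of the level closest to x."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - x))


def mse(values, levels):
    """Mean squared error when each value is rounded to its nearest level."""
    return sum((x - levels[nearest(levels, x)]) ** 2 for x in values) / len(values)


def kmeans_levels(values, bits, iters=20):
    """1-D Lloyd iterations, started from the uniform grid (assumed init)."""
    levels = uniform_levels(values, bits)
    for _ in range(iters):
        sums = [0.0] * len(levels)
        counts = [0] * len(levels)
        for x in values:
            i = nearest(levels, x)
            sums[i] += x
            counts[i] += 1
        # Move each level to the mean of its cluster; keep empty clusters as-is.
        levels = [s / c if c else l for s, c, l in zip(sums, counts, levels)]
    return levels


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic stand-in

uniform_err = mse(weights, uniform_levels(weights, bits=3))
kmeans_err = mse(weights, kmeans_levels(weights, bits=3))
print(f"uniform MSE: {uniform_err:.5f}")
print(f"k-means MSE: {kmeans_err:.5f}")
```

Because the k-means codebook starts from the uniform grid and each Lloyd step can only lower the reconstruction error, it ends up at least as good as the uniform grid on this data. Whether that carries over to model quality is what the linked work measures.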

Graphcore Research

@GCResearchTeam

3 months ago

Would you rather use 1 million × 16-bit weights, 4 million × 4-bit weights, or even 16 million × 1-bit weights?

In joint work between Aleph Alpha Research and Graphcore, we asked this question of LLMs. The answer encouraged us to embrace the wonder ✨ of 1-bit weights, which can outperform 4-bit and 16-bit weights on a fixed weight memory budget.

In our work
- ⚖️ A scaling laws evaluation prompts us to consider very low-bit formats
- 📈 Scaled-up tests show the power of memory-matched models with 1-bit weights
- ⚡ Kernel benchmarking demonstrates their feasibility for autoregressive inference

Read all about it in our blog and paper (link below! ⬇️)

Massive thanks to our collaborators at Aleph Alpha Research!

Authors: @SohirMaskey, Constantin Eichenberg, @atomicflndr and @douglasahorr
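The three options in the opening question all occupy the same weight memory. A small sketch of that arithmetic, assuming the budget is counted purely in weight bits (no scales, codebooks, or other overhead) and using a hypothetical helper `params_for_budget`:

```python
def params_for_budget(budget_bits: int, bits_per_weight: int) -> int:
    """Number of weights that fit into a fixed budget of weight bits."""
    return budget_bits // bits_per_weight


# Budget taken from the tweet's first option: 1 million 16-bit weights.
budget_bits = 1_000_000 * 16

for bits in (16, 4, 1):
    n = params_for_budget(budget_bits, bits)
    print(f"{bits:>2}-bit weights: {n:,} parameters")
```

Under that assumption the budget buys 1 million, 4 million, or 16 million parameters at 16, 4, and 1 bit respectively, which is the trade-off the tweet asks about.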


Johannes Messner @atomicflndr

3 months ago

... and scale up to larger models and generative evals to confirm this trend!


Johannes Messner @atomicflndr

4 months ago

@NairoInGreen you are talking about an olympic champion sir


atomicflndr retweeted

samsja

@samsja19

4 months ago

Today we’re releasing Trinity Large, a 400B MoE LLM with 13B active parameters, trained over 17T tokens.

The base model is on par with GLM-4.5 Base, while being significantly faster at inference because it’s sparser and hybrid.

The architecture we picked is one of my favorites: 3:1 local/global with SWA, NoPE on the global layers and RoPE on the local layers, gated attention, depth-scaled sandwich norm, and smooth training with Muon.

Our dataset is also high quality, curated by @datologyai. We trained it on 2,000 B300s for a month on @PrimeIntellect infrastructure.

This is a preview release with an instruct model only; we’re ramping up RL on it.

When @latkins approached us a couple of months ago to train this model together, I thought he was crazy, but then he hired @stochasticchasm, and here we are.


Johannes Messner @atomicflndr

6 months ago

@samsja19 @rasdani_ Where is the pirate hat?


Johannes Messner @atomicflndr

8 months ago

@nanwang_t wish you all the best for whatever is next!


Johannes Messner @atomicflndr

8 months ago

@tugot17 You could apply the same logic to handbags, and yet…


Johannes Messner @atomicflndr

9 months ago

@NWalhan @KuittinenPetri @Aleph__Alpha tech report is coming soon, but the positional encoding is standard rope (for all sub-transformers). I don't have the loss curve of these particular checkpoints at hand right now, but I can show you a cpt curve from a different HAT model i'm currently training; it's quite boring https://t.co/5wqSMOjmmt


atomicflndr retweeted

Vedant Nanda @_nvedant_

9 months ago

Curious how to accelerate inference of some of the recent byte level models like HAT/HNet/BLT? Check out this vllm fork developed by my friends and colleagues, Pablo and Lukas! To my knowledge, this is the first demonstration of inference speedups from dynamic chunking in byte models!


atomicflndr retweeted

Pablo Iyu Guerrero @pabloiyu

9 months ago

First high-performance inference for hierarchical byte models.
@LukasBluebaum and I developed batched inference for tokenizer-free HAT (Hierarchical Autoregressive Transformers) models, developed by @Aleph__Alpha Research. In some settings, we outcompete the baseline Llama.🧵 https://t.co/tLq7z2kAEv


Johannes Messner @atomicflndr

9 months ago

Tears in my eyes

EU–INC @euinc_petition

9 months ago

🤯 MERZ AND MACRON JUST CONFIRMED A PAN-EUROPEAN LEGAL ENTITY IS COMING

It's now up to all of us to ensure the solution that gets passed into law is fit for purpose for European startups.

That means: EU–INC. 🚀

Support us and we all will get this done together. 🇪🇺🤝 https://t.co/D7wVUTPaDZ


atomicflndr retweeted

Vedant Nanda @_nvedant_

10 months ago

Our work on tokenizer free LLMs: Hierarchical Autoregressive Transformers (HAT)! We recently dropped HAT models on HF, pretrained from scratch! https://t.co/S1AavLetGN You can try them with both HF Inference AND our vllm fork: https://t.co/HOPFj3vQv9 🧵 (1/6)


atomicflndr retweeted

Vedant Nanda @_nvedant_

10 months ago

And imo what's the coolest is that we made it ready for production grade inference with our own vllm fork (more details on this soon!): https://t.co/HOPFj3vQv9 So now you can enjoy all the vllm features like continuous batching, paged attention etc also for HAT! (5/6)


Johannes Messner @atomicflndr

10 months ago

@tulkenss Hmm I don’t think so, you'd have to do some model surgery to extract it. @PitNeitemeier correct me if I’m wrong?


Johannes Messner @atomicflndr

10 months ago

Seeing this pushback a lot - and it's fair! However, these models don’t have a fixed vocabulary, i.e. there are infinitely many words the model can operate over instead of a finite set of tokens.

slm tokens @tulkenss

10 months ago

I wouldn't really consider these to be tokenizer-free tbh. Unlike Hnets, these models are word level. The sequence is turned into words (this is literally called tokenization). Then, the bytes of these words are turned into embeddings, which are then processed by a model.


Johannes Messner @atomicflndr

10 months ago

@tulkenss Fair :)

