Jonathan Pilault @J_Pilault - Twitter Profile

1 day ago

This Week in Inference: Anthropic's Claude Fable 5 drops at $50/M output tokens, making cost-efficient routing the only rational response. Plus 3 more stories. Thread

6

2

0

42

J_Pilault retweeted

Prizmal

@PrizmalAi

8 days ago

This Week in Inference: an enterprise burned $500M on Claude tokens in 30 days because nobody set a usage cap. Agentic workflows cost up to 1000x more tokens than basic chat. Plus 4 more stories. Thread

7

3

1

0

54

Jonathan Pilault @J_Pilault

over 1 year ago

I am extremely proud of what the team at @ZyphraAI has achieved. Let's keep pushing the boundaries!

Quentin Anthony

@QuentinAnthon15

over 1 year ago

For a long time our training goals @ZyphraAI had been to just match dense transformers, but with faster inference and lower training cost. Today we also surpass them with Zamba2-7B.

2

52

9

6

7K

0

6

0

468

J_Pilault retweeted

Nick Alonso @Nick__Alonso

almost 2 years ago

1) RAG often struggles on complex multi-hop queries. In this blog, we @ZyphraAI discuss and build a graph-based RAG system which tops the leaderboard on a QA benchmark with multi-hop queries and outperforms frontier long-context models for 60x less cost. https://t.co/QDXUdiWzh5 https://t.co/AvSjtihB6f

1

12

4

6

1K


J_Pilault retweeted

Vasu Shyam @vasud3vshyam

almost 2 years ago

@ylecun Thanks for sharing! Another little trick that might amuse you is that we identified a function which upon minimization produces the forward pass of the attention block: https://t.co/i716T3zgz9

0

24

2

10

2K

Jonathan Pilault @J_Pilault

almost 2 years ago

Thank you to my wonderful teammates @vasud3vshyam, @nshepperd1, @BerenMillidge, @QuentinAnthon15

0

6

0

601

Jonathan Pilault @J_Pilault

almost 2 years ago

Zyphra is proud to release Tree Attention, a fast inference method for extremely large sequence lengths
• 8x faster inference speed vs. Ring Attention
• 2x less peak memory
• low data communication volumes
Paper: https://t.co/yf5VNRze6W
Code: https://t.co/Th6Fg8eFEr
A 🧵 https://t.co/ZyZgK0OC5J

1

149

31

95

30K

Jonathan Pilault @J_Pilault

almost 2 years ago

By using the two-level interconnect topology on GPU clusters, Tree Attention allows for asymptotically faster decoding as we scale output sequence length and number of GPUs in a cluster and lower peak memory requirements: https://t.co/lkl6ot4Bm4

1

6

0

824

J_Pilault retweeted

Quentin Anthony

@QuentinAnthon15

almost 2 years ago

Zyphra is ecstatic to release Zamba2-small:
- 2.7B Mamba2/Attention hybrid
- Pre-trained on 3T tokens + annealed on 100B high-quality tokens
- Model released on HuggingFace and standalone PyTorch
- SOTA evaluation performance and superior inference efficiency.
https://t.co/uToHut7FPB
https://t.co/MN7cwNcQzc
https://t.co/Tdz0BjgMGV

4

202

43

94

36K

J_Pilault retweeted

utku @utkuevci

about 3 years ago

Hyped to share JaxPruner: a concise library for sparsity research.
JaxPruner includes 10+ easy-to-modify baseline algorithms and provides integration with popular libraries like t5x, scenic, dopamine and fedjax. 1/7
Code: https://t.co/tPwCL03xnE
Paper: https://t.co/eedLJj5EVW https://t.co/6h8UsqGpqu

1

146

30

40

38K

J_Pilault retweeted

Quentin Anthony

@QuentinAnthon15

about 2 years ago

Zyphra is pleased to announce Zamba-7B:
- 7B Mamba/Attention hybrid
- Competitive with Mistral-7B and Gemma-7B on only 1T fully open training tokens
- Outperforms Llama-2 7B and OLMo-7B
- All checkpoints across training to be released (Apache 2.0)
- Achieved by 7 people, on 128 H100 GPUs, in 30 days
- https://t.co/eOhhSNGJDc
- https://t.co/1horXuTOj0
Want more details? A 🧵

18

422

81

227

185K

J_Pilault retweeted

Ross Goroshin @RGoroshin

about 2 years ago

Last week, I gave a talk at @Mila_Quebec. The talk should be of interest to anyone working on predictive models, particularly in latent space. In collab. with @MahanFathi @ClementGehring @J_Pilault @davidkanaa @pierrelux. See you at @iclr_conf in 🇦🇹! https://t.co/vFBtHDzNju

0

18

5

9

5K

J_Pilault retweeted

Mahan Fathi @MahanFathi

over 2 years ago

Course Correcting Koopman Representations
Accepted at #ICLR2024!
We identify problems with unrolling in imagination and propose an unconventional, simple, yet effective solution: periodically "𝒓𝒆𝒆𝒏𝒄𝒐𝒅𝒊𝒏𝒈" the latent.
📄 https://t.co/ULNzqAV3bB
@GoogleDeepMind
1/🧵 https://t.co/4g4WORYgcC

4

92

19

53

18K

J_Pilault retweeted

David Krueger 🦥 ⏸️ ⏹️ ⏪

@DavidSKrueger

over 2 years ago

My research group @kasl_ai is looking for interns! Applications are due in 2 weeks ***January 29***. The long-awaited form: https://t.co/hLOjuxSfnK Please share widely!!

5

274

74

271

44K

J_Pilault retweeted

Richard Socher

@RichardSocher

over 2 years ago

I like the SSM/hyena/Block State Transformers https://t.co/SDAT6V5mXB https://t.co/yIODAIHGlM They remind me of Q-RNNs https://t.co/H2I48lapT7 and play around with different parallelization ideas. I don't think transformers are that special and there are many equivalent architectures.

1

26

3

14

3K

Jonathan Pilault @J_Pilault

over 2 years ago

@MahanFathi Excited to present our work (Block-State Transformers) at #NeurIPS2023 Great Hall & Hall B1+B2 (level 1) no. 817. Please join us in the poster session between 5PM and 7PM CST today!

0

1

0

145

J_Pilault retweeted

Mahan Fathi @MahanFathi

over 2 years ago

Why not get the best of both worlds by combining SSMs and Transformers?
Excited to share our work at #NeurIPS2023: "Block-State Transformers."
BST hits new highs in long-range language modeling and LRA tasks.
paper: https://t.co/nHt6OGyez1
1/ https://t.co/TCa3wWkq5K

8

375

64

269

91K

Jonathan Pilault @J_Pilault

almost 14 years ago

Tips for non-technical entrepreneurs http://t.co/9lmKKTLx

0

2

0
