Sam Ade Jacobs

over 1 year ago

🚀Introducing Ulysses-Offload🚀

- Unlock the power of long context LLM training and finetuning with our latest system optimizations
- Train LLaMA3-8B on 2M tokens context using 4xA100-80GB
- Achieve over 55% MFU

Blog: https://t.co/AoGSqyKb1E
Tutorial: https://t.co/6YHSA5iOop

1

97

29

42

6K

samadejacobs retweeted

almost 2 years ago

Great to see the amazing DeepSpeed optimizations from @Guanhua_Wang_, Heyang Qin, @toh_tana, @QuentinAnthon15, and @samadejacobs presented by @ammar_awan at MUG '24.

0

9

4

1

2K

samadejacobs retweeted

Product RD @NVIDIA, HPC and AI at scale - opinions and content are my own, and do not reflect current or former employers

almost 2 years ago

Announcing that DeepSpeed now runs natively on Windows. This exciting combination unlocks DeepSpeed optimizations to Windows users and empowers more people and organizations with AI innovations.

- HF Inference & Finetuning
- LoRA
- CPU Offload

Blog: https://t.co/LeNHlDZH3C

1

37

6

9

4K

samadejacobs retweeted

almost 2 years ago

Introducing Universal Checkpointing for boosting training efficiency.

- Change parallelism (PP, SP, TP, ZeRO-DP) or GPU count mid-stream
- Improve resilience by scaling down to healthy nodes💪
- Increase throughput by scaling up to elastic nodes🚀

Blog: https://t.co/qL32e5i1D2

0

23

5

11

4K

samadejacobs retweeted

Jeff Dean

@JeffDean

over 2 years ago

A nice example of the kind of capabilities unlocked by the long context feature in the Gemini 1.5 Pro model.

24

435

46

72

99K

over 2 years ago

@sama @_tim_brooks @billpeeb @model_mechanic The Sora video of what Lagos would look like in 2056 is my favorite… incredibly awesome! Cc: @AjuriNgelale, @bosuntijani

0

25

samadejacobs retweeted

Stas Bekman

@StasBekman

over 2 years ago

If you were holding off on trying @MSFTDeepSpeed ZeRO++, it looks like deepspeed@master should work well now: https://t.co/SOlRIRqOB6

ZeRO++'s main feature is that it lets you use a hybrid approach if you can fit a model on a single node of 8 GPUs. It takes advantage of the very fast NVLink within the node and only needs to reduce grads across nodes over the slow link. So if the slow inter-node network was hurting your TFLOPS in your workflow, enabling ZeRO++ should give you a sizeable boost. The number will depend heavily on your situation, but in my experiments I saw a 5%+ boost with a 7B LLaMA. This is similar to Hybrid FSDP. To try it, see: https://t.co/WdU8U5tjuX

I was talking about the hybrid solution. I have yet to try the quantized weights/grads also offered by ZeRO++, which should speed things up even further, as there will be even less stress on the network with those. Just remember that until the next release is made you want deepspeed@master.

3

77

12

25

8K
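The hybrid mode described in the tweet above appears to correspond to ZeRO++ hierarchical partitioning (hpZ). Below is a minimal sketch of a config that turns it on, assuming 8 GPUs per node and leaving quantized weights and gradients off, as in the author's experiment. The helper name `zeropp_hybrid_config` and the batch-size value are illustrative and not taken from the tweet.

```python
import json


def zeropp_hybrid_config(gpus_per_node: int = 8, quantize: bool = False) -> dict:
    """Build a DeepSpeed config dict that enables ZeRO++ hierarchical partitioning.

    Assumption: the "hybrid approach" maps to `zero_hpz_partition_size`,
    set to the number of GPUs in one node. hpZ keeps a secondary shard of
    the weights inside each node so the backward all-gather stays on the
    fast intra-node links.
    """
    return {
        "train_micro_batch_size_per_gpu": 1,  # placeholder value, tune per model
        "bf16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,  # ZeRO++ builds on ZeRO stage 3
            "zero_hpz_partition_size": gpus_per_node,
            # Quantized communication is optional and off in this sketch.
            "zero_quantized_weights": quantize,
            "zero_quantized_gradients": quantize,
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be saved as the DeepSpeed config file.
    print(json.dumps(zeropp_hybrid_config(), indent=2))
```

Passing `quantize=True` would additionally switch on the quantized weights and gradients that the tweet mentions but does not benchmark.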

samadejacobs retweeted

over 2 years ago

Introducing Mixtral, Phi2, Falcon, and Qwen support in #DeepSpeed-FastGen!

- Up to 2.5x faster LLM inference
- Optimized SplitFuse and token sampling
- Exciting new features like RESTful API and more!

For more details: https://t.co/386OvJtQLk

#DeepSpeeed #AI

9

414

88

233

50K

samadejacobs retweeted

over 2 years ago

🚀 Excited to announce our paper "ZeRO++: Extremely Efficient Collective Communication for Large Model Training" has been accepted at #ICLR2024! 🔍 ZeRO++ significantly reduces communication volume by 4x, achieving up to 3.3x speedup. https://t.co/a9OS4rD0rN #DeepSpeed #AI

2

92

20

32

6K

samadejacobs retweeted

OpenAI

@OpenAI

over 2 years ago

We're rolling out new features and improvements that developers have been asking for:

1. Our new model GPT-4 Turbo supports 128K context and has fresher knowledge than GPT-4. Its input and output tokens are respectively 3× and 2× less expensive than GPT-4. It’s available now to all developers in preview.
2. Assistants API and new tools (Retrieval, Code Interpreter) will help developers build world-class AI assistants within their own apps.
3. The platform is becoming multimodal. GPT-4 Turbo with Vision, DALL·E 3, and text-to-speech are all now available to developers.

Oh… and we’re doubling GPT-4 rate limits. https://t.co/BMnsBAHorI

888

14K

3K

2K

4M

samadejacobs retweeted

over 2 years ago

Introducing DeepSpeed-FastGen 🚀

Serve LLMs and generative AI models with
- 2.3x higher throughput
- 2x lower average latency
- 4x lower tail latency
w. Dynamic SplitFuse batching

Auto TP, load balancing w. perfect linear scaling, plus easy-to-use API

https://t.co/iizM71bjqj

6

546

115

267

113K

samadejacobs retweeted

over 2 years ago

🚀Introducing #DeepSpeed-VisualChat! 🖼📜

- Multi-image, multi-round #dialogues
- Novel #MultiModal causal attention
- Enriched training data via improved blending techniques
- Unmatched #scalability (>70B params)

Blog: https://t.co/yAlqEjBI8c
Paper: https://t.co/CrxCW5EXzF

1

134

38

49

19K

samadejacobs retweeted

over 2 years ago

🚀Exciting new updates on #DeepSpeed ZeRO-Inference with 20X faster generation!

- 4x less memory usage through 4-bit weight quantization, with no code change needed.
- 4x larger batch sizes through KV cache offloading.

Available in DeepSpeed v0.10.3: https://t.co/v24qV42rWC

2

166

28

36

18K
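The 4x memory figure in the tweet above is consistent with simple arithmetic if the baseline weights are stored in 16 bits. Here is a rough sketch, assuming FP16 as the baseline and ignoring the small overhead of quantization scales. The 7B parameter count is only an illustrative input, not a number from the announcement.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10**9 bytes), ignoring metadata."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical 7B-parameter model, used only to make the ratio concrete.
fp16_gb = weight_memory_gb(7e9, 16)  # 16-bit baseline (assumption)
int4_gb = weight_memory_gb(7e9, 4)   # 4-bit quantized weights

print(f"FP16 weights: {fp16_gb:.1f} GB")
print(f"4-bit weights: {int4_gb:.1f} GB")
print(f"Reduction: {fp16_gb / int4_gb:.1f}x")
```

The ratio depends only on the bit widths, so it holds for any model size under these assumptions.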

samadejacobs retweeted

Eric Horvitz

@erichorvitz

over 2 years ago

We have much to learn about LLMs. Compact 1.3 billion parameter phi-1.5 model exhibits surprising capabilities. @MSFTResearch

0

19

4

2

5K

over 2 years ago

@jteevan Congratulations @jteevan and thank you for your leadership.

0

1

0

23

samadejacobs retweeted

almost 3 years ago

Want to train 1 million token context lengths (all 7 of the Harry Potter books!📚) on a GPT-like model w. 64 GPUs?

Announcing DeepSpeed-Ulysses🚀

This release enables highly efficient and scalable LLM training with extremely long sequence lengths🤯

https://t.co/byWeQeeTed

1

139

40

48

16K

samadejacobs retweeted

OpenAI

@OpenAI

about 3 years ago

We trained an AI using process supervision (rewarding the thought process rather than the outcome) to achieve a new state of the art in mathematical reasoning. Encouraging sign for alignment of advanced AIs: …https://t.co/ryaODghohn

408

4K

786

657

2M

about 3 years ago

@Livermore_Lab @bkspears9 Sorry I missed this; the Nigerian presidential election polluted my Twitter feed! Good job @bkspears9 and the NIF team, hope you win!

1

0

41