Jiangfei Duan

over 1 year ago

@JiaZhihao Congratulations!!!


jiangfeiduan retweeted


over 1 year ago

🎥 Video DiTs are painfully slow: HunyuanVideo takes 16 min to generate a 5s 720P video on H100. 🤯

Announcing Sliding Tile Attention (STA):
* Accelerate 3D full attention (FA3) by up to 10x
* Slash the end-to-end time from 16 to 5 mins
* NO extra training. NO quality loss!

🚀 Can you tell which videos are generated by the original HunyuanVideo, and which by STA? 👀

Blog: https://t.co/5kwzENjHjk



jiangfeiduan retweeted

over 1 year ago

🎥 Frustrated by Sora's credit limits? Still waiting for Veo 2? 🚀 Open-source video DiTs are actually on par. We introduce FastVideo, an open-source stack to support fast video generation for SoTA open models. We have supported Mochi and Hunyuan, 8x faster inference, 720P 5-second video in 62 seconds.


jiangfeiduan retweeted

over 1 year ago

We are excited to share works from our amazing lab members and collaborators at #NeurIPS2024! 💡✨ Come and discuss our latest research about LLM serving scheduling, training and inference with emerging architectures, and more!

1️⃣ Poster: Efficient LLM Scheduling by Learning to Rank
📍 Location & Time: Fri 11am@East Exhibit Hall A-C #2608
🧑‍🎓 Leads: @FuYichao123
📜 TL;DR: LLM-LTR is an efficient LLM serving system that reduces latency by approximating Shortest Job First (SJF) scheduling through learning-to-rank techniques.

2️⃣ Poster: Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
📍 Location & Time: Wed 4:30pm@East Exhibit Hall A-C #2002
🧑‍🎓 Leads: @MaxMa1987, @_xiaomengy_, @violet_zct
📜 TL;DR: Megalodon is a pre-trained model that employs a novel neural architecture with better long-sequence modeling capability and inference-time efficiency.


over 1 year ago

@runsen_xu @AIatMeta towards success


almost 2 years ago

@junting9 🐮


almost 2 years ago

Check out our posters at #ICML2024!

almost 2 years ago

We are excited to announce our lab's papers at #ICML2024! 🧠✨ Come and discuss our latest research from LLM evaluation to efficient LLM serving & inference! See you there!

1️⃣ Poster: MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
📍 Location & Time: poster session 1, Hall C 4-9 #816, 11:30 AM on Tuesday July 23
📜 TL;DR: MuxServe boosts multiple LLM serving throughput by up to 1.8x through flexible spatial-temporal multiplexing.

2️⃣ Poster: Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
📍 Location & Time: poster session 2, Hall C 4-9 #411, 1:30 PM on Tuesday July 23
📜 TL;DR: An exact and parallel decoding algorithm that accelerates LLM decoding without needing auxiliary models or data stores.

3️⃣ Poster: Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
📍 Location & Time: poster session 3, Hall C 4-9 #709, 11:30 AM on Wednesday July 24
📜 TL;DR: Chatbot Arena is an open platform for evaluating LLMs based on human preferences through crowdsourced pairwise comparisons, and it’s becoming a widely cited leaderboard for its robust and credible evaluation methods.

4️⃣ Poster: CLLMs: Consistency Large Language Models
📍 Location & Time: poster session 4, Hall C 4-9 #604, 1:30 PM on Wednesday July 24
📜 TL;DR: We introduce a new family of LLMs optimized for fast Jacobi decoding, achieving a 2.4x to 3.4x improvement in generation speed across multiple benchmarks without compromising quality.

5️⃣ Poster: Online Speculative Decoding
📍 Location & Time: poster session 5, Hall C 4-9 #605, 11:30 AM on Thursday July 25
📜 TL;DR: OSD improves the efficiency of large language model inference by continuously updating the draft models with user query data, resulting in a significant reduction in latency and an increase in token acceptance rates.

6️⃣ Poster: InferCept: Efficient Intercept Support for Augmented Large Language Model Inference
📍 Location & Time: poster session 5, Hall C 4-9 #709, 11:30 AM on Thursday July 25
📜 TL;DR: InferCept is the first inference framework for augmented LLMs, efficiently serving LLMs that can query tools, ML models, and virtual environments.


jiangfeiduan retweeted

Zhihao Jia

@JiaZhihao

almost 2 years ago

#ICML2024 Join us for a 2-hour tutorial on Monday, July 22, focusing on advanced algorithms and systems for efficient LLM serving. The session will include our recent research on:
✨ Mirage: Auto-gen performant GPU kernels for LLMs
💸 SpotServe: Cost-effective LLMs on spot instances
🌳 SpecInfer: Tree-based speculative decoding techniques
🔧 FlexLLM: Co-serving LLM inference & finetuning



about 2 years ago

MuxServe will appear at ICML '24: Maximize GPU utilization in LLM serving with spatial-temporal multiplexing. Thanks to all our amazing collaborators!

about 2 years ago

Multiple LLM serving has emerged as a crucial and costly demand. Want to co-serve multiple LLMs with better utilization? Introducing MuxServe - flexible spatial-temporal multiplexing - up to 1.8x higher throughput Blog: https://t.co/Pep94vUFTw Paper: https://t.co/X1Jhov3QOY

