Le Xu @happyandslow - Twitter Profile

happyandslow retweeted

Barstool Sports

@barstoolsports

about 1 month ago

Become ungovernable

448

35K

3K

4K

5M

happyandslow retweeted

Thinking Machines

@thinkymachines

about 2 months ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. https://t.co/AFJZ5kH7Ku

465

16K

2K

12K

8M

happyandslow retweeted

Hao Zhang

@haozhangml

6 months ago

We arXiv’ed this paper a few months back, and I still find myself thinking about this work a lot: CAD is arguably a direct continuation of our previous DistServe line of work. Two afterthoughts:

1. For >4 years, training systems have been surprisingly stable... We've had Megatron/DeepSpeed (and now FSDP2) for ages, and in the “classic” pretrain regime (16-32K context, fairly uniform batches), it’s fair to feel like the remaining wins are incremental. If you counted papers in MLSys/OSDI, I believe the number of training papers has declined a lot recently. But the workload quietly changed: as agents + post-training became the main compute eater, context lengths jumped from already long to “ridiculously long”: 32K → 128K → 256K (some even start to claim 1M), and suddenly the #1 problem isn’t just parallelism/kernels, but imbalance/stragglers. When one part of the pipeline grows ~quadratically with sequence length while most others are closer to linear, any “colocate everything on the same GPUs” design becomes a straggler source.

2. This naturally leads to the second thought: disaggregation isn’t just for serving. We’ve talked a lot about P/D disaggregation in serving (DistServe), and AFD-style ideas for MoE. Here we show the same principle applies to training: the core attention compute -- softmax(QKᵀ)V -- is (1) essentially stateless (no trainable params) and (2) surprisingly composable at token granularity with modern kernels (thanks to all the kernel developers behind FlashAttention and FlashInfer).

That means you can treat attention less like “a layer you must shard carefully” and more like “a compute service you can schedule.” So instead of falling into the usual CP/SP rabbit hole (“what’s the perfect sharding scheme to balance this?” as when we think about TP/EP), we decouple the quadratic component, push it onto a pool of attention servers, and then shard/rebatch attention tasks *however* is convenient to equalize compute, even non-uniformly, without losing kernel efficiency. Training is throughput-sensitive (NOT latency-sensitive), so we can be aggressive with pipelining/overlap (ping-pong execution, comm/compute overlap) to hide all these overheads in training. I hope this work provides some new perspectives on how people should think about CP/SP and disaggregation. 😀
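To make the imbalance argument concrete, here is a toy sketch in Python. It is not the scheduler from the paper: the cost model (attention work counted as query tokens times key tokens, no causal masking, no communication or memory cost), the batch layout, and every function name are assumptions made purely for illustration. It compares per-GPU attention work when each GPU keeps its own documents against a pool of attention servers that receives token-granularity tasks through a greedy longest-first assignment.

```python
import heapq

def colocated_loads(batches):
    """Attention work per GPU when each GPU handles its own documents (cost ~ len^2)."""
    return [sum(n * n for n in docs) for docs in batches]

def attention_tasks(batches, chunk):
    """Split each document's attention into query-chunk tasks of cost q_tokens * doc_len."""
    tasks = []
    for docs in batches:
        for n in docs:
            for start in range(0, n, chunk):
                q_tokens = min(chunk, n - start)
                tasks.append(q_tokens * n)
    return tasks

def balance(tasks, num_servers):
    """Greedy assignment: largest task first, always to the least-loaded server."""
    heap = [(0, server) for server in range(num_servers)]
    heapq.heapify(heap)
    for cost in sorted(tasks, reverse=True):
        load, server = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, server))
    return sorted(load for load, _ in heap)

def imbalance(loads):
    """Ratio of the slowest worker's load to the mean load (1.0 means perfectly even)."""
    return max(loads) / (sum(loads) / len(loads))

# Hypothetical batches: every GPU holds 32 units of tokens (equal linear work),
# but packed from documents of different lengths.
batches = [[32], [16, 16], [8] * 4, [4] * 8]

colocated = colocated_loads(batches)
pooled = balance(attention_tasks(batches, chunk=4), num_servers=4)

print("colocated:", colocated, "imbalance:", round(imbalance(colocated), 2))
print("pooled:   ", pooled, "imbalance:", round(imbalance(pooled), 2))
```

In this toy setup the linear work is identical on every GPU, yet the colocated attention work differs by 8x between the GPU holding one long document and the GPU holding eight short ones, while the pooled assignment comes out even. Real systems also have to pay for moving Q/K/V to the servers, which the tweet suggests hiding behind pipelining and overlap.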

5

147

23

93

16K

happyandslow retweeted

vLLM

@vllm_project

about 1 year ago

🚀 Join us at the SF AIBrix & vLLM Meetup on June 18th at AWS SF GenAI Loft! Learn from experts at ByteDance, AWS Neuron, and EKS. Discover AIBrix: a scalable, cost-effective control plane for vLLM. Talks, Q&A, pizza, and networking! 🍕🤝 https://t.co/GZOmjemxJb

1

46

10

7

4K

Le Xu @happyandslow

about 1 year ago

LinkedIn entry: https://t.co/eus2X1RlGE

0

64

Le Xu @happyandslow

about 1 year ago

AIBrix v0.3.0 is officially released! Also check out our plans for v0.4.0! We ❤️ your feedback! Blog: https://t.co/RYMCt1Phan

1

0

1K

Le Xu @happyandslow

over 1 year ago

Try out our recent work!

vLLM

@vllm_project

over 1 year ago

We are welcoming AIBrix to vLLM organization! It is a battery-included vLLM Kubernetes serving stack developed by ByteDance. https://t.co/GYeCTBBR75 https://t.co/WpkwyGbHjK

2

78

12

44

10K

0

3

1

0

162

Le Xu @happyandslow

almost 2 years ago

It’s great to be back!

Indranil Gupta @indygupta

almost 2 years ago

When alumna @happyandslow visited us this week my students and I went out for Peruvian dinner and then our group tradition of ice cream at Jarling’s. https://t.co/KkBgJsf48c

0

18

0

1K

0

4

0

377

Le Xu @happyandslow

almost 2 years ago

Yayy!

Tianyin Xu

@tianyin_xu

almost 2 years ago

A great presence of #SysNet @IllinoisCS at OSDI/ATC last week. 4 papers were presented by @IllinoisCS students and one received a Jay Lepreau Best Paper Award. It's great to see alumni like @happyandslow, Cong and Yifan who continue to engage with OSDI/ATC after they graduate. https://t.co/VwDMOq8jUx

2

80

3

2

5K

0

7

0

945

happyandslow retweeted

Joey Gonzalez

@profjoeyg

over 2 years ago

For the technical details on how cloud computing could make cars safer, check out our paper: https://t.co/I3OrUvFweH Congratulations @pschafhalter, @sukritkalra, and @happyandslow!

0

6

3

0

2K

happyandslow retweeted

Joey Gonzalez

@profjoeyg

over 2 years ago

Can GPUs in the ☁️ really drive your 🚗 and make it safer? We have been studying this question and @pschafhalter will present our findings this afternoon @ieeeiros 2023. Spoiler alert: Yes! https://t.co/DPUPipst6r

0

10

3

2

3K

Le Xu @happyandslow

over 2 years ago

Full thread -- why AVs should be using the cloud. Appearing this year at #IROS2023 https://t.co/l05iKmxbdy https://t.co/f6o3mKPjXw

Peter Schafhalter @pschafhalter

over 2 years ago

Self-driving cars should use the cloud.

1

10

2

2K

0

250

Le Xu @happyandslow

over 3 years ago

Nothing compares to sending cold emails to researchers you don't know 😂.

DNA&RNA Universe 𝕏 @DNA_RNA_Uni

over 3 years ago

Lablife 😅

15

3K

389

51

0

2

0

Le Xu @happyandslow

over 3 years ago

@TaliaRinger And the owner is very nice too!

0

1

0

Le Xu @happyandslow

over 3 years ago

This! Sometimes I wonder about the same thing.

Talia Ringer 🕊🪬 @TaliaRinger

over 3 years ago · Champaign

A lot of people get confused why I'd work on something and be skeptical about it at the same time, but I don't understand why anyone would not be skeptical of what they work on? It seems like a scientific obligation to be skeptical

3

128

20

1

0

2

0

Le Xu @happyandslow

almost 4 years ago

How come we don’t have lots of papers discussing negative findings?

Radoslav (Rade) Pavlović 🧪🦋🌈 @RadoslavPavlov4

almost 4 years ago

Why am I just seeing this 😂😂😂

37

5K

635

66

0

2

0

Le Xu @happyandslow

almost 4 years ago

When you realize that you’re really close to the deadline, you have nothing, and you’re still on Twitter…

0

1

0

Le Xu @happyandslow

almost 4 years ago

😂

Reuben Bond @reubenbond

almost 4 years ago

I expect a "Don't Write Your Own Task Queue" article to pop up within the week

1

18

1

0

Le Xu @happyandslow

almost 4 years ago

Meanwhile, having parents who work in completely different areas and don't have PhDs gives you a completely different mindset when you approach your problems (and career).

0

2

0