Lequn Chen @abcdabcd987 - Twitter Profile

4 days ago

RT @ruihanglai: Two moments every ML researcher knows. You get onto a new cluster, and week one goes to fitting the framework to your setup…

Lequn Chen @abcdabcd987

9 days ago

We are obsessed with performance and low level details. Great work by my teammate @xyzw_io . Chat with us if this interests you.

Perplexity

@perplexity_ai

10 days ago

We're open-sourcing the Unigram tokenizer we rebuilt to reduce CPU utilization by 5-6x. Small rerankers and embedders run in single-digit milliseconds on GPU, making CPU tokenization a meaningful share of total latency. https://t.co/QUnHeiho56

Lequn Chen @abcdabcd987

16 days ago

💻 Code: https://t.co/cOVRSbFBZD

Lequn Chen @abcdabcd987

16 days ago

🎉 Presenting at #MLSys2026 today! fabric-lib: RDMA Point-to-Point Communication for LLM Systems Talk by Nandor Licker at 3:15 PM. Poster 28 this evening — come say hi! 👋

Lequn Chen @abcdabcd987

16 days ago

We showcase three production applications: 🚀 KV-cache transfer for disaggregated inference. ⚡ RL weight transfer in 1.3 seconds for 1T-parameter models. 🔥 MoE dispatch/combine kernels — faster than DeepEP on ConnectX-7, fastest on EFA. 📄 Paper: https://t.co/Wech9mIqTr

abcdabcd987 retweeted

NVIDIA AI

@NVIDIAAI

29 days ago

Perplexity runs on NVIDIA. Nice breakdown from the team on how they’re using the CUTLASS Python stack to optimize their models for inference 👇

Lequn Chen @abcdabcd987

about 1 month ago

If you are attending MLSys'26 in two weeks at Bellevue, come chat with us at our happy hour event on May 19: https://t.co/mXf7laYt1s

Perplexity

@perplexity_ai

about 1 month ago

We’ve developed our own inference engine Runtime-Optimized Serving Engine (ROSE) to serve models ranging from embeddings to trillion-parameter LLMs. With CuTeDSL integrated into our inference engine, Perplexity can build the specialized GPU kernels faster to bring models up to peak performance on NVIDIA Hopper and Blackwell GPUs.

Lequn Chen @abcdabcd987

about 1 month ago

@perplexity_ai Also check out our research paper accepted in #MLSys2026 fabric-lib: RDMA Point-to-Point Communication for LLM Systems https://t.co/Wech9mIqTr

Lequn Chen @abcdabcd987

about 1 month ago

#MLSys2026 is happening in two weeks! Our AI Infra team at @perplexity_ai is throwing a happy hour event at Bellevue on May 19. Come chat with us about inference, post-training, RL, kernels, GPUs, RDMA, agents, anything... https://t.co/mXf7laYt1s

Lequn Chen @abcdabcd987

about 1 month ago

Two years at @perplexity_ai

Lequn Chen @abcdabcd987

about 2 months ago

@tskaerobot @Yuchenj_UW Upload all tax documents. Prompt "prepare my 2025 tax" and your information (like location, single or married, ...). Same as what you would send to CPA. (If you don't know which docs are needed, just ask it)

Lequn Chen @abcdabcd987

about 2 months ago

@iamup @AravSrinivas I uploaded all tax documents and also equity contracts. Same as what I sent to my CPA previously.

Lequn Chen @abcdabcd987

7 months ago

Check out my talk at Ray Summit 2025 on RDMA Point-to-Point Communication for LLM Systems https://t.co/sUGxsahnjD

Lequn Chen @abcdabcd987

7 months ago

zhihu: https://t.co/jH1Jo55KIG

Lequn Chen @abcdabcd987

7 months ago

Wrote a blog post on why collective communication feels awkward for newer LLM workloads (disaggregated inference, RL weight update, MoE), why people don’t just use raw RDMA, how we approached it, and some behind-the-scenes stories. https://t.co/G0IiHo54qc

Lequn Chen @abcdabcd987

7 months ago

Faster than DeepEP for Decode on ConnectX-7. First viable kernel on EFA. SM-Free RDMA transfer. Support prefill. (Maybe portable to other hardware as well)

Perplexity

@perplexity_ai

7 months ago

Perplexity is the first to develop custom Mixture-of-Experts (MoE) kernels that make trillion-parameter models available with cloud platform portability. Our team has published this work on arXiv as Perplexity's first research paper. Read more: https://t.co/SNdgWTeO8F

Lequn Chen @abcdabcd987

8 months ago

Read more in the blog post! https://t.co/pt7w2wbV6x

Lequn Chen @abcdabcd987

8 months ago

We recently achieved 1.3-second cross-machine parameter update for Kimi-K2 (1T parameters), as opposed to a few minutes in popular frameworks.

Lequn Chen @abcdabcd987

8 months ago

We divide the weight transfer process into pipeline stages to enable overlapped execution over different hardware resources (CPU->GPU memcpy, GPU computation, RDMA, Ethernet).
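
A rough sketch of the pipelining idea in the post above, not the team's implementation: each stage gets its own worker thread and bounded queue, so one stage can work on chunk i+1 while the next stage is still handling chunk i. The stage functions (`h2d_copy`, `gpu_prepare`, `rdma_send`) and the chunk names are hypothetical stand-ins that only tag strings; no real memcpy, GPU work, or RDMA happens here.

```python
import queue
import threading


def run_pipeline(items, stages):
    """Pass every item through all stages, one worker thread per stage."""
    sentinel = object()  # marks the end of the stream
    # Bounded queues give back-pressure between neighbouring stages.
    queues = [queue.Queue(maxsize=2) for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is sentinel:
                outbox.put(sentinel)  # propagate shutdown downstream
                return
            outbox.put(stage(item))

    results = []

    def collect():
        # Drain the last queue so the final stage never blocks forever.
        while True:
            item = queues[-1].get()
            if item is sentinel:
                return
            results.append(item)

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    threads.append(threading.Thread(target=collect))
    for t in threads:
        t.start()

    for item in items:
        queues[0].put(item)
    queues[0].put(sentinel)

    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for the hardware stages named in the post.
def h2d_copy(chunk):
    return chunk + ">h2d"


def gpu_prepare(chunk):
    return chunk + ">gpu"


def rdma_send(chunk):
    return chunk + ">rdma"


if __name__ == "__main__":
    print(run_pipeline(["w0", "w1", "w2"], [h2d_copy, gpu_prepare, rdma_send]))
```

Because every stage is a single FIFO worker, chunks leave the pipeline in the order they entered; a real system would presumably replace the threads with CUDA streams and asynchronous RDMA completions.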
