Reuben Stern @ReubenConducts - Twitter Profile

1 day ago

I've defended and graduated! Perhaps the most important lesson I've learned during my time at MIT is that progress in science (and in society!) is deeply collective. In today's world --- and especially in a hyper-competitive field like AI research --- it's easy to get sucked into comparison and self-doubt. Much of this, I think, comes from a misunderstanding of how scientific progress actually works: we tend to attribute oversized credit to a small number of figures. But certainly none of the work I've done, and none of the growth I've undergone, would have been possible without the support of my mentors, collaborators, and the insights of millions of brilliant scientists before me.

Along these lines, I am grateful to the amazing community around me who have supported my journey: most importantly, to my advisor @jacobandreas, the dozens of collaborators I've worked with during my PhD, my labmates, my mentees, and my co-organizers at @MITGradUnion --- all of whom have shown me, in various ways, what it means to work not out of comparison but out of love: for science, for the community around me, and for humanity.

I hope to carry forward these values wherever I go.

Reuben Stern @ReubenConducts

17 days ago

@henrylhtsang correct, not in the critical path for the mma.

Reuben Stern @ReubenConducts

22 days ago

@gpusteve This is what grouped GEMM kernels are for :) Only two GEMM kernel launches per rank (in forward). Choose the dtype that makes sense for you numerically and then write the kernels that will make it fast.
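
For readers unfamiliar with the term: a grouped GEMM computes many independent matrix products, possibly of different shapes, from a single kernel launch instead of one launch per product. The pure-Python sketch below shows only the semantics, assuming a simple scheme in which every problem is cut into row tiles and all tiles go into one shared work list. `grouped_gemm`, `matmul`, and `tile_m` are hypothetical names; this is not CUTLASS code or the kernels the tweet refers to.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive reference GEMM: returns A @ B."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]

def grouped_gemm(problems: Sequence[Tuple[Matrix, Matrix]], tile_m: int = 2) -> List[Matrix]:
    """Hypothetical reference: many GEMMs of different shapes, one flat work list."""
    # Each (problem index, first row of tile) pair is one unit of work; a single
    # grouped kernel launch would hand these units out to its thread blocks.
    work = [(g, r0) for g, (a, _) in enumerate(problems) for r0 in range(0, len(a), tile_m)]
    outputs: List[list] = [[None] * len(a) for a, _ in problems]
    for g, r0 in work:
        a, b = problems[g]
        block = matmul(a[r0:r0 + tile_m], b)  # one row tile of one problem
        outputs[g][r0:r0 + len(block)] = block
    return outputs
```

The result matches calling `matmul` on each pair separately; the point of the grouped form is that one launch covers the whole work list, whatever the mix of shapes.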

Reuben Stern @ReubenConducts

24 days ago

We have enjoyed collaborating with @thinkymachines on parts of the attention backend that support this impressive work. Congrats to everyone involved!

Thinking Machines

@thinkymachines

25 days ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. https://t.co/AFJZ5kH7Ku

Reuben Stern @ReubenConducts

25 days ago

@Simon_Vt Thanks, Simon!

Reuben Stern @ReubenConducts

25 days ago

We have a new blog about cluster launch control (CLC) on NVIDIA Blackwell GPUs! CLC is a powerful tool for dynamically scheduling work across the GPU, both within and across kernels. https://t.co/5eNykPOBFb
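
As a rough analogy only: with CLC, thread blocks of a running kernel can ask the hardware to cancel the launch of a cluster that has not started yet and take over its work, which amounts to dynamic work stealing. The Python sketch below models the generic pattern of persistent workers claiming tile IDs from a shared counter; the lock stands in for the hardware mechanism, and `run_dynamic_schedule` is a hypothetical name, not part of any NVIDIA API.

```python
import threading
from typing import List, Tuple

def run_dynamic_schedule(num_tiles: int, num_workers: int) -> List[Tuple[int, int]]:
    """Persistent workers repeatedly claim the next unclaimed tile ID."""
    next_tile = 0
    lock = threading.Lock()
    log: List[Tuple[int, int]] = []  # (worker_id, tile_id) pairs

    def worker(wid: int) -> None:
        nonlocal next_tile
        while True:
            with lock:  # stands in for the hardware's atomic claim of new work
                tile = next_tile
                if tile >= num_tiles:
                    return  # no work left: the persistent worker exits
                next_tile += 1
                log.append((wid, tile))
            # ... the tile's actual computation would happen here ...

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Unlike a static round-robin assignment, a worker that finishes early simply claims more tiles, so every tile is processed exactly once without fixing the mapping in advance.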

Reuben Stern @ReubenConducts

about 1 month ago

All my homies love CLC

driss guessous @drisspg

about 1 month ago

I alluded to this a few tweets ago but just pushed up a shortish blog on a subtle feature of CLC work stealing that makes cuda-graphable grouped_gemm possible with this scheduling mode: https://t.co/n665ou3PRv

Reuben Stern @ReubenConducts

about 1 month ago

@Simon_Vt @GPU_MODE @marksaroufim Jack has a special gift for exposition!

Reuben Stern @ReubenConducts

about 2 months ago

My colleagues Jack Carlisle and Jay Shah gave a fantastic lecture for @GPU_MODE yesterday on our categorical foundations for CuTe layout algebra! They were joined by Cris Cecka, the inventor of CuTe, and @marksaroufim as moderators. Bravi tutti! https://t.co/FRsiOAPGfH
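
For context, a CuTe layout pairs a shape with a stride and maps a coordinate to an offset by taking the inner product of coordinate and stride; a 1-D index is first converted to a coordinate with the leftmost mode varying fastest. The sketch below covers only flat (non-nested) layouts and treats composition functionally as `A(B(i))`. Real CuTe computes the composite as a new shape:stride layout, and the categorical treatment from the lecture is not reproduced here. `idx2crd`, `make_layout`, and `compose` are names loosely modeled on CuTe, not its API.

```python
from typing import Callable, Tuple

Layout = Callable[[int], int]

def idx2crd(i: int, shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """1-D index -> coordinate, leftmost mode fastest (colexicographic order)."""
    crd = []
    for s in shape:
        crd.append(i % s)
        i //= s
    return tuple(crd)

def make_layout(shape: Tuple[int, ...], stride: Tuple[int, ...]) -> Layout:
    """Flat layout shape:stride as a function from 1-D index to offset."""
    assert len(shape) == len(stride)

    def f(i: int) -> int:
        return sum(c * d for c, d in zip(idx2crd(i, shape), stride))

    return f

def compose(a: Layout, b: Layout) -> Layout:
    """Functional composition: (A o B)(i) = A(B(i))."""
    return lambda i: a(b(i))
```

For example, the row-major 4x2 layout `(4,2):(2,1)` sends indices 0..7 to offsets `[0, 2, 4, 6, 1, 3, 5, 7]`, and composing it with `(2,4):(4,1)` gives the identity on 0..7.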

Reuben Stern @ReubenConducts

about 1 month ago

@charles_irl pmpp and the rising sea are the only two books anyone could ever need!

Reuben Stern @ReubenConducts

about 1 month ago

@drisspg gemm+epi op+fused quant is big

Reuben Stern @ReubenConducts

about 2 months ago

@drisspg the answer of course is "more than you think; not as much as you want"

ReubenConducts retweeted

Ant Ling

@AntLingAGI

2 months ago

🚀 Linear Attention is unlocking million-token context windows by dropping computational complexity from O(N^2) to O(N), but software is increasingly bottlenecking the hardware. Meet cuLA (CUDA Linear Attention): hand-written kernels using CuTe DSL & CUTLASS C++ to extract maximum performance on NVIDIA GPUs. A drop-in replacement for FLA designed to push hardware to its absolute limits.
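
The O(N^2) to O(N) claim rests on reassociating the attention product. Dropping the softmax (here with an identity feature map and no normalization, a simplifying assumption), causal attention o_t = sum over s <= t of (q_t . k_s) v_s can be computed either pairwise, which is quadratic in sequence length, or by carrying a d_k x d_v running state S_t = S_{t-1} + k_t v_t^T and reading out o_t = q_t^T S_t, which is linear. This Python sketch is not code from cuLA or FLA, and both function names are hypothetical.

```python
from typing import List

Vec = List[float]

def causal_linear_attention_quadratic(q: List[Vec], k: List[Vec], v: List[Vec]) -> List[Vec]:
    """O(N^2): for each query, sum over all earlier keys explicitly."""
    out = []
    for t in range(len(q)):
        o = [0.0] * len(v[0])
        for s in range(t + 1):
            w = sum(x * y for x, y in zip(q[t], k[s]))  # unnormalized score q_t . k_s
            for j in range(len(o)):
                o[j] += w * v[s][j]
        out.append(o)
    return out

def causal_linear_attention_recurrent(q: List[Vec], k: List[Vec], v: List[Vec]) -> List[Vec]:
    """O(N): carry a fixed-size d_k x d_v state instead of revisiting old keys."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for qt, kt, vt in zip(q, k, v):
        for i in range(d_k):  # S_t = S_{t-1} + k_t v_t^T
            for j in range(d_v):
                state[i][j] += kt[i] * vt[j]
        out.append([sum(qt[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])  # q_t^T S_t
    return out
```

Both functions return the same outputs; the recurrent form does constant work per token, which is what makes very long contexts tractable.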

ReubenConducts retweeted

Cursor @cursor_ai

2 months ago

Thank you to the companies and open-source communities behind Kimi K2.5, Ray, ThunderKittens, PyTorch, and more. We'd also like to thank Fireworks and Colfax for their collaboration and partnership.

ReubenConducts retweeted

PyTorch

@PyTorch

2 months ago

PyTorch 2.11 is now available, featuring 2,723 commits from 432 contributors since PyTorch 2.10. This release prioritizes performance scaling for distributed training and next-generation hardware architectures.

Highlights include a FlashAttention-4 backend for FlexAttention on Hopper and Blackwell GPUs, Differentiable Collectives for distributed training, and performance optimizations for Intel GPUs via XPU Graph. This release also delivers comprehensive operator expansion for Apple Silicon (MPS) and RNN/LSTM GPU export support.

🖇️ Read the PyTorch 2.11 release blog and release notes: https://t.co/JZ4xkjEiNQ

#PyTorch #OpenSource #AIInfrastructure

ReubenConducts retweeted

Tri Dao

@tri_dao

3 months ago

The frontier has increasingly shifted to hybrid models - from Qwen to Kimi-Linear and now with NVIDIA's Nemotron-3 Super - that rely on a strong linear sequence model. Today we release Mamba-3, the most powerful linear model to date. https://t.co/OpMmqEWMkP

Reuben Stern @ReubenConducts

3 months ago

@drisspg @gaunernst it's pretty close, but FA-4 is missing in particular some of the inference optimizations that FA-3 has, such as cuda graphability via dynamic scheduling metadata. coming soon, though!

Reuben Stern @ReubenConducts

3 months ago

It's been great working on the FA-4 backend to FlexAttention -- check out this blog post to learn more!

PyTorch

@PyTorch

3 months ago

FlexAttention now has a FlashAttention-4 backend.

FlexAttention has enabled researchers to rapidly prototype custom attention variants—with 1000+ repos adopting it and dozens of papers citing it.

But users consistently hit a performance ceiling. Until now.

We've added a FlashAttention-4 backend to FlexAttention on Hopper and Blackwell GPUs. PyTorch now auto-generates CuTeDSL score/mask modifications and JIT-instantiates FlashAttention-4 for your custom attention variant.

The result: 1.2× to 3.2× speedups over Triton on compute-bound workloads.

🖇️ Read our latest blog here: https://t.co/KVElBn4TEE

No more choosing between flexibility and performance.

#PyTorch #FlexAttention #FlashAttention #OpenSourceAI
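
In FlexAttention, a custom variant is written as a `score_mod` function with the signature `(score, b, h, q_idx, kv_idx)` that rewrites each pre-softmax score; a causal mask, for example, can be a `score_mod` that returns negative infinity whenever `kv_idx > q_idx`. The pure-Python reference below mimics those semantics for a single batch element and head (b = h = 0) to show what a generated kernel has to compute. It does not call PyTorch, and `reference_attention` and `causal` are hypothetical names.

```python
import math
from typing import Callable, List

# (score, b, h, q_idx, kv_idx) -> new score, mirroring the score_mod signature
ScoreMod = Callable[[float, int, int, int, int], float]

def causal(score: float, b: int, h: int, q_idx: int, kv_idx: int) -> float:
    """Mask out future keys by sending their scores to -inf."""
    return score if q_idx >= kv_idx else -math.inf

def reference_attention(q: List[List[float]], k: List[List[float]],
                        v: List[List[float]], score_mod: ScoreMod) -> List[List[float]]:
    """Softmax attention for one batch element and one head, with score_mod applied."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [score_mod(scale * sum(x * y for x, y in zip(qi, kj)), 0, 0, i, j)
                  for j, kj in enumerate(k)]
        m = max(scores)  # subtract the row max for numerical stability
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0 for masked keys
        z = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / z for c in range(len(v[0]))])
    return out
```

Passing an identity `score_mod` recovers plain softmax attention; the backend's job is to fuse whatever `score_mod` the user writes into a fast kernel rather than materializing the score matrix as this reference does.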

ReubenConducts retweeted

Belinda Li

@belindazli

4 months ago

New blog post on introspection for interpretability, and why I think training models to self-explain is a promising frontier for interpretability research: https://t.co/AfwTc4u59d
