GPU MODE @gpu_mode - Twitter Profile

GPU MODE

@GPU_MODE

2 days ago

@Aru__09 this is so cool! When you're done and if you wanna give a talk about this work please lmk!

1

7

0

2

745

GPU_MODE retweeted

snow

@snowclipsed

5 days ago

YEEEEEES

2

43

2

6

3K

GPU_MODE retweeted

Marcio K

@MarcioK

20 days ago

Since the Humanity's Last Hackathon from @huggingface didn’t happen, I set up my own mini version using Kernelbot and Popcorn from @gpu_mode. > The goal was to test how well LLMs can generate code for difficult tasks, like writing faster kernels for Apple’s MPS with @PyTorch. > My strategy was to let the LLM submit a kernel, get feedback from the benchmark, and then iterate based on the learnings. > The hardest part was not the code generation itself, but coordinating all the systems. Kernelbot, Popcorn, submissions, feedback, orchestration... > The benchmark eats almost all my RAM, so parallelizing too many submissions is hard. My machine starts crashing if I push it too much. Overall, I need more time to tune the prompts, experiment with better feedback loops, and maybe try some RL-style iteration. There are still lots of techniques worth exploring here. In the video: Left: task orchestrator Right: live dashboard tracking submissions, code, and lessons learned

0

9

2

1

2K

GPU_MODE retweeted

Simon V @Simon_Vt

about 1 month ago

@ReubenConducts @GPU_MODE @marksaroufim Best talk on CuTe Layout algebra imo :)

1

3

1

2K

GPU MODE

@GPU_MODE

about 1 month ago

NVIDIA cuDNN team tomorrow at noon

2

114

6

14

5K

GPU MODE

@GPU_MODE

about 1 month ago

Codex is so good at writing kernels that it felt appropriate to do a Codex only kernel competition. Metal is great because you'll be able to tangibly feel the perf improvements in your local models

Ben Burtenshaw

@ben_burtenshaw

about 1 month ago

Humanity's Last Hackathon is NOW OPEN for registration. This is not a normal hackathon. You will be judged on the context, not the code! Use Codex @OpenAIDevs to build and optimize models for local inference (kernels on Max metal). Submit through @GPU_MODE. Climb the leaderboard. Top performers qualify for the final battle. Launches May 4th. Registration is live now.

21

367

36

307

91K

0

62

3

16

6K

GPU_MODE retweeted

Ben Burtenshaw

@ben_burtenshaw

about 1 month ago

Humanity's Last Hackathon is NOW OPEN for registration. This is not a normal hackathon. You will be judged on the context, not the code! Use Codex @OpenAIDevs to build and optimize models for local inference (kernels on Max metal). Submit through @GPU_MODE. Climb the leaderboard. Top performers qualify for the final battle. Launches May 4th. Registration is live now.

21

367

36

307

91K

GPU MODE

@GPU_MODE

about 1 month ago

GPU MODE cited on @tbpn - thank you for the plug @AnushElangovan !! We hope to continue making GPU programming more accessible to everyone!

TBPN

@tbpn

about 1 month ago

AMD's @AnushElangovan explains why he thinks his company's open source ethos combined with agentic AI superpowers their leverage as a company: Because AMD publishes a lot of technical details about its hardware, when engineers use AI tools, the models already “understand” AMD’s systems and can help write code for them, debug them, or even generate new tools. And that makes developers more productive on AMD hardware without AMD having to do all the work internally. "AMD has had this ethos of open source, which really plays to our advantage. Every frontier model that I use has already seen every bit of AMD source code." "It'll rewrite my spec for me because it's already in the training data. Which you can't get from closed ecosystems." "In fact I built a virtual GPU simulator just based off our public specs, and now I'm running it on the GPU. So now I can run cross-generational GPU simulations on existing hardware." "We have that advantage. And we've run a Dev Day contest where we generated more tokens on AMD — Triton kernels and HIP kernels — than existed on the internet at the time." "So now that's all part of the pre-training data. It's a superpower because now you're open source, and you're agentically accelerating this process."

10

131

14

38

34K

2

39

2

7

8K

GPU MODE

@GPU_MODE

about 1 month ago

https://t.co/V8tXT5MINR

1

24

1

15

2K

GPU MODE

@GPU_MODE

about 1 month ago

We helped host a kernel competition for @tri_dao's course at Princeton's COS 484: Natural Language Processing If you're a university or educator that's interested in live programming problems for your students please reach out!

2

78

2

42

23K

GPU MODE

@GPU_MODE

about 1 month ago

Gluon and Linear Layouts. Tomorrow noon PST https://t.co/OWXJw7ThTk

0

19

2

7

2K

GPU_MODE retweeted

Reuben Stern @ReubenConducts

about 2 months ago

My colleagues Jack Carlisle and Jay Shah gave a fantastic lecture for @GPU_MODE yesterday on our categorical foundations for CuTe layout algebra! They were joined by Cris Cecka, the inventor of CuTe, and @marksaroufim as moderators. Bravi tutti! https://t.co/FRsiOAPGfH

2

59

9

33

5K

GPU MODE

@GPU_MODE

about 2 months ago

Jack Carlisle & Jay Shah talk on the Fundamentals of CuTe Layout Algebra. Special guest Cris Cecka See you soon! https://t.co/fg4OXeQYB2

1

61

12

26

5K

GPU_MODE retweeted

Pieter Delobelle

@pieterdelobelle

about 2 months ago

our "@pleiasfr and friends" team got second place at the @GPU_MODE hackathon in Paris last week! 🥈 we had a lot of fun optimizing our training throughput, so trying out 8 bit training, muon, RoPE/NoPE, conv architectures, ... Basically nanogpt speedrunning on B300s.

pieterdelobelle's tweet photo. our "@pleiasfr and friends" team got second place at the @GPU_MODE hackathon in Paris last week! 🥈

we had a lot of fun optimizing our training throughput, so trying out 8 bit training, muon, RoPE/NoPE, conv architectures, ... Basically nanogpt speedrunning on B300s. https://t.co/AqinWDfRaN

5

31

3

5

7K

GPU MODE

@GPU_MODE

about 2 months ago

We're back on schedule Tuesday April 14 at 9am PST we'll have a talk from Andrei Panferov and Erik Schultheis on their improved recipe for nvfp4 pretraining. They'll cover both math and kernels https://t.co/lulPVNf4CN

0

62

13

17

6K

GPU_MODE retweeted

Zhipeng Huang @nopainkiller

about 2 months ago

Matej Sirovatka from @PrimeIntellect sharing how @arcee_ai trinity was trained

4

123

9

20

12K

GPU MODE

@GPU_MODE

2 months ago

Incredible release

Alex Zhurkevich @cudagdb

2 months ago

Trtllmgen kernels are now open. Fastest prefill and decode kernels for our target workloads. We wrote these to win InferenceX, MLPerf, other benchmarks. Powering some of today’s top served models. Dive in, learn, use them, or level up your own. Enjoy. https://t.co/2aQBwcdnZL

13

334

51

260

148K

2

33

3

12

7K

GPU_MODE retweeted

Sinatras

@myainotez

2 months ago

https://t.co/3kq6s5Syl3

6

87

20

46

18K

GPU_MODE retweeted

PyTorch

@PyTorch

2 months ago

GPU MODE (@GPU_MODE) & PyTorch Foundation are organizing an ML systems hackathon in Paris on April 9, immediately following PyTorch Conference Europe 2026. Researchers and engineers will compete across two tracks: -Distributed training (LLM speedrun) and inference optimization (leaderboard) - Access to a B300 cluster from Verda and H200 instances from Sesterce - Cloud credits as prizes, including 48-hour access to a GB300 NVL72 rack - Talks from PyTorch (Helion), vLLM, Prime Intellect, and more - Food and refreshments Doors open at 9:30, with the closing ceremony at 20:00. Attendees can join with a pre-formed team or match on-site. Location details are shared upon registration. Spots are limited. Register: https://t.co/1BYgfGWJIr #OpenSourceAI #PyTorchCon #PyTorch #vLLM #Helion

PyTorch's tweet photo. GPU MODE (@GPU_MODE) & PyTorch Foundation are organizing an ML systems hackathon in Paris on April 9, immediately following PyTorch Conference Europe 2026.

Researchers and engineers will compete across two tracks:

-Distributed training (LLM speedrun) and inference optimization (leaderboard)
- Access to a B300 cluster from Verda and H200 instances from Sesterce
- Cloud credits as prizes, including 48-hour access to a GB300 NVL72 rack
- Talks from PyTorch (Helion), vLLM, Prime Intellect, and more
- Food and refreshments

Doors open at 9:30, with the closing ceremony at 20:00. Attendees can join with a pre-formed team or match on-site. Location details are shared upon registration.

Spots are limited. Register: https://t.co/1BYgfGWJIr

#OpenSourceAI #PyTorchCon #PyTorch #vLLM #Helion

5

189

22

53

26K

GPU MODE

@GPU_MODE

Last Seen Users on Sotwe

Trends for you

Most Popular Users