Melih Elibol @melibol - Twitter Profile

melibol retweeted

Greg Brockman

@gdb

3 days ago

Rust is great. We’re making a $600,000 commitment to the Rust Foundation:

110

3K

155

257

342K

melibol retweeted

Charlie Marsh

@charliermarsh

3 days ago

At OpenAI, we're continuing to bet on Rust as the future of systems programming. I'm proud to announce that we're making a $600,000 commitment to the Rust Foundation, which combines our Platinum membership with additional support for maintainer efforts across the Rust ecosystem.

134

5K

236

487

548K

melibol retweeted

Charlie Marsh

@charliermarsh

3 days ago

I love Rust

50

563

20

13

24K

melibol retweeted

ayush🔮👨‍💻🔮

@ayushagarwal027

2 days ago

NVIDIA researchers just brought Rust's ownership model to GPU kernels. 🦀

The paper: "Fearless Concurrency on the GPU" introduces cuTile Rust.

The problem: writing custom GPU kernels in Rust meant stepping outside Rust's safety guarantees entirely.

cuTile Rust fixes that: mutable outputs get split into disjoint pieces, kernel launches preserve ownership rules from host to device, with local opt-outs when you need raw control.

The performance holds up:
→ 7 TB/s for element-wise ops on NVIDIA B200
→ 2 PFlop/s for GEMM: 96% of cuBLAS
→ Matches cuTile Python within measurement noise

They also built Grout, an inference engine on top, running real models:
→ 171 tokens/s for Qwen3-4B on RTX 5090
→ 82 tokens/s for Qwen3-32B on B200
→ Competitive with vLLM and SGLang

Safe, idiomatic Rust at full CUDA performance. This is a big step for Rust in ML infra.

🔗 https://t.co/A5rN8ULiLl

#Rust #RustLang #GPU #CUDA #MachineLearning #SystemsProgramming #NVIDIA

2

111

13

86

6K
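
The tweet above says mutable outputs get split into disjoint pieces. As a rough CPU-side analogy only (this is not the cuTile Rust API, which is not shown in this feed), standard Rust expresses the same idea with `chunks_mut` and scoped threads: each worker receives an exclusive, non-overlapping `&mut` slice, so the borrow checker rules out data races at compile time. The function name `scale_tiles` and its parameters are hypothetical, chosen for illustration.

```rust
use std::thread;

/// Hypothetical helper (not part of cuTile Rust): multiplies every element
/// of `data` by `factor`, processing each tile of up to `tile` elements on
/// its own scoped thread.
pub fn scale_tiles(data: &mut [f32], tile: usize, factor: f32) {
    assert!(tile > 0, "tile size must be non-zero");
    thread::scope(|s| {
        // `chunks_mut` yields non-overlapping mutable sub-slices, so each
        // spawned thread owns exclusive access to its own piece of the output.
        for chunk in data.chunks_mut(tile) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= factor;
                }
            });
        }
        // All threads spawned on `s` are joined before `scope` returns.
    });
}
```

Calling `scale_tiles(&mut v, 4, 2.0)` on a ten-element vector would run three workers (tiles of 4, 4, and 2 elements) and double every value. Handing two workers the same mutable tile is rejected by the compiler, which is the property the tweet describes carrying over to GPU kernels.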

melibol retweeted

Charles 🎉 Frye

@charles_irl

3 days ago

CuTile-rs paper! https://t.co/5zOOYfS8Pz

3

397

45

260

15K

melibol retweeted

Eric Buehler

@ericlbuehler

3 days ago

Excited to share cuTile Rust: bringing Rust's fearless concurrency to GPU kernel programming. Our paper "Fearless Concurrency on the GPU" is now on arXiv.

For me the best part is how fast you can iterate: high-performance CUDA kernels developed directly in Rust.

Huge thanks to my co-authors Melih Elibol, Jared Roesch, Isaac Gelado, and Michael Garland on this project.

4

157

17

80

8K

melibol retweeted

Nihal Pasham @npashi

about 1 month ago

Finally able to talk about what I've been heads-down on for 6 months at @nvidia 🦀⚡

We just open-sourced cuda-oxide, an experimental rustc backend that lets you write CUDA kernels in pure Rust.

No DSLs. No FFI. No source-to-source step. Single source.

Short🧵👇 https://t.co/YRERctlysd

49

2K

293

1K

186K

melibol retweeted

Jared Roesch

@roeschinc

3 days ago

Fearless Concurrency on the GPU

For those interested, @melibol just posted a paper on building a safe Rust kernel programming abstraction on top of Tile IR. https://t.co/MMPxi4oOEg

A short teaser: but the safety is effectively free. On a B200, the safe GEMM is competitive with cuBLAS: about 2 PFlop/s, 92% of the GPU's dense f16 roofline.

Read more in the paper or Melih's LinkedIn post (https://t.co/jyyfdC2Vc8). He will also be giving a talk at RustConf in September; hopefully he will see you there!

1

189

27

138

8K

melibol retweeted

Jared Roesch

@roeschinc

5 days ago

I am giving a talk at ARRAY @ PLDI this week. If anyone is around Boulder for PLDI, I would love to catch up! See you all there!

0

6

1

473

melibol retweeted

Matt @matt_dz

3 months ago

cuTile Rust: a safe, tile-based kernel programming DSL for the Rust programming language

https://t.co/BfdGEJ969w

Features a safe host-side API for passing tensors to asynchronously executed kernel functions.