Nihal Pasham

@npashi

🦀 Rust Tech | @Nvidia | Make general-purpose GPU programming accessible 🖖 Disclaimer: The views and opinions expressed are my own (not my employer's)

Bangalore, IN

Joined October 2011

119 Following

1.1K Followers

1.9K Posts

Pinned Tweet

Nihal Pasham @npashi

28 days ago

Finally able to talk about what I've been heads-down on for 6 months at @nvidia 🦀⚡ We just open-sourced cuda-oxide — an experimental rustc backend that lets you write CUDA kernels in pure Rust. No DSLs. No FFI. No source-to-source step. Single source. Short🧵👇

https://t.co/YRERctlysd

npashi retweeted

Leo Alt @leonardoalt

20 days ago

We can now fully rewrite most software in @leanprover and prove it correct:
- Compiler module rewrite (AI) from Rust to Lean
- Full FFI integration
- All unit and integration tests pass
- Formal spec and proofs!!
- Under 20h wall time (unnoticed pauses)
https://t.co/u601dZ8wph

Nihal Pasham @npashi

25 days ago

@Nekrolm @roeschinc @nvidia @Nekrolm: the index_2d bug fix has landed. The latest vecadd example uses the more ergonomic API, `get_mut_indexed`. If you'd like to take a look at the entire fix: https://t.co/PssDX87Q8I. Feel free to reopen if you have thoughts. Thanks for flagging this again.

Nihal Pasham @npashi

27 days ago

@Nekrolm @roeschinc @nvidia I'll push out a branch to help reason about this more concretely

Nihal Pasham @npashi

26 days ago

@hazle111753854 @nvidia @VaivaswathaN We haven't done an actual comparison yet, but so far the experience is that it's on par with CUDA C++. PS: this needs thorough validation, though.

Nihal Pasham @npashi

26 days ago

@ibuildthecloud 😅

Nihal Pasham @npashi

26 days ago

@shareastronomy Apparently communities are being deprecated.

Nihal Pasham @npashi

26 days ago

I know our community won’t be around come June (as ‘X’ has other plans). But before we go, I thought I’d drop something here that I’ve been working on for a while.

Nihal Pasham @npashi

28 days ago

Finally able to talk about what I've been heads-down on for 6 months at @nvidia 🦀⚡ We just open-sourced cuda-oxide — an experimental rustc backend that lets you write CUDA kernels in pure Rust. No DSLs. No FFI. No source-to-source step. Single source. Short🧵👇

https://t.co/YRERctlysd

Nihal Pasham @npashi

26 days ago

@JamesTervit @nvidia @VaivaswathaN This is great. Please feel free. Would love to hear feedback.

Nihal Pasham @npashi

27 days ago

@dr_sensor Just a small correction: kernel code doesn’t support async/await, but we can do async kernel launches. We have a few examples that use the #tokio runtime to launch GPU work asynchronously.
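
A minimal std-only sketch of that split. The "kernel" body is ordinary synchronous code, and only the launch returns a future that the host can `.await`. This is an illustration under stated assumptions, not cuda-oxide's API. A CPU thread stands in for the GPU. `launch_async` and `LaunchFuture` are hypothetical names. A tiny hand-written `block_on` replaces the tokio runtime so the example has no dependencies.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// Completion slot shared by the "device" thread and the awaiting host task.
struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

// Hypothetical handle for in-flight GPU work; resolves to the kernel's result.
struct LaunchFuture<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

impl<T> Future for LaunchFuture<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut s = self.shared.lock().unwrap();
        match s.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Not finished yet: remember whom to wake on completion.
                s.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

// Hypothetical async launch: the synchronous "kernel" (a closure run on a
// CPU thread here) starts immediately; the caller gets a future to await.
fn launch_async<T, F>(kernel: F) -> LaunchFuture<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let device_side = Arc::clone(&shared);
    thread::spawn(move || {
        let value = kernel();
        let mut s = device_side.lock().unwrap();
        s.result = Some(value);
        if let Some(w) = s.waker.take() {
            w.wake();
        }
    });
    LaunchFuture { shared }
}

// Minimal single-future executor standing in for a runtime such as tokio.
struct Signal {
    ready: Mutex<bool>,
    cv: Condvar,
}

impl Wake for Signal {
    fn wake(self: Arc<Self>) {
        *self.ready.lock().unwrap() = true;
        self.cv.notify_one();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let signal = Arc::new(Signal { ready: Mutex::new(false), cv: Condvar::new() });
    let waker = Waker::from(Arc::clone(&signal));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
        // Sleep until a waker fires; the flag covers wakes that race the poll.
        let mut ready = signal.ready.lock().unwrap();
        while !*ready {
            ready = signal.cv.wait(ready).unwrap();
        }
        *ready = false;
    }
}

fn main() {
    let total = block_on(async {
        // Both launches start before either result is awaited.
        let first = launch_async(|| (0..1_000u64).sum::<u64>());
        let second = launch_async(|| (1_000..2_000u64).sum::<u64>());
        first.await + second.await
    });
    println!("{total}");
}
```

A real runtime would drive many such futures at once. The point here is only that the kernel body stays synchronous, while awaiting happens on the host side of the launch.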

Nihal Pasham @npashi

27 days ago

@mr_r0b0t @entrepeneur4lyf @nvidia @VaivaswathaN @perplexity_ai Interesting, I didn’t know they had one. Would you happen to have a link?

Nihal Pasham @npashi

27 days ago

@WebDevCaptain @nvidia @VaivaswathaN @grok I’ll lend @grok a hand here - https://t.co/572j2R26hy

Nihal Pasham @npashi

27 days ago

@VaivaswathaN 💯. I can say this after having worked with it over the past few months. Pliron addresses all of cuda-oxide's needs: it's extensible, Rust-native, and a breeze to debug. Love it!

Nihal Pasham @npashi

28 days ago

@AstraKernel @rustoftheday @rustaceans_rs @ThisWeekInRust

Nihal Pasham @npashi

28 days ago

46 worked examples shipped: async MLP, cross-crate kernels, Rust ↔ C++/CCCL device FFI. It's still alpha and under active development, so expect bugs, missing features, and API churn, but we think it's a good start. ⭐ https://t.co/ongDJBi7Ew 📖 https://t.co/OpYHBUd8YP

Nihal Pasham @npashi

28 days ago

🌟 Highlights:
🧱 Custom rustc → PTX backend
⚡ Generics, closures, structs, enums on the GPU
🔧 Full intrinsics: warp, shared mem, atomics, clusters
📈 GEMM (naive): 868 TFLOPS
🌐 Composable async, .await on GPU work
🦀 Pliron pipeline: Rust → MIR → Pliron → LLVM → PTX
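
This is a CPU-only sketch of what the "generics, closures, enums" bullet can look like in ordinary Rust. One generic per-thread function, parameterized by a closure, runs on its own element for each simulated thread, and an enum selects the operation. `map_kernel`, `launch_1d`, and `Activation` are hypothetical names. Nothing here uses cuda-oxide's actual kernel attributes, launch API, or intrinsics, and the host loop merely stands in for a 1D grid.

```rust
// Hypothetical per-thread kernel body: generic over the element type `T`
// and the operation `F`. Each simulated thread handles at most one element;
// threads whose index falls past either buffer do nothing.
fn map_kernel<T: Copy, F: Fn(T) -> T>(tid: usize, input: &[T], output: &mut [T], f: &F) {
    if let (Some(&x), Some(y)) = (input.get(tid), output.get_mut(tid)) {
        *y = f(x);
    }
}

// An enum chosen on the host and captured by the closure.
#[derive(Clone, Copy)]
enum Activation {
    Relu,
    Scale(f32),
}

impl Activation {
    fn apply(self, x: f32) -> f32 {
        match self {
            Activation::Relu => x.max(0.0),
            Activation::Scale(s) => x * s,
        }
    }
}

// Host loop standing in for a launch of `grid_dim` blocks of `block_dim` threads.
fn launch_1d<T: Copy, F: Fn(T) -> T>(
    grid_dim: usize,
    block_dim: usize,
    input: &[T],
    output: &mut [T],
    f: F,
) {
    for block in 0..grid_dim {
        for thread in 0..block_dim {
            map_kernel(block * block_dim + thread, input, output, &f);
        }
    }
}

fn main() {
    let x = [-2.0f32, -0.5, 0.0, 1.5, 3.0];

    // Same generic kernel, two enum-selected operations (8 threads, 5 elements).
    let mut relu_out = [0.0f32; 5];
    let relu = Activation::Relu;
    launch_1d(2, 4, &x, &mut relu_out, |v| relu.apply(v));

    let mut scaled = [0.0f32; 5];
    let scale = Activation::Scale(2.0);
    launch_1d(2, 4, &x, &mut scaled, |v| scale.apply(v));

    // A different element type, via a plain closure.
    let mut tens = [0i32; 3];
    launch_1d(1, 4, &[1, 2, 3], &mut tens, |v| v * 10);

    println!("{relu_out:?} {scaled:?} {tens:?}");
}
```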

Nihal Pasham @npashi

28 days ago

@Nekrolm @roeschinc @nvidia But I do think it is actually the more ergonomic (i.e., Rusty) option if perf is not an issue. On another note, I'm working on an oxide-native MIR inliner pass, which should solve the perf problem.

Nihal Pasham @npashi

28 days ago

@Nekrolm @roeschinc @nvidia Tried that first, but the perf cost adds up quickly for trivial index arithmetic: at the method-call boundary the no-param `get_mut()` doesn't inline, so the index is computed twice, plus call overhead. So I went with the `ThreadIndex` version.
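
A minimal CPU-side sketch of the API shape described above, in plain Rust. The thread's global index is computed once into a `ThreadIndex` value and handed to the accessor. The alternative is a no-argument accessor that recomputes the index internally. `ThreadIndex`, `DisjointSlice`, `get_mut_indexed`, and `vecadd` are hypothetical here. They echo names from the thread, but the signatures are illustrative, not cuda-oxide's API. A host loop stands in for the GPU grid.

```rust
// Hypothetical stand-in for a GPU thread's global index: computed once from
// block/thread coordinates, then passed around by value.
#[derive(Clone, Copy, Debug)]
struct ThreadIndex(usize);

impl ThreadIndex {
    // Emulates blockIdx.x * blockDim.x + threadIdx.x.
    fn new(block_idx: usize, block_dim: usize, thread_idx: usize) -> Self {
        ThreadIndex(block_idx * block_dim + thread_idx)
    }
}

// Hypothetical output wrapper: each thread asks only for the element at its
// own index, so the caller never redoes the index arithmetic.
struct DisjointSlice<'a, T> {
    data: &'a mut [T],
}

impl<'a, T> DisjointSlice<'a, T> {
    // Takes the precomputed index; returns None for out-of-range threads.
    fn get_mut_indexed(&mut self, idx: ThreadIndex) -> Option<&mut T> {
        self.data.get_mut(idx.0)
    }
}

// Vector-add "kernel" body for a single thread.
fn vecadd(idx: ThreadIndex, a: &[f32], b: &[f32], c: &mut DisjointSlice<'_, f32>) {
    if let Some(out) = c.get_mut_indexed(idx) {
        *out = a[idx.0] + b[idx.0];
    }
}

fn main() {
    let n = 10;
    let a: Vec<f32> = (0..n).map(|i| i as f32).collect();
    let b: Vec<f32> = (0..n).map(|i| 2.0 * i as f32).collect();
    let mut c = vec![0.0f32; n];

    // Emulate 3 blocks x 4 threads: 12 threads for 10 elements, so the
    // last two threads get None and skip the write.
    let (grid_dim, block_dim) = (3, 4);
    let mut out = DisjointSlice { data: &mut c[..] };
    for block in 0..grid_dim {
        for thread in 0..block_dim {
            vecadd(ThreadIndex::new(block, block_dim, thread), &a, &b, &mut out);
        }
    }
    println!("{:?}", c);
}
```

The design point is that the index arithmetic lives in one place. Whether a no-argument accessor would inline is a property of the real compiler pipeline, and this CPU sketch cannot demonstrate it.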
