Xander Chin @XanderChin - Twitter Profile

Pinned Tweet

Xander Chin

@XanderChin

7 months ago

hand-controlled boids

312

21K

1K

5K

739K

XanderChin retweeted

Tim @llhtimlam

17 days ago

Want to see distributed computing explained via Pong? Inspired by TinyTPU and the TinyTapeout workshop at FOSSi, I wrote a paper in under a week that pairs this demo with a proposed next-gen optical I/O chip architecture & a roadmap to prototype it. Read it on GitHub: https://t.co/td2oIRfIDW

7

107

10

102

22K

Xander Chin

@XanderChin

16 days ago

@llhtimlam fire

0

1

0

115

Xander Chin

@XanderChin

about 1 month ago

@satvikgari make it speak like a toronto mans

1

2

0

103

XanderChin retweeted

Satvik Garimella

@satvikgari

about 1 month ago

A few months ago, I saw Karpathy build NanoChat in PyTorch, and it made me want to understand how these models work underneath the abstractions. So I decided to try building one myself, but in a different framework: JAX. Here’s how I did it: 🧵

https://t.co/EzpZHoE3fR

5

25

7

9

2K

XanderChin retweeted

saksham

@sakshambatraa

about 1 month ago

reinventing Groq's LPU with @michael_trbo: we got instruction-driven data movement working between SRAM memory blocks and MXM compute!!

11

77

18

63

6K

XanderChin retweeted

luthira

@luthiraabeykoon

about 1 month ago

We implemented @karpathy's MicroGPT fully on FPGA fabric. No GPU. No PyTorch. No CPU inference loop. Just a transformer burned into hardware, generating 50,000+ tokens/sec. The model is small, but the idea is not: inference does not have to live only in software 👇

266

8K

697

5K

851K

Xander Chin

@XanderChin

about 2 months ago

@LukeBlomm @HockeyAnalytics yooo congrats Luke

0

1

0

88

Xander Chin

@XanderChin

about 2 months ago

@MankyDankyBanky fire

1

4

0

3K

Xander Chin

@XanderChin

about 2 months ago

@evanliin bro me too

0

1

0

58

XanderChin retweeted

surya

@suryasure05

about 2 months ago

anyone subletting a 1 bedroom apartment in Toronto this summer?

4

29

5

3

5K

Xander Chin

@XanderChin

2 months ago

@arjunharinath1 @satvikgari lesgooooo

0

4

0

157

XanderChin retweeted

arjun

@arjunharinath1

2 months ago

@satvikgari and I have been building our own version of Nvidia’s Blackwell GPU. We just designed a 4x4 systolic array in Verilog! Here’s a breakdown of how it works and what we learned building it.

11

70

8

45

6K

XanderChin retweeted

evan

@evanliin

2 months ago

blog blog blog blah blah https://t.co/b6NP2yoYLv

12

101

9

65

7K

XanderChin retweeted

krupa

@krupaad

2 months ago

bit late to the recruiting cycle, but looking for a summer internship in ML/hardware/inference!! i've been working on CUDA kernel writing, FPGA acceleration and RTL. would love to find a team doing similar work this summer. dual US/Canada citizen, can relocate anywhere. DMs open :)

36

263

14

76

31K

Xander Chin

@XanderChin

2 months ago

@krupaad cfbr!

0

1

0

113

XanderChin retweeted

surya

@suryasure05

2 months ago

wrote an article breaking down the math behind TurboQuant by @GoogleResearch. I walk through a toy example using concrete numbers to show every single operation that goes on under the hood. link below:

30

928

115

1K

76K

Xander Chin

@XanderChin

2 months ago

seriously impressive stuff. give this man a follow

ani

@anirudhbv_ce

2 months ago

I implemented @GoogleResearch's TurboQuant as a CUDA-native compression engine on Blackwell B200. 5x KV cache compression on Qwen 2.5-1.5B, near-lossless attention scores, generating live from compressed memory.

5 custom cuTile CUDA kernels ft:
- fused attention (with QJL corrections)
- online softmax
- on-chip cache decompression
- pipelined TMA loads

Try it out: https://t.co/m5vkJxWIY6

s/o @blelbach and the cuTile team at @nvidia for lending me Blackwell GPU access :) cc @sundeep @GavinSherry

146

3K

310

3K

806K

9

1K

36

842

116K

Xander Chin

@XanderChin

2 months ago

@satvikgari @arjunharinath1 and GOAT

0

1

0

64

Xander Chin

@XanderChin

2 months ago

@satvikgari @arjunharinath1 the G in Garimella is for GPU

2

9

0

461

XanderChin retweeted

Satvik Garimella

@satvikgari

2 months ago

Recently @arjunharinath1 and I started building our own version of Nvidia's Blackwell GPU. We built the ALUs and a 4-lane SIMD core in Verilog. Here is a breakdown of how we did it.

https://t.co/7F1FqY01cv

15

181

25

123

12K
