Senior Staff Engineer at Samsung SSIR. Primary interests: machine learning & NLP. Author of books on fastText and ML with Rust. AI to make people's lives easier.
One of the first steps in any NLP pipeline is tokenization: splitting a given text into a sequence of units (tokens) drawn from the vocabulary of the corpus. That token sequence is the representation of the text that the rest of the pipeline works with. ==thread==
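A minimal sketch of that idea, assuming a simple regex word/punctuation split and a vocabulary built from a two-sentence toy corpus. The helper names (tokenize, build_vocab, encode) and the <unk> token are illustrative choices, not any particular library's API; production pipelines usually use subword tokenizers instead.

```python
import re

def tokenize(text):
    # Lowercase, then extract runs of word characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(corpus):
    # Assign ids in first-seen order; id 0 is reserved for unknown tokens (assumption).
    vocab = {"<unk>": 0}
    for sentence in corpus:
        for token in tokenize(sentence):
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(text, vocab):
    # Map each token to its vocabulary id, falling back to <unk>.
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize(text)]

corpus = ["Tokenization is the first step.", "The first step matters."]
vocab = build_vocab(corpus)

print(tokenize("The first step!"))       # ['the', 'first', 'step', '!']
print(encode("The first step!", vocab))  # [3, 4, 5, 0]
```

The "!" never appears in the toy corpus, so it maps to the unknown id 0. That is the sense in which the units are tied to the corpus vocabulary.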
TL;DR of a Blackwell GPU:
160 SMs → each with 128 CUDA + 4 Tensor cores
→ Warps of 32 threads (SIMT)
→ Latency hiding via warp switching
→ Fast local memory → L2 → HBM
Optimize for keeping data close to compute. Everything else follows. 🔥
The golden rule of GPU performance:
Keep data as HIGH in the memory hierarchy as possible.
Constantly reading from and writing to HBM = stalled cores, wasted compute.
Data in registers or shared memory = blazing throughput. This is the whole game.
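To put a rough number on that rule, here is a back-of-the-envelope counting model for an n x n matrix multiply. It assumes an idealized GPU with no caching in the naive case, and a tile x tile block of each input staged in shared memory in the tiled case. The function names and the sizes (n = 4096, tile = 32) are illustrative assumptions, not measurements.

```python
def hbm_loads_naive(n):
    # Each of the n*n outputs reads a full row of A and a full column of B
    # straight from HBM: 2*n element loads per output.
    return n * n * 2 * n

def hbm_loads_tiled(n, tile):
    # Each tile x tile block of outputs walks n/tile steps; every step loads
    # one tile of A and one tile of B into shared memory (2 * tile^2 elements),
    # which all threads in the block then reuse.
    assert n % tile == 0, "sketch assumes n is a multiple of the tile size"
    blocks = (n // tile) ** 2
    steps = n // tile
    return blocks * steps * 2 * tile * tile

n, tile = 4096, 32
naive = hbm_loads_naive(n)
tiled = hbm_loads_tiled(n, tile)
print(naive, tiled, naive // tiled)  # ratio equals the tile size: 32
```

Under these assumptions, HBM traffic drops by a factor equal to the tile size, because each element fetched once is reused tile times from shared memory.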
1/ NVIDIA just dropped the Vera Rubin technical report.
I spent the last couple of days going through it.
Here's what it tells every AI/ML developer about where to invest their learning time 🧵
11/ The takeaway: NVIDIA's hardware bets are a signal about what software will matter in the next 2-3 years.
They don't build hardware hoping developers catch up.
They build hardware for where they know developers are going.
Read the report. It's a roadmap.