Doom ran on a 486 with 4MB of RAM in 1993
the entire world was rendered using a binary space partition tree
John Carmack's engine did not build the BSP at runtime, it was precomputed when the map was compiled and shipped with the level data
the tree split the map into regions so the draw order for any viewpoint could be read straight off it
so during rendering the engine just walked the tree instead of figuring visibility from scratch
a BSP node divides space into front and back
if the player is in front render the front subtree first
if behind render the back subtree first
the tree already knows what should appear first
this is why Doom did not need a z-buffer
correct visibility came from the BSP traversal itself
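A minimal sketch of the traversal described above, in Python rather than the engine's C. The names `Leaf`, `Node`, `on_front_side` and `draw_order` are hypothetical, and which side counts as "front" is an assumed sign convention. This is an illustration of the idea, not Doom's actual code; the real engine does more work per node, such as tracking which screen columns are already covered.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    # A convex region of the map; the name stands in for its walls.
    name: str


@dataclass
class Node:
    # Splitter line through (x, y) with direction (dx, dy).
    x: float
    y: float
    dx: float
    dy: float
    front: "Union[Node, Leaf]"
    back: "Union[Node, Leaf]"


def on_front_side(node: Node, px: float, py: float) -> bool:
    # Sign of the 2D cross product decides the side.
    # Assumed convention: the right-hand side of the direction is "front".
    return (px - node.x) * node.dy - (py - node.y) * node.dx >= 0


def draw_order(tree: Union[Node, Leaf], px: float, py: float) -> List[str]:
    """Return leaf names nearest-first for a viewer at (px, py)."""
    if isinstance(tree, Leaf):
        return [tree.name]
    if on_front_side(tree, px, py):
        near, far = tree.front, tree.back
    else:
        near, far = tree.back, tree.front
    # Visit the viewer's side first, then the other side.
    return draw_order(near, px, py) + draw_order(far, px, py)


# Toy map: the root splits at x = 0, the east half is split again at y = 0.
east = Node(0, 0, 1, 0, front=Leaf("SE"), back=Leaf("NE"))
root = Node(0, 0, 0, 1, front=east, back=Leaf("W"))

print(draw_order(root, 5, 5))   # viewer in the north-east region
print(draw_order(root, -3, 1))  # viewer in the west region
```

The tree itself never changes between frames. Only the viewer position changes, and that alone picks which child is visited first at each node.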
SAVE 45%, only on April 28th!
Today, you can SAVE on Rearchitecting LLMs by @PereMartra and other related titles: https://t.co/cVTSmP0By8
As you work through this practical book, you’ll perform hands-on surgery on popular open-source models like Llama-3, Gemma, and Qwen to create cost-effective local small language models (SLMs). Along the way, you’ll learn how to combine behavioral analysis with structural modifications, identifying and removing parts that don’t contribute to your model’s goals, and even use “fair pruning” to reduce model bias at the neuron level.
Save 45%, only on April 19th!
Today, you can save on CUDA for Deep Learning by @elliotarledge and other related titles: https://t.co/GWvojERUZo
Written for the latest NVIDIA hardware, the book builds a deep understanding of CUDA (Compute Unified Device Architecture) fundamentals that will stay relevant as chips upgrade and evolve. CUDA delivers direct control, debugging power, and acceleration at the GPU level that can’t be matched by other types of optimizations.
Notes from last week on @nvidia's Groq 3 LPX system architecture.
- Attention-FFN disaggregation
- Tensor, Pipeline and Data parallelism across the LPX and GPUs
- Kernel design patterns (2D grid-based persistent kernels applied on a SWA kernel)
- Practiced hand-written cuTile CUDA kernels
- Made a PR for TileGym collection
This is the best way to learn how LLMs work.
Interactive. 3D. Step-by-step.
Covers:
→ Embedding
→ Layer Norm
→ Self-Attention
→ MLP
→ Transformer layers
→ Softmax
→ Output
Stop reading papers. Start seeing.
Link in comments.
Save this immediately.
Python still underpins how we understand and build modern AI systems.
@rasbt's April 14th keynote at @PyConDE will cover the full journey — from model design and training loops to scaling across devices, and why Python still holds up even as backends evolve.
Details: https://t.co/Itysj2Lelt
Explore his work: https://t.co/GuyVqD9LMf
Virtual tickets available via the main conference page.
Paged Attention borrowed an old idea from operating systems and solved a new problem in LLMs.
Operating systems solved memory fragmentation decades ago with paging.
LLMs had the exact same problem with KV Cache. Same solution.
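A rough sketch of the paging idea applied to a KV cache, in plain Python. The class `PagedKVCache` and its methods are hypothetical names for illustration, not any library's API. It only tracks where each token's slot lives; a real system would store key and value tensors in those slots.

```python
class PagedKVCache:
    """Toy block table: maps each sequence's token positions to fixed-size blocks."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one more token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:
            # No block yet, or the last block is full: grab any free block.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def lookup(self, seq_id, pos: int):
        """Translate a logical token position into (physical_block, offset)."""
        table = self.block_tables[seq_id]
        return table[pos // self.block_size], pos % self.block_size

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the free list."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")   # sequence "a" grows one token at a time
cache.append_token("b")       # "b" takes whichever block is free
print(cache.block_tables)     # blocks per sequence need not be contiguous
cache.free("a")               # blocks from "a" become reusable at once
```

Because a sequence only ever holds whole blocks and gets them on demand, the only wasted space is the unused tail of its last block, which is the same trade-off an operating system makes with memory pages.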