Giving a talk in the @Stanford SCIEN seminar this Wednesday (1/3) at 4:30pm:
https://t.co/EgDLBwv2YU
The topic is “normal coordinates”: a shape representation little-used outside of mathematics—but which turns out to have nice applications in geometry processing & learning.
A post on async mechanisms in modern GPU kernels: three scheduling regimes, a stack of abstractions used to express them, and some questions for what a kernel DSL should look like now. https://t.co/PzEHsB6GxW
@DemetriSpanos Not exactly what you were asking for, but I feel like this paper is very much in the sweet spot of "everyone needs to do the operation described in the paper, and it requires understanding a lot of fundamental calculus": https://t.co/HVKpxFjuIr
TTP: A Hardware-Efficient Design for Precise Prefetching in Ray Tracing
Yavuz Selim Tozlu, Anshul Naithani, Huiyang Zhou
https://t.co/v3vYfAVNNk
Abstract:
Ray tracing (RT) is a 3D graphics technique that offers highly realistic visuals. It is becoming prominent and accessible as GPU vendors have integrated dedicated ray tracing acceleration hardware. However, tracing millions of rays through 3D scenes consisting of high numbers of triangles in real time is challenging and requires expensive hardware. The main bottleneck in RT workloads is the expensive Bounding Volume Hierarchy (BVH) traversal task, which is a large tree structure that encodes the 3D scene. BVH traversal is a memory-bound problem, as the GPU threads spend most of their time reading tree node data from memory. In this work, we attack the memory latency bottleneck of ray tracing through prefetching. We propose a novel hardware prefetcher, named Tree Traversal Prefetcher (TTP), for ray tracing. The main idea is to leverage the existing tree traversal stack in the RT units for highly accurate prefetching. In particular, TTP prefetches nodes using the addresses already available on the hardware traversal stacks of each thread. For DFS (Depth-first search) based traversal, prefetches are generated when nodes are being popped consecutively from the traversal stack, potentially corresponding to upward traversal through the tree. We evaluate TTP on a cycle-level simulator, Vulkan-sim 2.0, and show that it achieves 1.48x speedup on average (up to 1.89x) compared to the baseline, with nearly negligible hardware overhead. TTP achieves 98.92% average L1 accuracy, which is the ratio of the prefetched blocks being actually referenced by demand loads. The coverage, computed as the ratio of L1 miss reduction over baseline L1 misses, is 31.54%, correlating well with the achieved speedup.
New blog post!
In "Quantizing tangent frames", we look at various established methods to represent tangent frames in the vertex data, squeeze a few variants into 32 bits per vertex and look at the resulting precision.
https://t.co/ZsDeImK0Ls
Retweet, like and subscribe!
Working my way through "ray tracing in a weekend". @Peter_shirley thank you for making it available to the public. Personally it is taking me more than a weekend (skill-issue) 😂
Matrix-Free Multigrid with Algebraically Consistent Coarsening on Adaptive Octrees
Mengdi Wang, Yuchen Sun, Bo Zhu
https://t.co/B3Colt7que
Abstract:
We present a matrix-free GPU multigrid preconditioner with algebraically consistent coarsening for solving Poisson equations on adaptive octree grids with irregular domains. Within uniform-resolution regions, the coarsening satisfies the Galerkin principle. At T-junctions between refinement levels, we propose a flux-consistent coarse-grid correction that restores cross-level consistency while preserving the compact matrix-free representation. The coarse operators are stored in a compact matrix-free form suitable for parallel execution on GPUs. Numerical experiments demonstrate second-order accuracy, grid-independent convergence when used with PCG, and robust performance on cut-cell problems arising in fluid simulation. On a single NVIDIA RTX 4090 GPU, the solver achieves full-solve throughputs above 200 million cells per second on analytical Poisson tests and above 70 million cells per second on pressure projection problems in fluid simulation.
Finally! All of that for just showing a dynamic illustration in app!
Bunch of algorithms was implemented for this dynamic illustration.
WIP version available (https://t.co/4ZMIat8pqU no LaTeX yet!)
Depends on ImPlatform for custom shaders https://t.co/mKI1Rn2HUF
1/4
Introducing Texel Splatting: Perspective-Stable 3D Pixel Art
open source paper+code
Most 3D pixel art techniques (e.g. t3ssel8r, ProPixelizer) snap pixels to a screen grid, which only works with an orthographic camera
Texel splatting solves this for perspective cameras: first,
Bumping this one—We are still actively seeking a talented graphics software engineer! If Vulkan, shaders, GPU drivers, and open standards light you up, this could be your next move. Competitive comp, remote flexibility, and a passionate team.
Details & apply: https://t.co/YWmBgAligK
My "How to Vulkan in 2026" @VulkanAPI#Vulkan guide is now publicly available at https://t.co/Xy9rSIRBT8
I still consider it a preview, though I'm mostly happy with it and only plan on changing minor things and incorporating some feedback.