Gregory Hodges @ghodges_dev - Twitter Profile

Pinned Tweet

over 5 years ago

My book series, Game Physics In One Weekend, is now released to Kindle. With companion source code available at https://t.co/NWPPeOH4Wj

16

311

90

77

0

ghodges_dev retweeted

Keenan Crane

@keenanisalive

5 days ago

Giving a talk in the @Stanford SCIEN seminar this Wednesday (1/3) at 4:30pm: https://t.co/EgDLBwv2YU The topic is “normal coordinates”: a shape representation little-used outside of mathematics—but which turns out to have nice applications in geometry processing & learning.

25

908

109

777

53K

ghodges_dev retweeted

Ian Barber

@ianbarber

12 days ago

A post on async mechanisms in modern GPU kernels: three scheduling regimes, a stack of abstractions used to express them, and some questions for what a kernel DSL should look like now. https://t.co/PzEHsB6GxW

2

36

4

23

3K

ghodges_dev retweeted

Casey Muratori @cmuratori

13 days ago

@DemetriSpanos Not exactly what you were asking for, but I feel like this paper is very much in the sweet spot of "everyone needs to do the operation described in the paper, and it requires understanding a lot of fundamental calculus": https://t.co/HVKpxFjuIr

2

95

2

127

8K

ghodges_dev retweeted

Vlad Erium 🇯🇵

@ssh4net

20 days ago

TTP: A Hardware-Efficient Design for Precise Prefetching in Ray Tracing
Yavuz Selim Tozlu, Anshul Naithani, Huiyang Zhou
https://t.co/v3vYfAVNNk
Abstract: Ray tracing (RT) is a 3D graphics technique that offers highly realistic visuals. It is becoming prominent and accessible as GPU vendors have integrated dedicated ray tracing acceleration hardware. However, tracing millions of rays through 3D scenes consisting of high numbers of triangles in real time is challenging and requires expensive hardware. The main bottleneck in RT workloads is the expensive Bounding Volume Hierarchy (BVH) traversal task, which is a large tree structure that encodes the 3D scene. BVH traversal is a memory-bound problem, as the GPU threads spend most of their time reading tree node data from memory. In this work, we attack the memory latency bottleneck of ray tracing through prefetching. We propose a novel hardware prefetcher, named Tree Traversal Prefetcher (TTP), for ray tracing. The main idea is to leverage the existing tree traversal stack in the RT units for highly accurate prefetching. In particular, TTP prefetches nodes using the addresses already available on the hardware traversal stacks of each thread. For DFS (Depth-first search) based traversal, prefetches are generated when nodes are being popped consecutively from the traversal stack, potentially corresponding to upward traversal through the tree. We evaluate TTP on a cycle-level simulator, Vulkan-sim 2.0, and show that it achieves 1.48x speedup on average (up to 1.89x) compared to the baseline, with nearly negligible hardware overhead. TTP achieves 98.92% average L1 accuracy, which is the ratio of the prefetched blocks being actually referenced by demand loads. The coverage, computed as the ratio of L1 miss reduction over baseline L1 misses, is 31.54%, correlating well with the achieved speedup.

0

33

8

24

3K

ghodges_dev retweeted

Arseny Kapoulkine 🇺🇦 @zeuxcg

about 1 month ago

New blog post! In "Quantizing tangent frames", we look at various established methods to represent tangent frames in the vertex data, squeeze a few variants into 32 bits per vertex and look at the resulting precision. https://t.co/ZsDeImK0Ls Retweet, like and subscribe!

2

189

43

97

10K

ghodges_dev retweeted

Sebastian Aaltonen

@SebAaltonen

about 2 months ago

If you are interested in this topic, I wrote a (massive) blog post about it recently: https://t.co/uL8HsfHNYn

5

158

9

49

9K

ghodges_dev retweeted

Vikraman @Vikramantech

about 1 month ago

Working my way through "ray tracing in a weekend". @Peter_shirley thank you for making it available to the public. Personally it is taking me more than a weekend (skill-issue) 😂 https://t.co/VU4ejZ73IO

2

32

2

8

2K

ghodges_dev retweeted

Vlad Erium 🇯🇵

@ssh4net

about 2 months ago

Matrix-Free Multigrid with Algebraically Consistent Coarsening on Adaptive Octrees
Mengdi Wang, Yuchen Sun, Bo Zhu
https://t.co/B3Colt7que
Abstract: We present a matrix-free GPU multigrid preconditioner with algebraically consistent coarsening for solving Poisson equations on adaptive octree grids with irregular domains. Within uniform-resolution regions, the coarsening satisfies the Galerkin principle. At T-junctions between refinement levels, we propose a flux-consistent coarse-grid correction that restores cross-level consistency while preserving the compact matrix-free representation. The coarse operators are stored in a compact matrix-free form suitable for parallel execution on GPUs. Numerical experiments demonstrate second-order accuracy, grid-independent convergence when used with PCG, and robust performance on cut-cell problems arising in fluid simulation. On a single NVIDIA RTX 4090 GPU, the solver achieves full-solve throughputs above 200 million cells per second on analytical Poisson tests and above 70 million cells per second on pressure projection problems in fluid simulation.

0

57

10

50

3K

ghodges_dev retweeted

Kostas Anagnostou @KostasAAA

about 2 months ago

Ultra-fast Screen-Space Refractions and Caustics via Newton’s Method https://t.co/VrvmrGbhUM

1

317

48

235

19K

ghodges_dev retweeted

Bert Temme @berttemme

about 2 months ago

Modern rendering culling techniques https://t.co/2I956CpCKr

0

109

21

90

5K

ghodges_dev retweeted

Kostas Anagnostou @KostasAAA

about 2 months ago

Introduction to Spherical Harmonics for Graphics Programmers https://t.co/7SKiiJ7nwG

1

466

69

418

24K

ghodges_dev retweeted

Soufiane KHIAT @SoufianeKHIAT

2 months ago

Finally! All of that for just showing a dynamic illustration in app! Bunch of algorithms was implemented for this dynamic illustration. WIP version available (https://t.co/4ZMIat8pqU no LaTeX yet!) Depends on ImPlatform for custom shaders https://t.co/mKI1Rn2HUF 1/4

5

621

102

348

22K

ghodges_dev retweeted

Vivek Galatage

@vivekgalatage

2 months ago

Roadmap: Understanding GPU Architecture from Cornell https://t.co/54Lxi3H3Sg

4

1K

203

1K

139K

ghodges_dev retweeted

dylan @dylan_ebert_

3 months ago

Introducing Texel Splatting: Perspective-Stable 3D Pixel Art open source paper+code Most 3D pixel art techniques (e.g. t3ssel8r, ProPixelizer) snap pixels to a screen grid, which only works with an orthographic camera Texel splatting solves this for perspective cameras: first,

21

2K

128

1K

111K

ghodges_dev retweeted

LunarG

@LunarGInc

3 months ago

Bumping this one—We are still actively seeking a talented graphics software engineer! If Vulkan, shaders, GPU drivers, and open standards light you up, this could be your next move. Competitive comp, remote flexibility, and a passionate team. Details & apply: https://t.co/YWmBgAligK

1

24

7

13

3K

ghodges_dev retweeted

RasterGrid @RasterGrid

3 months ago

New Blog Post: Vulkan memory barriers and image layouts explained https://t.co/1AGjGQmhQP

0

55

13

46

4K

ghodges_dev retweeted

Kiriakos Gavras @Kiiiri7

4 months ago

Wrote a blog post about the Single Pass Downsampler I implemented for my depth pyramid. Code included. https://t.co/m39brs0NfR

1

120

23

78

6K

ghodges_dev retweeted

Xor

@XorDev

4 months ago

Here are some techniques I discovered through 14 years of shader programming:

36

2K

233

2K

91K

ghodges_dev retweeted

Sascha Willems @SaschaWillems2

5 months ago

My "How to Vulkan in 2026" @VulkanAPI #Vulkan guide is now publicly available at https://t.co/Xy9rSIRBT8 I still consider it a preview, though I'm mostly happy with it and only plan on changing minor things and incorporating some feedback.

0

366

67

223

22K

ghodges_dev retweeted

Kostas Anagnostou @KostasAAA

5 months ago

Good read: "GPU Cache Hierarchy: Understanding L1, L2, and VRAM" https://t.co/8FCGSRhLl9

6

876

134

791

47K
