GPSnoopy

@TFautre

Incompetent C++ and C# developer obsessed with performance.

Joined May 2021

6 Following

4 Followers

20 Posts

GPSnoopy @TFautre

about 2 years ago

@Yxuer @Peter_shirley As a baseline, using the NVIDIA CUDA backend for Vulkan Raytracing at 2560x1440 (8 samples, 16 bounces) on a GTX 1080 Ti: 3.4 FPS (0.29s per frame). There is probably some nice low-hanging performance fruit to pick in your fun project.


GPSnoopy @TFautre

about 2 years ago

@DefPriPub @Peter_shirley Also, conceptually there is no reason for `final` to go faster, as all the method calls in RTIOW are virtual and made through a base-class pointer. Only if the pointer's static type is the final class can the compiler infer that the method call is direct and non-virtual.
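
A minimal sketch of the point about `final`, assuming hypothetical `Hittable` and `Sphere` types that only stand in for RTIOW-style classes (they are not taken from the book's code). Whether a given compiler actually devirtualizes the second call depends on the compiler and optimization level.

```cpp
// Hypothetical stand-ins for RTIOW-style classes; the names are illustrative.
struct Hittable {
    virtual ~Hittable() = default;
    virtual int id() const = 0;
};

struct Sphere final : Hittable {
    int id() const override { return 1; }
};

// Call through the base type: the dynamic type is unknown here, so the
// compiler generally has to emit a virtual dispatch, `final` or not.
int call_via_base(const Hittable& h) { return h.id(); }

// Call through the final class: no further override can exist, so the
// compiler is free to turn this into a direct (and inlinable) call.
int call_via_final(const Sphere& s) { return s.id(); }
```

Both functions return the same value; only the kind of call the compiler is allowed to generate differs.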


GPSnoopy @TFautre

about 2 years ago

@DefPriPub If you really mean business, using the CPU, then ISPC beats GCC and Clang by a healthy margin. It has a similar paradigm to GPU programming languages. Example: https://t.co/IDDLdK0rOA `final` is just noise compared to any of these aforementioned approaches.


GPSnoopy @TFautre

about 2 years ago

@DefPriPub I've found the following to yield the best results on an i9 9900K for RT: g++ -O3 -ffast-math -march=skylake. -ffast-math relaxes strict IEEE compliance, so the compiler can use faster SSE/AVX instruction sequences instead of slow IEEE-compliant routines. -march=skylake enables FMA instructions. Overall it goes more than 2x faster.
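
To illustrate why those flags can matter, here is a small sketch with a hypothetical `dot` function (not from any of the linked projects). A float reduction like this has a fixed addition order under strict IEEE semantics, which typically stops the compiler from vectorizing it; `-ffast-math` permits reordering, and `-march=skylake` makes AVX2 and FMA available for the multiply-add. The actual code generated depends on the compiler version.

```cpp
#include <cstddef>

// Hypothetical example kernel: a dot product, i.e. a multiply-add reduction.
// Try: g++ -O3 -ffast-math -march=skylake -S and inspect the inner loop.
float dot(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        // Candidate for a fused multiply-add when the target supports FMA.
        sum += a[i] * b[i];
    }
    return sum;
}
```

Because reordering changes rounding, results with `-ffast-math` may differ slightly from a strict build for inputs that are not exactly representable.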


GPSnoopy @TFautre

about 3 years ago

@KerbalSpaceP Congratulations on implementing lens flares as they ought to have been done since 2007. https://t.co/Y347BWgGFC


GPSnoopy @TFautre

over 3 years ago

@ID_AA_Carmack The real question is how come memory bandwidth utilization is not reported as a base metric by all the common OSes, like they do for CPU & Disk IO. Given that all algorithms are either CPU, memory or IO bound, it seems like a rather unfortunate blind spot to have.


GPSnoopy @TFautre

over 3 years ago

@d0cTB @PayPalFrance Not that I'm a legal expert, but GDPR should normally cover this case and gives you considerable legal weight. Wikipedia: "right to contest any automated decision-making that was made on a solely algorithmic basis, and their right to file complaints with a Data Protection Authority"


TFautre retweeted

over 3 years ago

flaviocopes's tweet photo. https://t.co/7XcGs3GUnj


GPSnoopy @TFautre

about 4 years ago

@d0cTB Single channel? Bandwidth estimates look low in general. I assume Memtest86+ is using the BCOPY convention (https://t.co/RHvp44x0e1)? Might be good to clarify in the UI. I personally prefer the Hardware convention, as it's closer to the memory's official specifications.
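
As a rough sketch of why the convention matters, the helper below reports the bandwidth of a plain buffer copy under the three counting conventions as I understand them: bytes moved (BCOPY), bytes read plus bytes written (STREAM), and physical traffic including the write-allocate read of the destination (Hardware). The names `Convention`, `counted_bytes` and `bandwidth_gb_per_s` are hypothetical, and the 3x factor assumes a write-allocate cache with no streaming stores.

```cpp
#include <cstddef>

enum class Convention { Bcopy, Stream, Hardware };

// Bytes of traffic attributed to copying `bytes` bytes between two buffers.
double counted_bytes(std::size_t bytes, Convention c) {
    switch (c) {
        case Convention::Bcopy:    return 1.0 * bytes;  // bytes moved
        case Convention::Stream:   return 2.0 * bytes;  // read + write
        case Convention::Hardware: return 3.0 * bytes;  // read + write-allocate + write
    }
    return 0.0;
}

// Reported bandwidth in GB/s for a copy that took `seconds`.
double bandwidth_gb_per_s(std::size_t bytes, double seconds, Convention c) {
    return counted_bytes(bytes, c) / seconds / 1e9;
}
```

The same measured copy can therefore be reported as three different figures, which is why a tool's UI should say which one it uses.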


GPSnoopy @TFautre

over 4 years ago

@EyezCG I don't think it should take 6h to render at that resolution, even single threaded. Suggest you check this C++ version as a performance baseline: https://t.co/RlgCqvOfnV Then the ISPC version is the best I managed on the CPU, and serves as a good basis for a pure CUDA version


GPSnoopy @TFautre

over 4 years ago

@TheCherno I couldn't resist comparing (5950X / 3090FE):
- Code from the video: 22.0 seconds
- C++: 9.9 seconds [1]
- ISPC: 2.8 seconds [1]
- Vulkan: 0.005 seconds (200 fps) [2]
[1] https://t.co/RlgCqvOfnV
[2] https://t.co/NCp6WHzqD9


GPSnoopy @TFautre

almost 5 years ago

@wkjarosz This is likely not what you want to hear but: don't use static initialisation. Use explicit object creation and registration within your main() method. 30 years of Computer Science will thank you (so will your concurrency code, unit tests, and debugging tools).
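
A minimal sketch of what explicit creation and registration could look like, assuming a hypothetical `Registry` of named factories (the class, the factory names and the `int` product type are all illustrative, not from the original thread). Nothing runs before `main()`, so initialization order is visible in the code and a test can build its own registry.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry of named factories; no global or static instances.
class Registry {
public:
    void add(std::string name, std::function<int()> factory) {
        factories_[std::move(name)] = std::move(factory);
    }
    int create(const std::string& name) const { return factories_.at(name)(); }
    std::size_t size() const { return factories_.size(); }

private:
    std::map<std::string, std::function<int()>> factories_;
};

// Intended to be called once from main(); the registration order is explicit.
Registry make_registry() {
    Registry r;
    r.add("diffuse", [] { return 1; });
    r.add("metal", [] { return 2; });
    return r;
}
```

In `main()`, one would call `make_registry()` and pass the result to whatever needs it, instead of relying on static objects that register themselves at load time.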


GPSnoopy @TFautre

almost 5 years ago

@skaven_ @Peter_shirley Do you have a reproducible example that you can share? It's hard to believe you otherwise.


GPSnoopy @TFautre

almost 5 years ago

@IanCutress Care to at least give the URL? I've checked a few articles but couldn't easily find the explanation. Worth asking the license owner, no? "Most benchmarks are not open source": this guy would seriously disagree with you -> https://t.co/DoAUQkIynN


GPSnoopy @TFautre

almost 5 years ago

@IanCutress What's stopping you from open-sourcing it? Seriously? It's your student project. Less credibility than a big name benchmark. Getting >5x speed up when using AVX512 compared to AVX2 is not expected just from doubling the register size. Something else is at play here


GPSnoopy @TFautre

almost 5 years ago

@IanCutress Let's be honest: no source, student project, patched by Intel, and very surprising results. It requires a bit more transparency. You have gotten us used to better journalistic standards than that.


GPSnoopy @TFautre

almost 5 years ago

@IanCutress "we also have a fully optimized AVX2/AVX512 version, which uses intrinsics to get the best performance out of the software. This was done by a former Intel AVX-512 engineer who now works elsewhere" https://t.co/SXKknMCEI2 Implies strong possibility of bias


TFautre retweeted

SwiftOnSecurity

@SwiftOnSecurity

over 5 years ago

History in pics: Testing prototype Roomba's in 1982. It would take two decades until they could be made small enough to clean under a couch.

SwiftOnSecurity's tweet photo. https://t.co/SsKSEF5tps


GPSnoopy @TFautre

about 5 years ago

@damageboy @ridiculous_fish On a Ryzen 5950X (PBO off, no overclocking), AVX2:
u32: 7 1.231 0.547 0.551 0.108 0.108 2.013 2
u64: 7 1.845 0.440 0.461 0.274 0.280 2.449 2
Tested using Ubuntu 20.04 on Windows 10 WSL.

