@Yxuer @Peter_shirley As a baseline, using the NVIDIA CUDA backend for Vulkan ray tracing at 2560x1440 (8 samples, 16 bounces) on a GTX 1080 Ti:
3.4 FPS (0.29s per frame)
There is probably some nice low-hanging performance fruit to tackle in your fun project.
@DefPriPub @Peter_shirley Also, conceptually there is no reason for `final` to make things faster, as all the method calls in RTIOW (Ray Tracing in One Weekend) are virtual calls made through a base-class pointer.
Only if the pointer's static type is the final class will the compiler know that the method call is direct and non-virtual.
@DefPriPub If you really mean business on the CPU, ISPC beats GCC and Clang by a healthy margin. Its programming model is similar to that of GPU programming languages.
Example: https://t.co/IDDLdK0rOA
`final` is just noise compared to any of these aforementioned approaches.
@DefPriPub I've found the following flags to yield the best results for RT on an i9-9900K:
`g++ -O3 -ffast-math -march=skylake`
`-ffast-math` lets the compiler use faster SSE/AVX instruction sequences instead of slow, strictly IEEE-compliant code.
`-march=skylake` enables AVX2 and FMA instructions.
Overall, it runs more than 2x faster.
@ID_AA_Carmack The real question is why memory bandwidth utilization is not reported as a base metric by all the common OSes, the way CPU and disk I/O are. Given that all algorithms are either CPU-, memory- or I/O-bound, it seems like a rather unfortunate blind spot to have.
@d0cTB @PayPalFrance I'm no legal expert, but GDPR should normally cover this case and give you considerable legal weight.
Wikipedia: "right to contest any automated decision-making that was made on a solely algorithmic basis, and their right to file complaints with a Data Protection Authority"
@d0cTB Single channel?
Bandwidth estimates look low in general. I assume Memtest86+ is using the BCOPY convention (https://t.co/RHvp44x0e1)?
It might be good to clarify this in the UI.
Personally, I prefer the Hardware convention, as it's closer to the official memory specifications.
@EyezCG I don't think it should take 6 h to render at that resolution, even single-threaded.
I suggest you check this C++ version as a performance baseline: https://t.co/RlgCqvOfnV
The ISPC version is the best I managed on the CPU, and it serves as a good basis for a pure CUDA version.
@wkjarosz This is likely not what you want to hear, but: don't use static initialisation. Use explicit object creation and registration within your main() function. 30 years of Computer Science will thank you (so will your concurrency code, unit tests, and debugging tools).
@IanCutress Care to at least give the URL? I've checked a few articles but couldn't easily find the explanation.
Worth asking the license owner, no?
"Most benchmarks are not open source": this guy would seriously disagree with you -> https://t.co/DoAUQkIynN
@IanCutress What's stopping you from open-sourcing it? Seriously?
It's your own student project, which carries less credibility than a big-name benchmark. Getting a >5x speedup with AVX-512 over AVX2 is not expected from merely doubling the register width. Something else is at play here.
@IanCutress Let's be honest: no source, a student project, patched by Intel, and very surprising results. It requires a bit more transparency.
You have gotten us used to better journalistic standards than that.
@IanCutress "we also have a fully optimized AVX2/AVX512 version, which uses intrinsics to get the best performance out of the software. This was done by a former Intel AVX-512 engineer who now works elsewhere"
https://t.co/SXKknMCEI2
This implies a strong possibility of bias.