Writing the most absurdly, meticulously hand-optimized sprite and billboard rendering routine of my life for our #GTA 3 port to the Sega #Dreamcast. This baby gets used by all of GTA3's UIs, fog effects, screen-space effects, and ESPECIALLY the fake bloom-lighting effects you see blurring light sources in the scene...
First of all, I bumped the project up from C++17 to GNU++23, because I ain't fucking around here, and I need every dirty tool at my disposal from C and C++. Good, evil, or questionable language features? Don't care. Enabled!
Then what you're looking at here is the core routine for blasting sprites through the Dreamcast's fast path. The pvrVertexSubmit() lambda is responsible for translating one of the GTA3 engine's vertex types over to the Dreamcast's vertex type, and it does so by mapping a DC vertex directly onto a "store queue" on our SH4 CPU, where we fill the thing in place.
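A rough idea of what that translation boils down to, as a sketch only: both structs are hypothetical. GtaVertex is loosely modelled on RenderWare's RwIm2DVertex rather than copied from GTA3, and PvrVertex mirrors the 32-byte field layout of KallistiOS's pvr_vertex_t, with the PVR_CMD_VERTEX / PVR_CMD_VERTEX_EOL values as I understand them from the KOS headers. On hardware, dst would be the store-queue pointer; here it's ordinary memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical engine vertex, loosely modelled on RenderWare's RwIm2DVertex
// (NOT GTA3's actual definition).
struct GtaVertex {
    float x, y;           // screen-space position in pixels
    float z;              // screen-space depth (unused by this sketch)
    float rhw;            // reciprocal of homogeneous w
    std::uint32_t color;  // packed 0xAARRGGBB
    float u, v;           // texture coordinates
};

// Same field layout as KallistiOS's pvr_vertex_t: exactly one 32-byte SQ line.
struct PvrVertex {
    std::uint32_t flags;  // TA command word
    float x, y, z;        // z carries 1/w for the PVR
    float u, v;           // texture coordinates
    std::uint32_t argb;   // base colour
    std::uint32_t oargb;  // offset (specular) colour
};
static_assert(sizeof(PvrVertex) == 32, "one vertex must fill one store-queue line");

// Assumed command words (PVR_CMD_VERTEX / PVR_CMD_VERTEX_EOL in KOS).
constexpr std::uint32_t kCmdVertex    = 0xE0000000u;
constexpr std::uint32_t kCmdVertexEol = 0xF0000000u;

// Fills dst in place. On hardware dst would point into a store queue;
// `last` marks the final vertex of a strip.
inline void translateVertex(PvrVertex* dst, const GtaVertex& src, bool last) {
    assert(dst != nullptr);
    dst->flags = last ? kCmdVertexEol : kCmdVertex;
    dst->x = src.x;
    dst->y = src.y;
    dst->z = src.rhw;      // assumption: the PVR wants 1/w in the z slot
    dst->u = src.u;
    dst->v = src.v;
    dst->argb = src.color; // assumption: engine colour is already ARGB
    dst->oargb = 0;        // sprites use no offset colour
}
```

Because every field is written exactly once and the struct is exactly one cache line, the whole translation can target the store queue with no intermediate copy.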
A store queue is a pretty damn interesting feature of the SuperH (Hitachi) architecture: it lets us move data around extremely quickly by writing a cache-line-sized chunk of data (32 bytes) into it, then issuing a flush on it (a PREF instruction on the store queue's address), which pushes its contents out to some other region of memory without going through the cache... It's the fastest mechanism we have for memory transfers, faster than DMA, and we have two of them.
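Here's a sketch of the address arithmetic behind that, assuming the SQ setup KallistiOS uses: writes go to the 0xE0000000 store-queue area, bit 5 of that address picks SQ0 or SQ1, and the top bits (28:26) of the real destination come from the QACR0/QACR1 registers. The flush itself is a PREF on the SQ address, which portable C++ can't express, so this only computes the values. The function names are hypothetical, and 0x10000000 is the TA input address as I understand KOS targets it.

```cpp
#include <cassert>
#include <cstdint>

// SH4 store-queue area: writes to 0xE0000000-0xE3FFFFFF land in SQ0/SQ1.
constexpr std::uint32_t kSqBase = 0xE0000000u;

// Address to write into the SQ area so that a later flush lands at `dest`.
// Bits 25:5 of the destination are carried by the SQ address itself...
constexpr std::uint32_t sqWriteAddress(std::uint32_t dest) {
    return kSqBase | (dest & 0x03FFFFE0u);
}

// ...and bits 28:26 come from QACR0/QACR1, stored in the register's bits 4:2.
constexpr std::uint32_t qacrValue(std::uint32_t dest) {
    return ((dest >> 26) << 2) & 0x1Cu;
}

// Bit 5 of the SQ address selects which of the two queues is being written.
constexpr int sqIndex(std::uint32_t sqAddr) {
    return static_cast<int>((sqAddr >> 5) & 1u);
}
```

Since the 32-byte halves of each 64-byte SQ window map to different queues, walking a destination pointer forward one line at a time naturally alternates between SQ0 and SQ1.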
What happens here is that we've preconfigured the store queues' destination addresses to point to the PowerVR GPU's TA (Tile Accelerator), which is the unit that processes incoming vertices, storing them within vertex buffers in VRAM until it is time to draw the scene. pvr_dr_target() simply returns a pointer to ONE of the two store queues, we translate gtaVert to pvrVert right inside that SQ, and finally we call pvr_dr_commit() to blast that fucker directly to the TA...
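To make the target/commit dance concrete without a Dreamcast on hand, here is a host-side model. DirectRenderModel and SqLine are hypothetical names, not the KOS API. The model assumes, as I read KallistiOS's pvr_dr_target() macro, that each target() call flips to the other queue, and it replaces the hardware flush with an append to a vector standing in for the TA.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// One 32-byte store-queue line (hypothetical model type).
struct SqLine {
    std::array<std::uint8_t, 32> bytes{};
};

// Host-side model of the pvr_dr_target()/pvr_dr_commit() pattern.
// On hardware the two slots are SQ0/SQ1 and commit() is a PREF that pushes
// the line to the TA; here the "TA" is just a vector of committed lines.
class DirectRenderModel {
public:
    // Assumption: each call flips to the other queue, so consecutive
    // vertices alternate between SQ0 and SQ1.
    SqLine* target() {
        current_ ^= 1;
        return &sq_[current_];
    }

    // "Flushes" a filled line to the TA stand-in.
    void commit(const SqLine* line) {
        assert(line == &sq_[0] || line == &sq_[1]);
        taFifo_.push_back(*line);
    }

    const std::vector<SqLine>& taFifo() const { return taFifo_; }

private:
    std::array<SqLine, 2> sq_{};
    std::vector<SqLine> taFifo_;
    int current_ = 1;  // first target() call hands out SQ0
};
```

Per vertex the loop is always the same three steps: line = dr.target(), fill line in place, dr.commit(line).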
Something to keep in mind is that although we have two SQs, they cannot both be flushing and accessing the memory bus simultaneously; if we try, the SH4 CPU's pipeline will stall while the first store queue finishes flushing.
What we do instead is alternate: we write to one store queue while the other one flushes. This essentially creates an efficient transfer pipeline where we are never stalled waiting for a store queue to finish! This is honestly the Dreamcast's equivalent of the Genesis's fabled "Blast Processing."
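Why alternating pays off can be shown with a toy timing model. The costs are abstract units made up for illustration, not measured SH4 cycles, and the rules are simplified assumptions: only one flush can use the bus at a time, issuing a flush while the bus is busy stalls the CPU, and a queue can't be refilled until its own flush is done. finishTime() is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy timing model in abstract cost units (not measured SH4 cycles).
// writeCost: CPU time to fill one SQ line; flushCost: bus time to drain one.
// Returns the time at which the last flush completes.
std::uint64_t finishTime(int vertices, std::uint64_t writeCost,
                         std::uint64_t flushCost, int queues) {
    assert(queues == 1 || queues == 2);
    std::uint64_t cpu = 0, busFree = 0;
    std::uint64_t sqFree[2] = {0, 0};
    for (int i = 0; i < vertices; ++i) {
        const int sq = i % queues;                    // alternate when queues == 2
        cpu = std::max(cpu, sqFree[sq]) + writeCost;  // wait for this SQ, then fill it
        cpu = std::max(cpu, busFree);                 // stall until the bus is free
        busFree = cpu + flushCost;                    // this flush occupies the bus
        sqFree[sq] = busFree;                         // SQ reusable once drained
    }
    return busFree;
}
```

With writeCost = 10 and flushCost = 6, four vertices take 64 units through a single queue versus 46 when alternating. Once filling a line takes at least as long as draining one, every flush hides behind the next write and the total drops from n*(w+f) to n*w + f.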
Now, the rest of the sprite routine is pretty standard crap, just looping over vertices, blasting them through the store queues. I have gone ahead and added C++20's [[likely]] and [[unlikely]] optimization hints on branches and loops, and yes, we have confirmed that they do actually impact performance on the SH4 architecture with GCC14!
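For anyone who hasn't used them yet, this is roughly where those hints go, in a hypothetical sprite loop. Sprite and submitSprites() are made up for illustration, and the counter stands in for the actual vertex submission. It needs -std=c++20 or later (the gnu++23 mode works). The attributes only steer code layout and branch-prediction heuristics; they never change what the code computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sprite record (not GTA3's type).
struct Sprite {
    float x, y, w, h;
    std::uint8_t alpha;
};

// Counts the sprites that would be sent to the TA, skipping invisible ones.
std::size_t submitSprites(const Sprite* sprites, std::size_t count) {
    assert(count == 0 || sprites != nullptr);
    if (count == 0) [[unlikely]]
        return 0;                       // empty batches are the rare case
    std::size_t submitted = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (sprites[i].alpha == 0) [[unlikely]]
            continue;                   // fully transparent: nothing to draw
        ++submitted;                    // hot path: real code would emit 4 vertices
    }
    return submitted;
}
```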
The loops have also been unrolled to process up to three vertices at a time, prefetching each one into the cache while we're writing the previous one to the store queue... allowing us to easily push a shitton of these sprites.
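A sketch of that unroll-by-three pattern with software prefetching, using GCC/Clang's __builtin_prefetch. Vert, Out, emit() and submitUnrolled() are hypothetical; emit() stands in for the store-queue write, each vertex is prefetched while the previous one is being written, and the tail loop handles the 0 to 2 vertices left over when the count isn't a multiple of three.

```cpp
#include <cassert>
#include <cstddef>

struct Vert { float x, y, u, v; };  // hypothetical engine vertex
struct Out  { float x, y, u, v; };  // hypothetical stand-in for the SQ-side vertex

// Stand-in for the per-vertex store-queue write.
inline void emit(Out& dst, const Vert& src) {
    dst = Out{src.x, src.y, src.u, src.v};
}

// Three vertices per iteration; each prefetch targets the vertex that will be
// written right after the current one, so it is cached by the time we get there.
void submitUnrolled(const Vert* in, Out* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 3 <= n; i += 3) {
        __builtin_prefetch(&in[i + 1]);
        emit(out[i], in[i]);
        __builtin_prefetch(&in[i + 2]);
        emit(out[i + 1], in[i + 1]);
        if (i + 3 < n) __builtin_prefetch(&in[i + 3]);  // stay in bounds
        emit(out[i + 2], in[i + 2]);
    }
    for (; i < n; ++i)  // leftover 0-2 vertices
        emit(out[i], in[i]);
}
```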
#cplusplus #cpp #gamedev #retrocomputing
💣 BOOM 💣
8BitMods proudly presents our latest product that has been in development for over 2 years!
The MemCard PRO2 for the PS2 and PS1 is here!
Pre-orders are open NOW! Ships on or before December 2023!
🛒 | https://t.co/dO6KY6odMV
The quest, pun intended, to be more expressive in VRChat continues. Face tracking was great, but adding eye tracking has been a hit. It feels like some sort of social tipping point has been reached.
Behold, an awkwardly-recorded demo:
Pre-orders for the @analogue Duo open tomorrow; however, it wasn't very clear to me that OpenFPGA support IS NOT INCLUDED, so I made sure to highlight this in the post: https://t.co/abnK3iMf45
Kiiinda tempted, but with the consoles I already have, now paired with a RetroTINK 5X, or even MiSTer FPGA... it's not a 'need' like it used to be back when it was first announced.