@FelixCLC_ There are some other things to consider, for example FP is usually 3R1W and integer 2R1W. But afaik 3R1W is usually implemented with port stealing or late forwarding for FMA to reduce the actual port count.
I feel like overlapping FP and integer physical register file and ports may be better than overlapping FP with SIMD.
Because scalar FP needs a higher issue rate and lower latency than SIMD FP.
@FelixCLC_ Overlapping FP and INT also gives you more control over FP32 and FP64. Maybe you put FP32 on all ALUs, but FP64 only on the ones that also support IMUL (if you can share that logic).
@FelixCLC_
I tested the impact of disabling RVC on the SpacemiT X100 4-wide OoO core with clang.
Enabling RVC resulted in a roughly 10% performance improvement.
Not sure what this actually tells us about RVC exactly, but it's certainly interesting.
@FelixCLC_ Since it's a modified OpenC910, it should have a 64 KiB 4-way associative L1I, with 4 bits of pre-decode for every 16-bit slot. Two of those bits indicate size, though idk why that duplicates the prefix. You would only need 2 bits for every 32-bit slot in an implementation without RVC.
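A quick back-of-the-envelope check of what those pre-decode bits would cost in storage. This is only a sketch that takes the numbers above at face value (64 KiB L1I, 4 bits per 16-bit slot with RVC, 2 bits per 32-bit slot without); the helper name `predecode_bits` is made up for illustration and nothing here comes from the actual RTL.

```python
def predecode_bits(cache_bytes, slot_bytes, bits_per_slot):
    """Total pre-decode bits for a cache divided into fixed-size slots."""
    return (cache_bytes // slot_bytes) * bits_per_slot

L1I_BYTES = 64 * 1024  # assumed L1I size from the post above

with_rvc = predecode_bits(L1I_BYTES, 2, 4)     # 4 bits per 16-bit slot
without_rvc = predecode_bits(L1I_BYTES, 4, 2)  # 2 bits per 32-bit slot

for label, bits in (("with RVC", with_rvc), ("without RVC", without_rvc)):
    kib = bits / 8 / 1024
    overhead = bits / (L1I_BYTES * 8)
    print(f"{label}: {kib:.0f} KiB of pre-decode ({overhead:.2%} of the data array)")
```

Under these assumptions the RVC-capable layout carries four times as much pre-decode state as the 32-bit-only one.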
@FelixCLC_ The instruction pack stage resolves the ICache ways and does further pre-decoding before 16-bit parcels + metadata get written to the instruction queue, but it doesn't do the alignment.
https://t.co/blLIGFXz2G
@FelixCLC_ The C910 C&C article suggests that decode cracking and alignment happen in one decode pipeline stage. The X100 doesn't need to crack anything at high performance (LMUL is handled later and it doesn't support xthead), so I assume they rewrote that part with some fusion logic instead.
The L1I bandwidth of the X100 is 16 bytes/cycle, so it should be able to feed the 4-wide core without compressed instructions.
While the X100 supports a handful of fusion pairs, those aren't limited to compressed instructions (bitwise+bitwise, mul+add, add+load/store, slli+sr*i).
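To put a number on the 16 bytes/cycle point: a rough sketch of peak fetch throughput as a function of how much of the instruction stream is compressed. It assumes a steady stream with no taken branches, misses, or alignment losses, so it is an upper bound rather than a model of the real front end; `fetch_rate` is a hypothetical helper.

```python
def fetch_rate(bytes_per_cycle, compressed_fraction):
    """Peak instructions fetched per cycle for a given share of 16-bit instructions."""
    avg_size = 4 - 2 * compressed_fraction  # mix of 4-byte and 2-byte encodings
    return bytes_per_cycle / avg_size

for frac in (0.0, 0.5, 1.0):
    print(f"{frac:.0%} compressed: {fetch_rate(16, frac):.2f} instr/cycle")
```

Even with no compressed instructions at all, this gives 4 instructions per cycle, which matches the core width, so raw L1I bandwidth alone wouldn't explain the gain from RVC.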
@geofflangdale @InstLatX64 sha3 in particular is probably a bad example, because the algorithm was explicitly designed for 32 GPRs.
But yeah, excited for APX.