we now have a working Softmax, and it fits perfectly inside our VXM, right after our ALU operations
next step is int32-to-int8 quantization
and as always we would love any feedback or questions
@sakshambatraa and I have now implemented Softmax on our toy LPU!
after overhauling the VXM pipeline, we set our sights on implementing the Softmax module in hardware 🧵
the winner is 4 parallel Softmaxes with a LUT
by combining 4-lane parallel execution with our new LUT-based math blocks, we brought the execution time for a full 4x4 matrix down from 143 clock cycles to just 4
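for anyone curious what a LUT-based Softmax can look like, here is a minimal software sketch. everything in it is an assumption for illustration: the Q0.8 fixed-point format, the 16-entry table, the integer inputs, and the names `EXP_LUT`, `softmax_row_lut` and `softmax_4x4` are hypothetical and not taken from our RTL. the idea is to subtract the row max, look up exp(-d) in a small table instead of computing it, then normalize.

```python
import math

FRAC_BITS = 8   # assumed fixed-point fraction width (Q0.8), not the real design's
LUT_SIZE = 16   # assumed table depth

# EXP_LUT[d] ~= exp(-d) in Q0.8, where d = row_max - x (always >= 0)
EXP_LUT = [round(math.exp(-d) * (1 << FRAC_BITS)) for d in range(LUT_SIZE)]

def softmax_row_lut(row):
    """Softmax of one row of integer scores using only a table lookup for exp."""
    row_max = max(row)
    # clamp the distance from the max so it always lands inside the table
    exps = [EXP_LUT[min(row_max - x, LUT_SIZE - 1)] for x in row]
    total = sum(exps)  # never zero: the max element contributes EXP_LUT[0]
    # integer divide, result is again Q0.8 (256 ~= 1.0)
    return [(e << FRAC_BITS) // total for e in exps]

def softmax_4x4(matrix):
    """One call per row; in hardware each row would get its own lane instead of a loop."""
    return [softmax_row_lut(row) for row in matrix]
```

the rows are independent of each other, which is what makes running one row per lane possible; the loop in `softmax_4x4` is just the software stand-in for that.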
our next major milestone is expanding the pipeline to fully support the attention formula by integrating the final two stages:
Softmax: for normalizing the attention scores
Quantization: to downscale the precision on the fly
we appreciate all comments / feedback :)
another progress update on reinventing Groq's LPU with @sakshambatraa:
we redesigned our vector execution module (VXM) to better support overlapping operations, and added support for running self-attention!
but to add flexibility in the future, we may change the scale from a hardcoded value to a lookup table (LUT). this LUT value will come from our bias mem line and flow like this:
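a rough software sketch of the scale-lookup idea, under loud assumptions: the table contents, the Q0.16 multiplier format, the round-half-up behavior and the names `SCALE_LUT` and `quantize_int32_to_int8` are all hypothetical, and the index here simply stands in for whatever value would arrive on the bias mem line.

```python
INT8_MIN, INT8_MAX = -128, 127
SHIFT = 16  # assumed: scales are stored as Q0.16 fixed-point multipliers

# hypothetical scale table: 1, 1/16, 1/256, 1/4096
SCALE_LUT = [1 << 16, 1 << 12, 1 << 8, 1 << 4]

def quantize_int32_to_int8(x, scale_idx):
    """Scale an int32 accumulator value by a looked-up factor and saturate to int8."""
    mult = SCALE_LUT[scale_idx]
    # multiply, add half an LSB for rounding, then shift back down
    scaled = (x * mult + (1 << (SHIFT - 1))) >> SHIFT
    # saturate instead of wrapping so large values do not flip sign
    return max(INT8_MIN, min(INT8_MAX, scaled))
```

swapping the scale then only means changing the index (or the table contents), not the datapath, which is the flexibility a LUT would buy over a hardcoded value.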