@Mayhem4Markets @TheAhmadOsman @nvidia @NVIDIAAI If needed, I can put my auto researcher on it, but I'm using up all my compute on my new 30x KV cache, and I'm training Gemma 4 250B on my GPUs.
@Mayhem4Markets @TheAhmadOsman @nvidia @NVIDIAAI I should not get distracted, but should I grow Nemotron 500B to 600B on my two GPUs to get their attention? It would be faster if they gave us kernels.
@willdepue It's not bad, but there are much faster ways to do that.
Mine is a different approach with a much faster result. I'm in my last round of new optimization testing, about to replace the Adam optimizer for a 3x speedup.
https://t.co/3200r2RGHQ
Hi all, today I'm releasing https://t.co/M3dZpqX6Kz
It was made with a method I call MXAR.
This is a 13x speedup in pre- and post-training. No LoRA, just regular full-weight modification.
Also, an update: the current KV cache replacement sits at 72x compression. Soon.