Want to improve GPU compute/comms overlap? We just published a new short tutorial for you!
A few small changes to the Pallas:MGPU matmul kernel are all it takes to turn it into an all-gather collective matmul that overlaps NVLink comms with local compute: https://t.co/HY4C3MwMb7
Curious how to write Blackwell matmul kernels with SOTA performance using MGPU? We just published a short step-by-step tutorial: https://t.co/XRVX34juEz
At each step, we show exactly which (small) changes refine the kernel, and the final kernel is just under 150 lines.
I've finally landed my first proper JAX feature since joining the team: a supported "foreign function interface", which makes it easier to call into external libraries from within JAX code. Check it out: https://t.co/8ilDuW1IYI
Many of you are excited about H100 attention, so it’s a good time to show you Mosaic GPU: a Python DSL for H100s.
The attention example matches FlashAttention-3 (FA3) performance while being only ~200 lines of Python: https://t.co/12ecz3LftV
It's easy to install too! The latest JAX packages include it.
Our team is looking for a strong research engineer; a hardware background is *not* required. Please share and recommend someone!
https://t.co/AGrF7rwRzZ
@mitsuhiko I think a lot of the ergonomics were sacrificed to the idea of annotations being used for things other than types. So, most typing features are designed to abuse existing syntax to the benefit of nobody.
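A small illustrative sketch of the point above (the function names `scale` and `total` are hypothetical): Python annotations accept arbitrary runtime expressions and are simply stored in `__annotations__`, so typing constructs such as `list[int]` reuse ordinary subscription syntax rather than dedicated type syntax.

```python
import typing

# Hypothetical example: annotations may be any expression, not just types
# (PEP 3107). Here they are a string, an arithmetic result, and a dict.
def scale(x: "meters", factor: 2 * 3) -> {"unit": "m"}:
    return x * factor

# Hypothetical example: list[int] is a plain subscription that evaluates to
# a runtime alias object (Python 3.9+), not a special type-only syntax.
def total(xs: list[int]) -> int:
    return sum(xs)

print(scale.__annotations__)
print(total.__annotations__["xs"])
print(typing.get_origin(list[int]), typing.get_args(list[int]))
```

Nothing in the interpreter checks these annotations; only an external type checker gives them meaning as types.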
Are you a PhD student interested in the interface between generative AI, LLMs and audio? Our team at Google behind AudioLM, MusicLM and AudioPaLM is looking for a talented student researcher!
See details and apply at https://t.co/qFY95633P6, and send your CV to [email protected].
@yminsky The Python type system has no spec, so I can guarantee there will be plenty of things they disagree about.
Pyright is also usually faster to adopt new type system features, so it's possible to get type errors simply because mypy doesn't fully support some feature (yet).