@norpadon If Pallas/Mosaic GPU doesn't have what you need, plgpu.inline_mgpu is your escape hatch.
Here is an example of how to use it to insert inline PTX https://t.co/AztFiQOQSV, or arbitrary MLIR code https://t.co/klNSTpggZu.
Please let us know if useful abstractions are missing!
Want to improve GPU compute/comms overlap? We just published a new short tutorial for you!
A few small changes to the Pallas:MGPU matmul kernel are all it takes to turn it into an all-gather collective matmul that overlaps NVLink comms with local compute: https://t.co/HY4C3MwMb7
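For readers new to the idea, here is a rough sketch of what an all-gather collective matmul computes. It is not the tutorial's kernel and uses no Pallas or Mosaic GPU API: it is a single-process NumPy simulation in which the "devices" are list entries, and the function name `ring_all_gather_matmul` and the ring schedule are assumptions made for illustration. Each simulated device multiplies the LHS chunk it currently holds by its own RHS shard, then hands that chunk to its neighbor. On real hardware the transfer would run concurrently with the matmul, which is where the overlap comes from; here the two simply run one after the other.

```python
import numpy as np


def ring_all_gather_matmul(lhs_shards, rhs_shards):
    """Simulate an all-gather collective matmul on a ring of devices.

    Device d starts with lhs_shards[d] (shape (m, k)) and rhs_shards[d]
    (shape (k, n)). It ends with concat(lhs_shards) @ rhs_shards[d],
    without ever materializing the gathered LHS in one step.
    """
    num_devices = len(lhs_shards)
    m = lhs_shards[0].shape[0]
    n = rhs_shards[0].shape[1]
    outs = [np.zeros((num_devices * m, n)) for _ in range(num_devices)]
    held = list(lhs_shards)  # LHS chunk currently resident on each device

    for step in range(num_devices):
        for d in range(num_devices):
            # After `step` shifts, device d holds the chunk that
            # originally belonged to device (d - step).
            src = (d - step) % num_devices
            outs[d][src * m:(src + 1) * m] = held[d] @ rhs_shards[d]
        # Ring shift: device d receives the chunk held by device d - 1.
        # On real hardware this send would overlap with the matmul above.
        held = [held[(d - 1) % num_devices] for d in range(num_devices)]
    return outs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lhs = [rng.standard_normal((2, 3)) for _ in range(4)]
    rhs = [rng.standard_normal((3, 5)) for _ in range(4)]
    outs = ring_all_gather_matmul(lhs, rhs)
    print(outs[0].shape)
```

Every device ends up with the same result it would get from gathering the full LHS first and then multiplying, but the work is split into per-chunk matmuls that can start before all the data has arrived.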
Curious how to write SOTA performance Blackwell matmul kernels using MGPU? We just published a short step-by-step tutorial: https://t.co/XRVX34juEz
At each step, we show exactly what (small) changes are necessary to refine the kernel and the final kernel is just under 150 lines.
@jeremyphoward We have some Pallas tutorials (https://t.co/7LRi5YQLuZ) and in particular a reference guide for Mosaic GPU (https://t.co/jw9lOrjAhA). It is of course far from being comprehensive, but we're working on it. If you have questions you should also just reach out and we'll help :)
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
... And now the thesis is online (https://t.co/lUlxbdCzuE)!
And the defense is scheduled next Wednesday (https://t.co/Q1tFqPx6S0) 😱
It'll in principle be streamed—feel free to DM if you're interested in attending :)
@_xjdr @IlyasHairline @finbarrtimbers FWIW, we have the ability to use Triton to lower a lot of our code in XLA:GPU (and in fact, we do)!
I'm curious exactly which performance cliffs you're facing, and whether we can help with that!
Many of you are excited about H100 attention, so it’s a good time to show you Mosaic GPU: a Python DSL for H100s.
The attention example matches FA3 performance, while being only ~200 lines of Python: https://t.co/12ecz3LftV
It's easy to install too! Latest JAX packages have it.
As much as I'm personally enjoying the AI ride at the moment, the attacks we are seeing are worryingly familiar web app attacks, and one wonders whether anyone in the AI world is aware of how to do web app security.
Case in point: https://t.co/fLRVnIUlLT
A very good paper firstly
Tired of “LLM hacking” hype with no code? Here’s a breath of fresh air. https://t.co/GAs2gtXz21
1. Challenges: open source ✅
2. Solution framework: open source ✅
If you’re interested in hackbots in offsec and you’re craving something you can RUN, you gotta read this