Haven't had time to explore Mojo 1.0 beta yet? @InfoWorld published a piece on Mojo 1.0 that will get you up to speed on language basics, metaprogramming, Python interop, GPU support, and more:
https://t.co/NGtpTd4FHz
@Modular I learned a lot from sitting down with Kyle and hearing his guided tour of inference serving. I'm glad we can share this with the MAX community!
Zero GPU experience. Outperformed & improved @UnslothAI's CUDA kernel😱 @davidrobertson on what happens when you don't have to fight the language to get performance🔥
https://t.co/7q0sGxI6nY
🔥 New Series! Learning GPU programming through Mojo puzzles - on an Apple M4! No expensive data center GPUs needed. No CUDA C++ complexity. Just Python-like syntax with systems performance.
First video just dropped: https://t.co/MI0BzfBuL0
#Mojo #GPUProgramming #AppleSilicon
7th edition of the #MLIR workshop alongside the #LLVM conference, another great lineup of talks: impressive work all around!
127 people in the audience, nice turnout :)
Here: #Mojo GPU programming.
We raised $250M to accelerate building AI's unified compute layer! 🔥 We’re now powering trillions of tokens, making AI workloads 4x faster 🚀 and 2.5x cheaper ⬇️ for our customers, and we've welcomed tens of thousands of new developers 👩🏼‍💻. We're excited for the future!
Part 4 of "Matrix Multiplication on Blackwell" is here! It continues our epic journey of describing how Modular implemented the fastest B200 matmul in the industry, revealing the techniques to achieve 1772 TFLOPs, exceeding that of the current SOTA.
https://t.co/jhBeJBvmuc
Triton is nice if you want to get something onto a GPU but don't need full performance/TCO. However, if you want peak perf or other HW, then Mojo🔥 could be a better fit. I'm glad OpenAI folk are acknowledging this publicly, but I wrote about it here:
https://t.co/dzGlAUapOY
Part 3 of "Matrix Multiplication on Blackwell" is here! It continues our epic journey of describing how Modular implemented the fastest B200 matmul in the industry, revealing the techniques to go from 16% to 85% of SOTA
https://t.co/W8iahcB5Ct
Learn how @inworld_ai teamed up with @Modular and @OracleCloud to unlock hardware optionality across Nvidia and AMD GPUs for their groundbreaking TTS models, cutting total cost of model ownership by 70%.