I spent the past few months building JAXformer: One of the first open source guides on how to scale modern transformers in JAX.
Trained entirely on TPUs, it supports distributed ML, Ray tokenization, MoE, n-D parallelism and end-to-end inference.
Hereโs how to do it:
tonight begins a school of foundation modelling. 60 students have been tasked with implementing gradient descent as their first assignment. really interested to see how this will progress. anyone who goes _all in_ on this will get so much value out of it.