I worked on this with @AdityaMakkar000 and @chinmayjindal_. Take a look at our blog to see how you can go from zero-to-one scaling a transformer from scratch.
GitHub: https://t.co/pngCWGjAj0
Blog: https://t.co/N2hmijqwXp
I spent the past few months building JAXformer: One of the first open source guides on how to scale modern transformers in JAX.
Trained entirely on TPUs, it supports distributed ML, Ray tokenization, MoE, n-D parallelism and end-to-end inference.
Hereโs how to do it: