A tiny implementation detail in low-precision arithmetic could be biasing your AI training 😲
This interactive deep dive from Graphcore Research's @Awfidius uncovers a subtle failure mode in stochastic rounding that only appears when randomness is limited, and how it can be addressed with one simple fix 🎲
Check it out in the link below! 👇
🚨 Graphcore is hiring AI Research Interns! 🚨
Join us to work at the intersection of hardware and AI and help shape the future of AI systems. Whether you're excited about efficient inference, large-scale training, or advancing frontier-model capabilities, we’ve got cutting-edge projects for you to dive into.
Interested? Apply below 👇
Our Papers of the Month for September is now live! We cover:
- LLM self-correction via RL
- Trillion-token FP8 training
- SOAP (Shampoo + Adam)
- Generative models for crystals
All framed in terms of "proper conditioning" (🧵)
https://t.co/X6Xllf0SdC
Graphcore Research internships are now open 🎉
We're looking for PhD students for next summer
We're interested in algorithms & tools for hardware-efficient ML, in areas like LLM training/inference, GNNs, knowledge graphs and frameworks
Spread the word! https://t.co/3AsDzaSney
Introducing `tandv` - a library for tracking and visualising the internal stats of your model. We hope this will help with low-precision, debugging and more.
(link in 🧵)
Same problem, and based on online searches, many others have it too.
Terrible, wasteful design: it saves 10 seconds when it works, costs hours when it doesn't, and leaves a load of useless extra plastic in place after saving those 10 seconds.
As further feedback to the team that thought this would be a great innovation that would attract customers: I bought this tap because of Grohe's reputation for quality. The term "quickfix" had zero impact on my choice.
However, that term now means "low quality gimmick".
Our u-µP paper hit arXiv this morning! I'm so proud of this one — and grateful for a wonderful team who put so much into it 🥰
We add lots of good things to µP. Better sweeping, transfer, simple FP8. Already @cloneofsimo has a great thread on it, which I highly recommend
Excited to present our work at @icmlconf WANT workshop: ⚖️ Scalify: scale propagation for efficient low-precision LLM training 🎉
https://t.co/7BnCQ1Vpqk
Our team's summaries & analysis of our favourite papers from the last month.
We give our take on: Mamba-2, sparse-µP, contextual position encoding & matmul-free models 🧵
https://t.co/Ad7HRGLSaf
Our latest edition of *Papers of the Month* is now available 📚
These are summaries of our team's favourite papers from March, including GaLore, a new low-rank training procedure, and the supposed "Era of 1-bit LLMs" (really 1.58 bits)
Mini-version in 🧵
https://t.co/8NzkK0CGyB
I'll start with "Julia meets the Intelligence Processing Unit" (https://t.co/7skWAUHXAf), about running #JuliaLang on the @graphcoreai IPU: a spectacular example of how @llvmorg enables adopting cool new hardware in Julia, and how Julia opens up new doors in scientific computing