we've been pushing commits to transformers discreetly, time to talk about what we've been cooking the last few months:
⚡️ Continuous Batching is in transformers ⚡️
this will simplify, most notably, evaluation and your training loop: no need for extra dependencies or infra to get fast inference, and no need for convoluted code to update your weights
note that speed is currently not on par with the best inference frameworks and servers out there and probably never will be
the goal is *not* to become as fast: we want to complement the existing landscape with features like these, aiming for transformers to be the toolbox for tinkering with and building models
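For readers new to the idea, here is a toy sketch of why continuous batching helps. It is not the transformers implementation: the function names, the request lengths, and the "one token per request per step" cost model are all assumptions made for illustration. It only counts how many decode steps are needed when a finished request's slot is refilled right away, versus waiting for the whole batch to finish.

```python
from collections import deque


def static_batching_steps(lengths, max_batch):
    """Decode steps when each batch runs until its longest request finishes.

    `lengths` holds the number of tokens each request still has to generate
    (hypothetical workload, positive integers assumed).
    """
    steps = 0
    for i in range(0, len(lengths), max_batch):
        steps += max(lengths[i:i + max_batch])
    return steps


def continuous_batching_steps(lengths, max_batch):
    """Decode steps when a finished request's slot is refilled immediately."""
    waiting = deque(lengths)
    running = []  # remaining tokens for each in-flight request
    steps = 0
    while waiting or running:
        # Fill any free slots from the queue before every decode step.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One step yields one token per running request; drop finished ones.
        running = [r - 1 for r in running if r - 1 > 0]
        steps += 1
    return steps


if __name__ == "__main__":
    lengths = [6, 1, 1, 6, 1, 1]  # made-up mix of long and short requests
    print("static:", static_batching_steps(lengths, max_batch=2))
    print("continuous:", continuous_batching_steps(lengths, max_batch=2))
```

With this made-up workload and two slots, the static schedule needs 13 steps and the continuous one needs 8, because short requests no longer sit idle waiting for a long neighbour to finish.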
Anyone interested in a CUDA deep dive that makes your workload 25% faster? 🧐
Just published a new blog post on asynchronous CPU / GPU inference: 100% insight, zero slop 😊
To learn how to remove all CPU overhead and use your GPU to the max, just read it 🔥
Reading @deepseek_ai 's v4 paper.... absolute hats off.
Every problem has a mathematical solution, nothing is left to chance.
I have so much respect for them, putting out months or years of effort entirely for free, in the open for anyone to benefit. Real goats 🫡
This marks the end of my first week at @huggingface! I'm joining as a founding engineer on HF's PyTorch team.
My first project: safetensors on Mac is up to 3x faster🚀
Parallel reads straight into MPS unified memory, no CPU staging.
MacBook Pro (M5 Pro):
- Cold 16 GB: **2.97 → 8.23 GB/s** (2.8×)
- Warm 3 GB: **10.3 → 26.6 GB/s** (2.6×)
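To give a feel for the "parallel reads" part, here is a minimal sketch that reads one file into a single preallocated buffer with several threads. It is not the safetensors code and it does not touch MPS or unified memory: `parallel_read`, the worker count, and the chunking scheme are assumptions for illustration only.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_read(path: str, num_workers: int = 4) -> bytearray:
    """Read a file into one preallocated buffer using several threads."""
    size = os.path.getsize(path)
    buffer = bytearray(size)  # single destination buffer, allocated once
    view = memoryview(buffer)
    chunk = max(1, -(-size // num_workers))  # ceil division, at least 1 byte

    def read_chunk(start: int) -> None:
        end = min(start + chunk, size)
        # Each worker opens its own handle so seeks do not interfere.
        with open(path, "rb", buffering=0) as f:
            f.seek(start)
            pos = start
            while pos < end:
                # Write directly into the worker's slice of the shared buffer.
                n = f.readinto(view[pos:end])
                if not n:
                    raise IOError("unexpected end of file")
                pos += n

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list(...) waits for all chunks and re-raises any worker exception.
        list(pool.map(read_chunk, range(0, size, chunk)))
    return buffer
```

Each worker writes into its own slice of the destination, so there is no intermediate copy to stitch together afterwards. Whether this beats a single sequential read depends heavily on the storage and OS.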
We're opening a Hugging Face office in Tokyo!
Our goal: help open-source AI develop in Japan and grow the local community. Let's meet!
Hugging Face's Tokyo office is now open!
Our goal is to support the development of open-source AI in Japan and grow the local community. We'd love to meet you!
First release of safetensors under the PyTorch Foundation umbrella! 0.8.0-rc.0 is out:
- GIL-free serialization
- Windows ARM64 wheels
- AMD FP8 FNUZ support
- little perf improvements here and there ✨
Would appreciate feedback if you feel inclined 🥹
Big moment for open source ML: safetensors is joining the PyTorch foundation!
This means first class citizen support for safetensors in PyTorch’s core library, amongst other things 🥹
Super proud of being a maintainer of such an essential tool for ML 🫡
I seem to have found somewhat of a sweet spot. Talk into Claude for the ideation phase, write down the plan, and do everything by hand myself, apart from tests maybe, who likes writing tests amirite
I question / rework / ignore everything written in the plan as it often misses the target, but it does help me think through the problem in great detail. I go from one big plan to smaller in-depth plans for each substep, which works quite nicely.
Co-ideating with Claude keeps the fun alive imo, so long as you ask it to tweak / give feedback on your original ideas and have a vision for what you want to do. It kind of feels like pair programming!
Programming was deeply satisfying work to me. Work for hours/days before getting the payoff of the code working well on your machine. I’m feeling so much friction now to open the editor and do this kind of task by hand, but also increasingly depressed with the nature of work in an AI assisted dev workflow. Back and forth prompting seems to eat at my soul. Need to find a balance that brings back some of the toil.
Transformers v5's FINAL, stable release is out 🔥 Transformers' biggest release.
The big Ws of this release:
- Performance, especially for MoE (6x-11x speedups)
- No more slow/fast tokenizers -> way simpler API, explicit backends, better performance
- Dynamic weight loading: way faster, and it enables MoE to work w/ {quants, tp, peft, ...}
We have a migration guide on the main branch; please take a look at it in case you run into issues. Come to our GH issues if you still have problems after reading it 😀