We're releasing DASH (Distributed Accelerated Shampoo), an improved implementation of the Shampoo optimizer that achieves up to 4.83Γ faster optimizer steps, while matching or improving final model quality.
[1/6]
@asmah2107 Hi! I'm currently a PhD student working in optimizers for deep learning. I'm focused on decreasing the memory usage and speeding up the runtime.
@iamgrigorev This is a great tip for productivity! Please check out our GridSearcher project, it is completely written in Python and allows you to run jobs in parallel or sequentially using dictionaries for hyper-parameters by employing a basic scheduling: https://t.co/t4dUUuxs5g
Our QuEST paper was selected for Oral Presentation at ICLR @sparseLLMs workshop!
QuEST is the first algorithm with Pareto-optimal LLM training for 4bit weights/activations, and can even train accurate 1-bit LLMs.
Paper: https://t.co/ulbX5D5LjD
Code: https://t.co/OYfjs4jdDp
We're releasing the DASLab GGUF Quantization Toolkit! π
First open-source toolkit bringing GPTQ + EvoPress to @ggerganov's GGUF format, enabling heterogeneous quantization based on importance.
Result: Better models at the same file size.
[1/5]
π We are releasing state-of-the-art post-training quantization (PTQ) algorithms for Microscaling FP4, together with kernels:
- First study focused on MXFP4/NVFP4 PTQ for LLMs
- New Micro-Rotated (MR) format and GPTQ algorithm
- QuTLASS GPU kernels with up to 3.6x speedups.
@KwangjunA This seems to be similar to our recently shared work, where we use Discrete Cosine Transform (DCT) to perform a cheap low-rank projection of the momentum buffer, followed by Newton-Schulz orthogonalization. Check out our paper here: https://t.co/Sao5sBDDTu
Paper: https://t.co/9jwxD1tU92
We release vLLM & HuggingFace integrations.
Code: https://t.co/pvkXTyDlcO
Kernels: https://t.co/KjO04HAHyq
Credit goes to the team: @RobertoL_Castro, Vage, Denis, @black_samorez and @AshkboosSaleh, with support from @RedHat_AI and @thoefler
Introducing Panza, a personalized LLM email assistant, running entirely on-device!
[1/6]
* Panza adapts LLaMA-3-8B to match your unique writing style;
* Can be fine-tuned and executed on a single GPU (free Colab version available!).
Give it a try:
https://t.co/JjyYT9Mrun