"If we are honest — and scientists have to be — we must admit that religion is a jumble of false assertions, with no basis in reality. The very idea of God is a product of the human imagination. It is quite understandable why primitive people, who were so much more exposed to the overpowering forces of nature than we are today, should have personified these forces in fear and trembling. But nowadays, when we understand so many natural processes, we have no need for such solutions. I can't for the life of me see how the postulate of an Almighty God helps us in any way."
— Paul Dirac, Remarks made during the Fifth Solvay International Conference
TikZ is a powerful package for creating graphic elements in LaTeX! We've written an article that introduces some basic concepts (drawing lines, dots, curves, circles, etc.) to get you started. https://t.co/UhEdZ43kjP
New research from Databricks AI Research: FlashOptim cuts training memory by over 50% with no measurable loss in model quality.
Training a model with AdamW typically requires 16 bytes per parameter just for weights, gradients, and optimizer state. FlashOptim brings that down to 7 bytes, or 5 with gradient release. For Llama-3.1-8B finetuning, peak GPU memory drops from 175 GiB to 113 GiB.
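To put the bytes-per-parameter figures in perspective, here is a rough back-of-the-envelope sketch. It assumes a round 8 billion parameters for Llama-3.1-8B (an approximation, not a number from the post) and counts only weights, gradients, and optimizer state. The helper name `state_memory_gib` is made up for illustration.

```python
GIB = 1024 ** 3  # bytes in one GiB

def state_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Memory for weights + gradients + optimizer state, in GiB."""
    return num_params * bytes_per_param / GIB

# Assumption: a round 8e9 parameters; the exact count is slightly different.
NUM_PARAMS = 8_000_000_000

# Bytes-per-parameter figures as quoted in the post.
configs = [
    ("AdamW baseline", 16),
    ("FlashOptim", 7),
    ("FlashOptim + gradient release", 5),
]

for label, bytes_per_param in configs:
    gib = state_memory_gib(NUM_PARAMS, bytes_per_param)
    print(f"{label}: {gib:.1f} GiB")
```

These totals will not match the quoted 175 GiB and 113 GiB peaks, presumably because peak GPU memory also covers activations and other buffers that this estimate leaves out.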
Two techniques drive this: improved master weight splitting using tighter ULP-normalized error correction, and companded optimizer state quantization that reduces quantization error and improves convergence.
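For intuition on why companding can help quantization, here is a generic sketch using classic mu-law companding with 8-bit codes. This is not FlashOptim's actual scheme, which the post does not spell out; the constants `MU` and `LEVELS`, the function names, and the test data are all assumptions chosen for illustration.

```python
import math

MU = 255.0    # mu-law parameter (assumption: the classic telephony value)
LEVELS = 127  # symmetric signed 8-bit codes in [-127, 127]

def mu_law_compress(x: float) -> float:
    """Map x in [-1, 1] to [-1, 1], stretching values near zero."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def roundtrip(values, companded):
    """Quantize to 8-bit codes and decode again, optionally with companding."""
    scale = max(abs(v) for v in values)  # assumes at least one nonzero value
    decoded = []
    for v in values:
        u = v / scale
        if companded:
            u = mu_law_compress(u)
        code = round(u * LEVELS)  # integer code that would be stored
        u_hat = code / LEVELS
        if companded:
            u_hat = mu_law_expand(u_hat)
        decoded.append(u_hat * scale)
    return decoded

def mean_relative_error(values, decoded):
    return sum(abs(v - d) / abs(v) for v, d in zip(values, decoded)) / len(values)

# Values spread over four orders of magnitude, standing in for optimizer
# state with a wide dynamic range (synthetic data, not real training state).
values = [10 ** (-4 + 4 * i / 999) for i in range(1000)]

linear_err = mean_relative_error(values, roundtrip(values, companded=False))
companded_err = mean_relative_error(values, roundtrip(values, companded=True))
print(f"linear:    {linear_err:.3f}")
print(f"companded: {companded_err:.3f}")
```

Uniform quantization rounds the smallest values to zero, while the companded version spends more of its codes near zero and so keeps small values distinguishable.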
FlashOptim works as a drop-in replacement for SGD, AdamW, and Lion, supports distributed training with DDP and FSDP2, and is open source.
Paper: https://t.co/TBaRh90g8k
Source code: https://t.co/Dh6vOsSIbh