Fun fact: Did you know that quantizing your data into 4-bits using PyTorch’s torch.quint4x2 or bitsandbytes NF4 dtypes actually produces an 8-bit format? This is because it compacts every two 4-bit values into into a single 8-bit integer.
I just wrote an in-depth article outlining everything you need to know about quantization. It covers how 4-bit quantization with bitsandbytes Normal Float (NF4) works step by step, including the compacting process mentioned earlier.
This post aims to be a one-stop comprehensive guide, covering everything from quantizing LLMs to fine-tuning them with LoRa and breaking down QLoRa’s PEFT and bitsandbytes implementation.
It also explains in detail the LLM inference process with KV caching and decoding strategies, and much more. You'll learn a lot with follow-along code implementations from scratch and illustrations I created to drive concepts home.
Check it out : https://t.co/OzVH6fzUWP
"There is no peace for the man who has seen what he could become." - Epictetus
Once you know who you could be,
and you feel it in your bones....
You will never rest until you get there.
Just found out that Berkeley course staff are writing hooks inside course repos so if a student opens an assignment in Claude Code or Cursor the agent will automatically ping the staff 😵💫 well played
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Two weeks without mobile internet improved mental health more than antidepressants and reversed roughly 10 years of attentional decline.
Screen time dropped 49% (314 to 161 min/day).
An intern on my team made a mistake. As a consequence I am resigning effective immediately.
Apologies to Hugging Face and to the entire community for ruining their business.
Not Americans but international teams developed convnets, alexnet, attention, AlphaGo, neural LMs, AlphaCode, AlphaFold, transformers, RL, etc, etc.
This war mongering CEO does not represent the Americans who helped develop AI either. They have greater ideals.
It is sad that my colleagues and I developed the science and tech being used by these money and power hungry despicable people.
We need a moratorium on AI weapons. And international institutions that can enforce it.
Y a 6 mois ils faisaient les grands traders de wall street et maintenant ils parlent à Claude toute la journée pour essayer de devenir développeurs web, le bear market c’est terrible mdr.
Not being able to appreciate silence and unstimulated time alone is the sign that you don't have much ongoing in your mind; otherwise, you would understand that these are gifts for your creative output.
I just officially released FlashSR, a model that can massively enhance audio quality at speeds over 200x realtime.
This was used in MiraTTS, but now anyone can use it for their own models.
Repo: https://t.co/cvzlqPwHIs
Model: https://t.co/173xS2VCFX