This is huge: Llama-v2 is open source, with a license that authorizes commercial use!
This is going to change the landscape of the LLM market.
Llama-v2 is available on Microsoft Azure and will be available on AWS, Hugging Face and other providers.
Pretrained and fine-tuned models are available with 7B, 13B and 70B parameters.
Llama-2 website: https://t.co/PKrrXgHdem
Llama-2 paper: https://t.co/aINNrXNhMb
A number of personalities from industry and academia have endorsed our open source approach: https://t.co/N7HwgW9Suh
@Tim_Dettmers Thanks a lot for the hard work on the CUDA kernels. I'm curious if batch_size=1 is optimal for this method, or if in the future it may be possible to speed up batched inference at 4 bit also?
We need new technical breakthroughs to steer and control AI systems much smarter than us.
Our new Superalignment team aims to solve this problem within 4 years, and we're dedicating 20% of the compute we've secured to date towards this problem.
Join us! https://t.co/cfJMctmFNj
there is a huge difference between knowledge and expertise. individual humans are not so great with acquiring and retaining knowledge, but are remarkable at developing deep expertise. language models seem to be the exact opposite. they are amazing in the knowledge part.
@chrisalbon AFAIK only important for deep neural networks, so I would keep the card as a good lesson for everything else.
Would still love to see a card with an intuitive explanation for double descent
I think it's a horrible idea to ask for licenses to train models. This will reduce the number of open-source models and give big corporations an unfair competitive advantage to train closed-source models, which will not be transparent at all, and companies will have to sacrifice a lot of privacy since they don't own their models. Monopolies aren't good.
If a system affects people, people have the right to know more about it (biases and so on), given we know companies are using these models in all kinds of bad ways (they even use them for candidate screening).
I get the desire to regulate, but placing restrictions on open source orgs is not the right way. Whoever deploys the open source model should be responsible instead.
"Any model made available in the EU, without first passing extensive, and expensive, licensing, would subject companies to massive fines of the greater of €20,000,000 or 4% of worldwide revenue. Opensource developers, and hosting services such as GitHub... would be liable"
Oops haven't tweeted too much recently; I'm mostly watching with interest the open source LLM ecosystem experiencing early signs of a cambrian explosion. Roughly speaking the story as of now:
1. Pretraining LLM base models remains very expensive. Think: supercomputer + months.
2. But finetuning LLMs is turning out to be very cheap and effective due to recent PEFT (parameter-efficient fine-tuning) techniques that work surprisingly well, e.g. LoRA / LLaMA-Adapter, and other awesome work, e.g. low precision as in the bitsandbytes library. Think: few GPUs + day, even for very large models.
3. Therefore, the cambrian explosion, which requires wide reach and a lot of experimentation, is quite tractable due to (2), but only conditioned on (1).
4. The de facto OG release of (1) was Facebook's, sorry, Meta's LLaMA release - a very well executed, high quality series of models from 7B all the way to 65B, trained nice and long, correctly ignoring the "Chinchilla trap". But the LLaMA weights are research-only and locked down behind forms, yet have also awkwardly leaked all over the place... it's a bit messy.
5. In the absence of an available and permissive (1), (2) cannot fully proceed. So there are a number of efforts on (1), under the banner "LLaMA but actually open", with e.g. current models from @togethercompute, @MosaicML ~matching the performance of the smallest (7B) LLaMA model, and @AiEleuther, @StabilityAI nearby.
For now, things are moving along (e.g. see the 10 chat finetuned models released last ~week, and projects like llama.cpp and friends) but a bit awkwardly due to LLaMA weights being open but not really but still. And most interestingly, a lot of questions of intuition remain to be resolved, e.g. especially around how well finetuned models work in practice, even at smaller scales.
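To make point (2) a bit more concrete, here is a rough sketch of why LoRA-style finetuning is so cheap: the pretrained weight matrix W stays frozen and only a low-rank update B·A is trained. The function names, the 4096-wide layer and rank 8 below are illustrative assumptions, not taken from any particular model or library, and this is plain Python rather than a real training setup.

```python
# Minimal LoRA-style sketch (hypothetical helper names, no external libraries).

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def full_finetune_params(d_in, d_out):
    """Trainable parameters when updating the whole d_out x d_in matrix."""
    return d_in * d_out

def lora_params(d_in, d_out, rank):
    """Trainable parameters for A (rank x d_in) plus B (d_out x rank)."""
    return rank * d_in + d_out * rank

def lora_forward(x, W, A, B, scale=1.0):
    """Compute y = W x + scale * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]

if __name__ == "__main__":
    # Assumed layer size and rank, chosen only for illustration.
    d, r = 4096, 8
    full = full_finetune_params(d, d)
    low = lora_params(d, d, r)
    print(f"full: {full:,} trainable params, LoRA: {low:,} ({full // low}x fewer)")
```

With B initialized to zeros (as LoRA does), the adapted layer starts out identical to the frozen base layer, so finetuning begins from the pretrained behavior and only has to learn the small correction.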