@farzyness The co-sponsor of this bill, my congressional representative, has openly stated he wants to "ethnically cleanse" me from his district. Why should I believe he's not going to find some way to use this to deny me my vote? Why should I support *anything* he wants to do?
@JHochderffer @ylecun seems a bit self-important to assume no being in the universe could ever have a more general intelligence than what humans have right now. And if "human level intelligence" isn't the limit, why should we treat it like it is?
What’s the easiest way to specialize an LLM over your own data? Recent research has studied this problem in depth, and RAG is way more effective (and easier to implement) than extended pretraining or finetuning…
Knowledge from pretraining. A lot of factual information is inherently present within an LLM’s pretrained weights, but the knowledge possessed by these models is highly dependent upon the characteristics of their pretraining data. Unfortunately, this means that—at least in the current paradigm of LLMs—the knowledge base of these models is static (e.g., ChatGPT has a knowledge cutoff date) and may lack detailed information.
Knowledge injection. Given a pretrained LLM, there are two postprocessing techniques that we can use for injecting new data into the LLM’s knowledge base:
- Finetuning: continuing the model’s pretraining process over a smaller, domain-specialized corpus of new information.
- Retrieval Augmented Generation (RAG): modifying the LLM’s input query by retrieving relevant information that can be leveraged by the model via in-context learning to generate a more grounded/factual output.
The variant of finetuning referenced above is a continued pretraining style of finetuning, where a next token prediction objective is used to further train a pretrained model over a specialized corpus of text. In contrast, SFT and RLHF emphasize the quality of model responses rather than improving the LLM’s breadth of knowledge.
“Given some knowledge base in the form of a text corpus, what is the best way to teach a pre-trained model this knowledge?” - from [1]
Recent research. In [1], the authors compare RAG and finetuning to determine the superior knowledge injection approach. The RAG setup uses vector search to retrieve relevant document chunks to include in the model’s prompt. Given a corpus of information, we can:
1. Divide this corpus into chunks of text.
2. Use an embedding model (e.g., bge-large-en) to generate a dense vector for each chunk of text.
3. Search for relevant chunks by embedding the model’s input and performing a vector search.
4. Add the relevant chunks to the model’s prompt.
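The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the setup from [1]: the `embed` function is a toy bag-of-words stand-in for a dense embedding model such as bge-large-en, and all function names, the chunk size, and the prompt wording are assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40):
    """Step 1: split a corpus into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Step 2 (toy stand-in): bag-of-words counts instead of a dense embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=2):
    """Step 3: embed the query and return the `top_k` most similar chunks."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, retrieved):
    """Step 4: place the retrieved chunks in the prompt ahead of the question."""
    context = "\n\n".join(retrieved)
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "Paris is the capital of France.",
        "The mitochondria is the powerhouse of the cell.",
        "Python is a programming language.",
    ]
    question = "What is the capital of France?"
    print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

In a real pipeline, the chunk embeddings would be computed once and stored in a vector index rather than recomputed per query as this sketch does.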
What do we learn? While finetuning does improve performance relative to the base model, RAG consistently outperforms finetuning for the injection of both new and previously encountered knowledge. Put simply, LLMs struggle to learn new information through finetuning. Combining RAG with finetuning, though effective in some cases, does not consistently benefit performance.
Finetuning with paraphrases. We can improve the performance of finetuning for knowledge injection by training the model over several different paraphrases of the same information. In order to teach an LLM new information via finetuning, we must repeat this information in numerous ways.
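As a rough sketch of what this looks like on the data side, the snippet below builds a finetuning corpus in which every fact appears alongside several rewordings. The names `build_paraphrase_corpus` and `toy_paraphrase` are hypothetical, and the template-based paraphraser is only a placeholder; in practice the rewordings would likely come from an LLM.

```python
from typing import Callable, List


def build_paraphrase_corpus(
    facts: List[str],
    paraphrase_fn: Callable[[str, int], List[str]],
    n_paraphrases: int = 3,
) -> List[str]:
    """Return each fact followed by up to `n_paraphrases` distinct rewordings."""
    corpus = []
    for fact in facts:
        corpus.append(fact)
        seen = {fact}
        for variant in paraphrase_fn(fact, n_paraphrases):
            if variant not in seen:  # skip exact duplicates
                seen.add(variant)
                corpus.append(variant)
    return corpus


def toy_paraphrase(fact: str, n: int) -> List[str]:
    """Placeholder paraphraser (assumption): wraps the fact in fixed templates."""
    templates = ["In other words: {}", "Put differently: {}", "Restated: {}"]
    return [t.format(fact) for t in templates[:n]]


if __name__ == "__main__":
    for line in build_paraphrase_corpus(["RAG retrieves chunks at query time."], toy_paraphrase, 2):
        print(line)
```

The resulting list of strings would then be used as the text corpus for the continued-pretraining style of finetuning described earlier.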
——
[1] Ovadia, Oded, et al. "Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs." arXiv preprint arXiv:2312.05934 (2023).
APPLICATIONS ARE OPEN!
Are you passionate about solving problems using technology? Apply for a Master of Science in Information Technology, Electrical and Computer Engineering, and Engineering Artificial Intelligence today.
Apply here:
https://t.co/UltEhJzSWG
#ApplytoCMUAfrica
@garrytan @DH_PlantVillage @IFPRI collaborated with @plantvillage to develop an AI-powered app to study nutrition in adolescents. The goal was to build an app that could compete with dedicated nutritionists. The results are impressive.
https://t.co/GgqeV0EMu2
@Noahpinion @BigJohn2310 Then why are you using homeownership as an indicator of how millennials are doing in the housing market relative to past generations? It tells us nothing. Anybody can be well off if debt doesn't count.
@TheGameLooters @SuperSaf corporations don't care about fixing problems, they care about *being seen* fixing problems. the former is expensive, so if they can get away with only doing the latter, they will.
@liron @HannahFrankman If you do the math, 40k over three years comes to about 36 a day. And I kinda don't believe someone sat and counted forty thousand questions for multiple kids, so I'm guessing, if this number isn't just made up, someone probably just took a daily average and extrapolated it to 3 years.
@drgurner @HannahFrankman @baconsheikh Did you come across any other studies on this? That's the only one I could find. Everything else was just "studies show" fluff articles.
@CarolinAramburo @manxbenji @HannahFrankman Those are books, not studies. It's impossible to have a discussion about data I can't actually look at. I can only find one study, and, skimming through it, it seems incredibly irresponsible to interpret it as the original tweet has.