We are proud to sponsor LSX World Congress, taking place on 28-30 April 2025. Join us to access strategic knowledge and form new partnerships. #LSXWorld https://t.co/RPv9I6iwxa
“Finetuning excels in its ability to adapt an LLM’s behaviour to specific nuances, tones, or terminologies. If we want the model to sound more like a medical professional, write in a poetic style, or use the jargon of a specific industry…” — @HeikoHotz https://t.co/UjhMBTPsyl
How can you solve complex tasks using a Large Language Model?
Here is a 2-minute introduction to everything you need to know to 10x the quality of your results.
Let's talk about three techniques, in order of complexity, starting with the easiest one:
• In-Context Learning
• Indexing + In-Context Learning
• Fine-tuning
In-Context Learning
The team that trained GPT-3 found something they couldn't explain: You can condition a model using examples of how you want it to behave.
I included an example prompt in the attached video.
You can "teach" the model how you want it to interpret questions, select the correct answers, and format the results by giving a few examples.
You can also give specific knowledge to the model that will be helpful when formulating answers. We call this approach "grounding the model." There's another example in the video.
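Here is a minimal sketch of what building a few-shot prompt can look like. The function name, the Q:/A: layout, and the example questions are my own illustrations, not something from the video. The resulting string is what you would send to whichever model you use.

```python
def build_few_shot_prompt(examples, question, context=None):
    """Assemble a prompt from optional grounding text, worked examples, and a new question."""
    parts = []
    if context:
        # "Grounding": hand the model the facts it should rely on.
        parts.append(f"Use the following information when answering:\n{context}\n")
    for example_question, example_answer in examples:
        # Each example shows the model how to interpret and format an answer.
        parts.append(f"Q: {example_question}\nA: {example_answer}\n")
    # End with the real question and an open answer slot for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)


# Illustrative examples (assumed, not taken from the thread).
examples = [
    ("Is 'I loved this movie' positive or negative?", "positive"),
    ("Is 'The food was cold' positive or negative?", "negative"),
]
prompt = build_few_shot_prompt(examples, "Is 'Great service' positive or negative?")
print(prompt)
```

Passing a `context` string adds the grounding text ahead of the examples.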
Indexing + In-Context Learning
Unfortunately, there is a limit to how much data you can include in a prompt. We call this the "context size."
One version of GPT-4 supports a context of approximately 6,000 words, while the other supports 25,000 words.
Although this sounds like a lot, many applications need more than that.
Imagine you wrote a book and want to build an application to answer any questions about your story. What happens if your book is longer than the context?
That's where Indexing comes in.
Using a model, you can turn every book passage into an embedding: a vector of numbers that "encodes" the passage's text. You can then store these embeddings in a specialized database (a vector database) that supports fast retrieval of similar vectors.
You can then turn any question into an embedding and search the database for the passages most similar to that query. Instead of sending the entire book to the model, you can now use only the relevant passages as in-context information, effectively working around the context size limitation.
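A rough sketch of that flow, under loud assumptions: `embed` below is a toy bag-of-words stand-in for a real embedding model, and a plain Python list stands in for the vector database. The passages are made up. Only the shape of the pipeline (embed, store, search, build the prompt) is the point.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(passages):
    """Store each passage next to its embedding (a list stands in for the database)."""
    return [(passage, embed(passage)) for passage in passages]


def search(index, question, k=2):
    """Return the k passages whose embeddings are most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


# Made-up book passages for illustration.
passages = [
    "Mara found the brass key under the lighthouse stairs.",
    "The storm kept the ferry in the harbor for three days.",
    "Old Tomas taught Mara how to read the tide tables.",
]
index = build_index(passages)
question = "Where did Mara find the key?"
top = search(index, question, k=1)

# Only the retrieved passages go into the prompt, not the whole book.
prompt = (
    "Answer using only these passages:\n"
    + "\n".join(top)
    + f"\nQuestion: {question}\nAnswer:"
)
print(prompt)
```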
Fine-tuning
Fine-tuning can give you an extra boost to get reliable outputs from your LLM. It is, however, the most complex approach on the list.
There are different approaches to fine-tuning a model with your data.
A popular technique is to process your data with your LLM and use the outputs to train a new classifier that solves your specific task. Notice that here you aren't modifying the LLM. Instead, you are chaining it with your trained classifier.
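To make the chaining idea concrete, here is a small sketch. Everything in it is assumed for illustration: `llm_features` is a hypothetical stand-in for the unmodified LLM's output (in practice, an embedding or hidden state), the classifier is a simple nearest-centroid model, and the support-ticket data is made up.

```python
def llm_features(text):
    """Hypothetical stand-in for the frozen LLM: maps text to a small feature vector."""
    vocab = ["refund", "broken", "love", "great", "late", "thanks"]
    words = text.lower().split()
    return [float(words.count(word)) for word in vocab]


def train_centroids(texts, labels):
    """Train the separate classifier: average the LLM features for each label."""
    sums, counts = {}, {}
    for text, label in zip(texts, labels):
        vec = llm_features(text)  # the LLM itself is never modified
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, text):
    """Chain the two steps: LLM features first, then the nearest centroid's label."""
    vec = llm_features(text)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Made-up training data for illustration.
texts = ["i want a refund", "the screen arrived broken", "i love it", "great product thanks"]
labels = ["complaint", "complaint", "praise", "praise"]
centroids = train_centroids(texts, labels)
prediction = predict(centroids, "my order was late and broken")
print(prediction)
```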
Another approach is to modify the parameters of the LLM using your data. Think of this as "rewiring" the model in a way that solves your particular task. The results and costs will vary depending on how many layers you want to fine-tune from the original model.
Many companies think that fine-tuning is the solution to their problems. In my experience, many will benefit from exploring the other two approaches.
I love explaining Machine Learning and Artificial Intelligence ideas. If you enjoy in-depth content like this, follow me @svpino so you don't miss what comes next.
How GPT3 works. A visual thread.
A trained language model generates text.
We can optionally pass it some text as input, which influences its output.
The output is generated from what the model "learned" during its training period where it scanned vast amounts of text.
1/n
Open-source ML is at it again!
Databricks just released Dolly 2.0!
Here's what you need to know:
- This model is a 12B-parameter language model based on the EleutherAI Pythia model family.
- It's fine-tuned on 15K high-quality human-generated prompt/response pairs (crowdsourced among Databricks employees) for instruction tuning LLMs.
- Dolly 2.0 is open-sourced, including training code, dataset, and model weights.
- The best part is that it's suitable for commercial use! This is one of the big limitations of previous instruction-following models like Alpaca, Koala, GPT4All, and Vicuna.
Model weights: https://t.co/yUj5XKCdVU
Dataset: https://t.co/DgI70balUp
Blog: https://t.co/9tPgynbkR7
Study Deep Learning for Free from MIT
MIT's introductory course on deep learning methods with applications in computer vision, language, and more!
Course Link: https://t.co/6sMlJS1Pc8
Computer vision is coming back into the forefront with Stable Diffusion.
But if you're totally new to CV, you've gotta get started somewhere.
No matter your skill level, here’s my favorite computer vision course.
(And, of course, it’s 100% free from the University of Michigan!)
Mastodon has just passed 2 million monthly active users, a new record! People are voting with their feet. The future of social media doesn't have to belong to a billionaire; it can be in the hands of its users.
Excited to share that my day-long workshop (a short course) on #ExplainableAI is now publicly available as a five-part YouTube video lecture series.
Link to video lectures: https://t.co/n0pKBbGByw
Link to slides: https://t.co/UzUQeCXUox
#AI #ML @trustworthy_ml @XAI_Research
@VueHelp Folks, the movie got over long back... much before you replied. I was not able to connect to a human on your support line because your speech recognition engine didn't recognize the movie name I was saying. I remember repeating it at least 30 times.