The Korean translations of @burkov's <The Hundred-Page ML Book> and <The Hundred-Page LM Book> have just been released together! It was such a fun and rewarding experience to translate them. Big thanks to Andriy for these great books and to @insightbook for their careful work!
The Korean translation of @rasbt's <Build a Large Language Model (From Scratch)> is now available! 📖✨
I learned so much while translating this book. It offers a clear, hands-on journey into how LLMs are built and how they work. :-)
https://t.co/iA8CDtMai1
A bit late, but I’d like to share that the Korean translation of <Hands-On LLM> by @JayAlammar and @MaartenGr has been published. Working on the translation was truly enjoyable and filled with surprising discoveries! 😊
The Korean edition of Machine Learning Q & AI by @rasbt is now available! This book provides clear and insightful answers to key questions in machine learning and AI. :)
Check it out here 👉 https://t.co/WNGE9uJGWa
This paper https://t.co/zYNgPt1zmu is the complete recipe for pretraining a modern LLM from scratch, with all the details, source code, and source data. The follow-up paper will also provide the details of instruction fine-tuning using Open Instruct https://t.co/5TITyUnWrX.
Introducing MobileDiffusion, a novel approach with the potential for rapid (sub-second) text-to-image generation on-device. An efficient latent diffusion model with a comparatively small model size, it is well suited for mobile deployment. Learn more →https://t.co/JPmK7iR6T8
It's been an exciting week: the 'Machine Learning Q and A' book with @nostarch has been shipped to the printer and is now available for preorder!
If you've been searching for a resource to follow an introductory machine learning course, this might be the one. I cover 30 concepts that were slightly out of scope for the books I've written and the courses I've taught, compiled here in a concise question-and-answer format (including exercises).
I believe it will also serve as a useful companion for preparing for machine learning interviews.
The topics were selected from the entire breadth of machine learning subfields: neural networks and deep learning, computer vision, natural language processing, production and deployment, and performance evaluation.
Here are just a few examples:
- Managing the various sources of randomness in neural network training.
- Differentiating between encoder and decoder architectures in large language models.
- Reducing overfitting through data and model modifications.
- Constructing confidence intervals for classifiers and optimizing models with limited labeled data.
- Choosing between different multi-GPU training paradigms and various types of generative AI models.
- Understanding performance metrics for natural language processing.
- Making sense of the inductive biases in vision transformers.
- And many more.
Note that this is not a coding book. However, I also have a supplementary GitHub repository with hands-on code examples for those chapters where it makes sense.
In most jurisdictions, copyright works as follows. Copyright belongs to the author by default, and authors don't need to claim it explicitly. Authors can, however, explicitly grant third parties certain rights. For example, they can permit the use of their content in specific cases, whether non-commercial or commercial. Various licenses, such as Creative Commons, exist for this purpose.
Now, imagine you scraped the web to train your LLM. Most of the data in your training dataset doesn't come with a license, which means that by default you don't have the right to reproduce it. LLMs and diffusion models are known to reproduce their training data verbatim, in full or in part. In the absence of a permissive license, this is a clear violation of copyright. Currently, there's no reliable way to restrict an LLM's ability to reproduce its training data, and such a method seems unlikely to appear anytime soon.
A trained LLM (like almost any ML model) is a mathematical formula. In most jurisdictions, a mathematical formula cannot be subject to copyright. As a consequence, it doesn't matter what license was attached when the model weights were put online. You can take the formula, modify it for your business needs, and use it in any context, including commercial ones.
The Apache 2.0-licensed Mixtral beats the proprietary GPT-3.5 Turbo, Gemini Pro, and the newest Claude 2.1. Reaching GPT-4-level performance would take only careful fine-tuning. 2024 will be awesome!
Hugging Face 🫶 @GoogleColab
With the latest release of huggingface_hub, you don't need to manually log in anymore. Create a secret once and share it with every notebook you run. 🤗
pip install --upgrade huggingface_hub
Check it out!👇
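The idea behind "create a secret once" is that a token stored outside the notebook gets picked up automatically. Below is a minimal sketch of that kind of fallback lookup. It uses a hypothetical helper, resolve_hf_token, that is not huggingface_hub's actual implementation. The secret name HF_TOKEN and the lookup order shown here are assumptions. The only real API it relies on is Colab's google.colab.userdata.get, which exists only inside a Colab runtime.

```python
import os
from typing import Optional


def resolve_hf_token(explicit: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return the first Hugging Face token it can find.

    Assumed lookup order (illustrative, not huggingface_hub's own logic):
    1. a token passed explicitly by the caller,
    2. the HF_TOKEN environment variable,
    3. a Colab secret named HF_TOKEN (only when running inside Colab).
    """
    if explicit:
        return explicit

    # Environment variables are shared by every process started in the session.
    env_token = os.environ.get("HF_TOKEN")
    if env_token:
        return env_token

    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        return None

    try:
        # userdata.get raises if the secret is missing or notebook access
        # to it has not been granted, so treat any error as "no token".
        return userdata.get("HF_TOKEN") or None
    except Exception:
        return None


if __name__ == "__main__":
    token = resolve_hf_token()
    print("token found" if token else "no token configured")
```

Outside Colab, the helper falls back to the environment variable and returns None if nothing is set. With the real library, you would simply upgrade huggingface_hub as shown above and let it handle the lookup.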
Highly recommended by OpenAI's Sam Altman!
<New release> Stephen Wolfram's ChatGPT Lectures: from how world-changing ChatGPT works to how to use Wolfram Alpha