Day 24 of llm.c: we now do multi-GPU training, in bfloat16, with flash attention, directly in ~3000 lines of C/CUDA, and it is FAST!
We're running ~7% faster than PyTorch nightly, with no asterisks, i.e. this baseline includes all modern & standard bells-and-whistles: mixed precision training, torch.compile, flash attention, and a manually padded vocab. (Previous comparisons included asterisks like *only inference, or *only fp32 etc.) Compared to the current PyTorch stable release 2.3.0, llm.c is actually ~46% faster. My point in these comparisons is just to say "llm.c is fast", not to cast any shade on PyTorch. It's really amazing that PyTorch trains this fast in a fully generic way, with the ability to cook up ~arbitrary neural networks and run them on a ton of platforms. I see the goals and pros and cons of these two projects as different, even complementary. Actually I started llm.c with my upcoming educational videos in mind, to explain what PyTorch does for you under the hood.
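To make the "padded vocab" item concrete, here is a minimal sketch of rounding a vocabulary size up to a kernel-friendly multiple. The helper name `pad_vocab`, the multiple of 128, and the GPT-2 vocabulary size of 50,257 are assumptions for illustration; the post itself does not say which multiple is used.

```python
def pad_vocab(vocab_size: int, multiple: int = 128) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`."""
    return ((vocab_size + multiple - 1) // multiple) * multiple


# GPT-2's tokenizer has 50,257 tokens; the extra rows are never produced
# by the tokenizer and only exist so the matrix shapes divide evenly.
padded = pad_vocab(50257)
print(padded)  # 50304
```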
How we got here over the last ~1.5 weeks - added:
- mixed precision training (bfloat16)
- many kernel optimizations, including e.g. a FusedClassifier that (unlike current torch.compile) does not materialize the normalized logits.
- flash attention (right now from cudnn)
- Packed128 data structure that forces the A100 to utilize 128-bit load (LDG.128) and store (STS.128) instructions.
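One way to read "does not materialize the normalized logits" is that the loss and the gradient come straight from the raw logits, with each probability used the moment it is computed. The sketch below is a CPU-side toy in pure Python under that reading, not the llm.c kernel; the function name `fused_classifier_row` is hypothetical.

```python
import math


def fused_classifier_row(logits, target):
    """Compute cross-entropy loss for one row and overwrite `logits`
    in place with d(loss)/d(logits). No separate probability array is kept."""
    max_logit = max(logits)  # subtract the max for numerical stability
    sum_exp = sum(math.exp(x - max_logit) for x in logits)
    log_sum_exp = max_logit + math.log(sum_exp)
    loss = log_sum_exp - logits[target]  # equals -log softmax(logits)[target]
    for i, x in enumerate(logits):
        prob = math.exp(x - log_sum_exp)  # computed, used once, then dropped
        logits[i] = prob - (1.0 if i == target else 0.0)
    return loss


row = [2.0, 1.0, 0.1]
loss = fused_classifier_row(row, target=0)
print(loss, row)  # row now holds the gradient, which sums to ~0
```

Reusing the logits buffer for the gradient is the design choice worth noticing: the largest activation in the model is written once instead of two or three times.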
It's now also possible to train multi-GPU - added:
- First version of multi-GPU training with MPI+NCCL
- Profiling the full training run for NVIDIA Nsight Compute
- PR for stage 1 of ZeRO (optimizer state sharding) merging imminently
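For intuition on what "optimizer state sharding" means, here is a single-process simulation: every rank holds all parameters, but keeps optimizer state only for its own slice. To keep it short it uses SGD with momentum rather than AdamW, and a plain list concatenation stands in for the all-gather; `shard_bounds` and `zero1_step` are hypothetical names, not llm.c functions.

```python
def shard_bounds(num_params, world_size, rank):
    """Return the [start, end) slice of parameters whose state `rank` owns."""
    shard = (num_params + world_size - 1) // world_size  # ceil division
    start = min(rank * shard, num_params)
    end = min(start + shard, num_params)
    return start, end


def zero1_step(params, grads, momenta, world_size, lr=0.1, beta=0.9):
    """Simulate one step where momenta[rank] covers only that rank's slice."""
    updated = []
    for rank in range(world_size):
        start, end = shard_bounds(len(params), world_size, rank)
        state = momenta[rank]
        for j, i in enumerate(range(start, end)):
            state[j] = beta * state[j] + grads[i]
            updated.append(params[i] - lr * state[j])
    return updated  # stands in for all-gathering the updated slices


world_size = 4
params = [1.0] * 10
grads = [0.5] * 10
bounds = [shard_bounds(len(params), world_size, r) for r in range(world_size)]
momenta = [[0.0] * (end - start) for start, end in bounds]
params = zero1_step(params, grads, momenta, world_size)
print([len(m) for m in momenta])  # [3, 3, 3, 1]
```

Each rank stores roughly 1/world_size of the optimizer state, which is where the memory saving comes from.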
We're still at "only" 3,000 lines of C/CUDA code. It's getting a bit less simple, but still a bit better than ~3 million. We also split off the fp32 code base into its own file, which will be pure CUDA kernels only (no cuBLAS, cuDNN, etc.), and which I think would make a really nice endpoint of a CUDA course. You start with the gpt2.c pure CPU implementation, and see how fast you can make it by the end of the course on GPU, with kernels only and no dependencies.
Our goal now is to create a reliable, clean, tested, minimal, hardened and sufficiently optimized LLM stack that reproduces the GPT-2 miniseries of all model sizes, from 124M to 1.6B, directly in C/CUDA.
A lot more detail on: "State of the Union [May 3, 2024]"
https://t.co/eDgbngHrZ9
❤️ Toughest Interview Questions ❤️
Mandatory Bookmark
[Bonus: PDF Giveaway for First 600 People]
Follow @ahuja_priyank, Like & Comment [PDF]
Thread [1/20]
I have cleared interviews at Adobe, Amazon, Google and Microsoft.
Here are the primary resources I used for coding, system design, low level design and behavioral interviews:
SQL for Data Science Complete Study Plan 2024
The timeline is 28 days, and you must dedicate at least 1.5 hours daily.
Week 1: Fundamentals of SQL
Day 1-3: Introduction to SQL syntax, SELECT statements, filtering, and sorting.
Resource: Khan Academy's "Intro to SQL" course on YouTube.
https://t.co/riGC6brrfs
Day 4-5: Working with multiple tables using JOIN operations.
Resource: DataCamp's "Joining Data in SQL" course.
https://t.co/cg9NGoIRWB
Day 6-7: Aggregating data with GROUP BY, HAVING clauses, and understanding subqueries.
Resource: Coursera's "SQL for Data Science" specialization.
https://t.co/DXWLNOd8sO
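A quick way to practice the Day 6-7 material without installing a database is Python's built-in `sqlite3` module. This is a minimal sketch; the `orders` table and its rows are made up for illustration. It combines GROUP BY, HAVING, and a subquery in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 30.0), ("ana", 50.0), ("bo", 20.0), ("cy", 70.0), ("cy", 15.0)],
)

# Customers whose total spend exceeds the average single-order amount.
big_spenders = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > (SELECT AVG(amount) FROM orders)
    ORDER BY total DESC
    """
).fetchall()
print(big_spenders)  # [('cy', 85.0), ('ana', 80.0)]
```

WHERE filters rows before grouping, while HAVING filters the groups afterwards, which is why the aggregate condition has to live in HAVING.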
Week 2: Intermediate SQL Concepts
Day 8-10: Learning about table modifications (INSERT, UPDATE, DELETE) and working with NULL values.
Resource: YouTube playlist by Caleb Curry on "SQL Tutorials".
https://t.co/DCRqaYJ4I5
Day 11-12: Diving into data normalization and database design principles.
Resource: YouTube playlist - Basic Concept of Database Normalization
https://t.co/nd3HvUJFFa
Day 13-14: Introduction to window functions for advanced data manipulation.
Resource: SQL Tutorial - Window Functions by BeardedDev
https://t.co/SqKTZc6uFh
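To try the Day 13-14 topic hands-on, here is a small running-total example using a window function. It assumes the SQLite bundled with your Python is version 3.25 or newer (window functions were added in that release); the `sales` table is made up for illustration.

```python
import sqlite3

win_conn = sqlite3.connect(":memory:")
win_conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
win_conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", 1, 10), ("north", 2, 20), ("north", 3, 5),
     ("south", 1, 7), ("south", 2, 3)],
)

# Running total per region: rows are kept, unlike GROUP BY which collapses them.
running = win_conn.execute(
    """
    SELECT region, day, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
    FROM sales
    ORDER BY region, day
    """
).fetchall()
for row in running:
    print(row)
```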
Week 3: Advanced SQL Techniques
Day 15-17: Mastering subqueries and correlated subqueries.
Resource: techTFQ's "Subquery in SQL" course.
https://t.co/0x59Ec0gkC
Day 18-20: Learning about indexes, performance optimization, and query tuning.
Resource: SQL performance tuning and query optimization
https://t.co/Wln4do5iEW
Day 21-22: Understanding stored procedures, user-defined functions, and triggers.
Resource: https://t.co/nVygIXNTcp
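For the correlated subqueries from Day 15-17, a minimal sketch with Python's built-in `sqlite3` module: the inner query refers to the outer row (`s.dept`), so it is re-evaluated per row. The `staff` table and its values are made up for illustration.

```python
import sqlite3

sub_conn = sqlite3.connect(":memory:")
sub_conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, salary INTEGER)")
sub_conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("ana", "eng", 120), ("bo", "eng", 100),
     ("cy", "ops", 80), ("di", "ops", 90), ("ed", "ops", 70)],
)

# Staff paid more than the average of their own department.
above_avg = sub_conn.execute(
    """
    SELECT name
    FROM staff AS s
    WHERE salary > (SELECT AVG(salary) FROM staff WHERE dept = s.dept)
    ORDER BY name
    """
).fetchall()
print(above_avg)  # [('ana',), ('di',)]
```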
Week 4: Real-world Applications and Practice
Day 23-24: Implementing data analysis tasks like data cleaning, transformation, and visualization using SQL.
Resource: https://t.co/5Jx3vpwSJ6
Day 25-26: Working with complex data types (JSON, XML) and geospatial data.
Resource: https://t.co/fH2EeQsetl
Day 27-28: Final project: Solving a complex data problem using SQL and presenting your findings.
Resource: Kaggle datasets with SQL-related challenges.
https://t.co/lUdJ1n00fe
https://t.co/VvBuIv0tsI
https://t.co/LY31xHKGQQ
Remember to adapt the pace according to your learning speed and comfort level.
Allocate time for hands-on practice after each video tutorial or lesson to reinforce your understanding.
Additionally, actively participate in relevant online quizzes and challenges:
https://t.co/JCm6idzXgy
https://t.co/xatgNUTlq6
---
Don't forget to BOOKMARK, as you will definitely need it later!
Follow for more StudyPlans and Data Science Content!
RAG with LLMs seems deceptively simple but is extraordinarily hard to do well.
Building an intelligent ChatGPT-like tool with a custom knowledge base requires multiple non-trivial components.
A simple vector database for retrieval is rarely enough; you need a semantic understanding of the query and a full-scale search engine that powers the "retrieval."
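As one illustration of going beyond a single vector lookup, results from a vector search and a keyword search can be merged with reciprocal rank fusion (RRF). This is a generic technique swapped in for illustration, not a description of any particular product; the document IDs and the two ranked lists are hypothetical, and `k=60` is just a commonly used default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["doc_b", "doc_a", "doc_c"]   # hypothetical embedding search
keyword_hits = ["doc_a", "doc_d", "doc_b"]  # hypothetical keyword search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc_a', 'doc_b', 'doc_d', 'doc_c']
```

Documents that both retrievers agree on float to the top, which is the point of running more than one retrieval path.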
At Abacus AI, we spend a lot of time thinking about and automating the real-world use cases that our customers have, and while rewarding, it's pretty challenging.
We have just scratched the surface here, and a lot more can be done to create robust RAG-based ChatLLMs.
Like every other machine learning problem, it's straightforward to get a prototype from a notebook template in a couple of hours, but then it takes months to pass all the evaluations and put the system in production. We do it in weeks!
We are actively working on this problem, and if you want us to help you with a free POC, drop us a line.
P.S. This image is from a survey paper that may be worth checking out; link in alt.
I've read a ton of research papers this year. And to conclude this eventful year in AI, I've compiled a selection of 10 noteworthy papers from 2023 that I am discussing in my new article: https://t.co/FfYERa23g0
To give you a sneak peek, I'm covering:
- insights into LLM training runs
- new openly available LLMs
- efficient finetuning methods
- improving "small" LLMs
- pretraining on domain-specific data
- new techniques to align LLMs with human preferences
- enhancing LLMs with high-quality data and instruction sets
- ConvNet vs vision transformer comparisons
- recent developments in image segmentation
- video synthesis with latent diffusion models
I'm wishing you all a great start to 2024!
Huge Collection of Data Science Resources and Tools!
As a data science professional, you can find everything you need in this Google Drive!
Download Link:
https://t.co/fn6bJN0zPJ
Follow @ZabihullahAtal: it's all about empowering you via tech updates, tech knowledge, and insights.
Have you used transformers but not fully grasped how they work internally?
Welcome the Random Transformer, a step-by-step walkthrough doing the math of the transformer model. Kick off your year understanding what's going on under the hood.
https://t.co/XyFL8Ni1Of
[New lectures] Transformers United - Stanford CS25
The new lectures of the Transformers United V3 class are being released gradually. The current lectures are about foundational models and generalist agents.
Lecture videos: https://t.co/UIpWSY2dwZ
Website: https://t.co/8DUtIn60jg
Here are 300 hours of curated courses focused on Machine Learning Engineering.
There are 15 courses. From beginner to advanced. From Google. For free.
Some of the topics they cover:
• Fundamentals of Machine Learning
• Feature Engineering
• Production Machine Learning Systems
• Computer Vision and Natural Language
• Recommendation Systems
• MLOps
• TensorFlow, Google Cloud, VertexAI
The courses are well structured. They aren't just links to YouTube videos. You have to join the course, and they have an interface that takes you through every module.
This is good content. And it's free.
https://t.co/eqTjRT6BZF
AI has transformed job hunting forever.
Use this Cheat Sheet to find your next dream role.
Sign up to Superhuman AI & get more AI resources like this for free.