Applied Machine Learning - Cornell CS5785
"Starting from the very basics, covering all of the most important ML algorithms and how to apply them in practice. Executable Jupyter notebooks (and as slides)". 80 videos!!
Videos: https://t.co/KGQBLQ37ou
Code: https://t.co/hTqxsLt6wu
🚨This week’s top AI/ML research papers:
- Mixture-of-Transformers
- BitNet a4.8
- LoRA vs Full Fine-tuning: An Illusion of Equivalence
- Mixtures of In-Context Learners
- Emergence of Hidden Capabilities
- DimensionX
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
- OpenCoder: The Open Cookbook for Top-Tier Code LLMs
- ReCapture
- Needle Threading
- M3DocRAG
- Controlling Language and Diffusion Models by Transporting Activations
- Why Do We Need Weight Decay in Modern Deep Learning?
- "Give Me BF16 or Give Me Death"? Trade-Offs in LLM Quantization
- Adaptive Caching for Faster Video Generation with Diffusion Transformers
- Constant Acceleration Flow
- Randomized Autoregressive Visual Generation
- Physics in Next-token Prediction
- In-Context LoRA for Diffusion Transformers
- Balancing Pipeline Parallelism with Vocabulary Parallelism
- EoRA: Eigenspace Low-Rank Approximation
- Self-Consistency Preference Optimization
- How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
- LASER: Attention with Exponential Transformation
- Photon: Federated LLM Pre-Training
- Attacking Vision-Language Computer Agents via Pop-ups
- Hunyuan-Large
- Context Parallelism for Scalable Million-Token Inference
- Stealing User Prompts from Mixture of Experts
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
- Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge
overview for each + authors' explanations
read this in thread mode for the best experience
for those of you scouting for "AI projects" to work on in your free time, i figure i would share the list of projects im currently doing to get a sense of how i carefully pick out problems:
1. write a training run in CUDA for some neural net you find cool. i started off w/ a single hidden layer MLP and gave myself a template to build from there
2. tinker w/ minecraft AI agents. multiple parts to this which make it even more fun (yes, you will get rapid dopamine hits while building). first you have to get an environment working (pixel output + action input) which i highly recommend "minerl" for. then you have to collect a bunch of training data in the form of minecraft video clips with actions each frame of the video (either your own or a massive corpus online -> also comes with minerl). next, you train the first neural net to take in a sequence of images from a video and you try to predict all the actions done in a given frame (say 7 frames total where the one you are trying to predict actions for is the middle frame -> temporal dimension helps performance). after this smaller neural net knows how to generate action labels from any given minecraft video, you can scrape thousands of hours of gameplay from youtube and run your small "data generator" neural net in inference mode to get yourself a nice dataset. then you test and tinker with different neural net architectures to actually attempt to reach goals in minecraft. you could use only neural nets, or neural nets w/ a mix of fixed algos (state machines, conditionals).
3. mechanistic interpretability on computer vision networks and small language models. i personally started off by training an MLP from scratch on the mnist dataset up to around 98% accuracy then used matplotlib to print out the weights and activations in certain ways. helps you understand how neural nets form patterns internally in order to predict labels for the data you train it on. then you could use libraries like transformer_lens to visualize the attention heads at each layer for any given prompt in llms like gpt2-medium / small. if you're gonna go beyond that just looking at the raw patterns with your own eyes, consider playing with sparse autoencoders (if you find the resources hard to navigate just shoot me a dm). they essentially take a bunch of dense values in activations and project it to a sparse tensor format so you can map sparse signals to features (keywords you'll find here are: superposition/polysemanticity & dictionary learning).
4. fire up an instance with at least 2 3090s or 4090s and try to train a neural net of your choice across them with pytorch DDP (DistributedDataParallel) to give yourself an intro to distributed training/inference.
5. or if these sound crazy and you want a surprise, try implementing a neural net paper from arxiv in pytorch (a paper on "differential attention" came out recently which i'd like to mess around with. you could too)
6. if you think some resources need to be explained or documented better and you see the value in doing so, consider making a tutorial and posting it on X or youtube (i posted a few courses on the freecodecamp youtube channel and people liked them a lot).
again, please send me a DM if you have any questions. would love to hear you out and possibly help steer you in the right direction based on your interests :)
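a rough sketch of the windowing idea from project 2 (7 frames in, predict the action of the middle frame). this is only an illustration under assumptions: `make_windows`, the string stand-ins for frames, and the action names are all made up here and are not part of minerl or any other library. a real pipeline would load image tensors instead of strings.

```python
from typing import List, Sequence, Tuple


def make_windows(
    frames: Sequence[str],
    actions: Sequence[str],
    window: int = 7,
) -> List[Tuple[List[str], str]]:
    """Pair each clip of `window` consecutive frames with the action of its middle frame."""
    if window % 2 == 0:
        raise ValueError("window must be odd so there is a single middle frame")
    if len(frames) != len(actions):
        raise ValueError("need exactly one action label per frame")

    half = window // 2
    examples = []
    # Only frames with `half` neighbours on both sides can be a window centre.
    for center in range(half, len(frames) - half):
        clip = list(frames[center - half : center + half + 1])
        examples.append((clip, actions[center]))
    return examples


# Hypothetical toy data: strings stand in for video frames and action labels.
frames = [f"frame_{i}" for i in range(10)]
actions = [f"action_{i}" for i in range(10)]

# 10 frames with a 7-frame window leave 4 usable centre frames (indices 3..6).
examples = make_windows(frames, actions)
```

each `(clip, label)` pair is one training example for the small "data generator" net: the clip is the input, the middle frame's action is the target.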
@cecil_nyasha @daddyhope @TateMavetera Are you sure you know what she said at the Potraz Breakfast Meeting? Read her LinkedIn post. In future don't comment about things you have no idea about and also leave IT and Law stuff to the right professionals in the field. https://t.co/ase4PTgjrl
This man destroyed wokeism:
Naval Ravikant.
He was an early investor in Notion, Twitter & Uber & is worth $600M+
He's been on fire lately on X.
Here is Naval's updated philosophy:
Hi ML peeps, how much Calculus is needed before I start jumping into DL?
could you just cross-check if these topics are enough:
- Fundamental Ideas, Rates and Differentials
- Functions and Derivatives
- Differentials of Algebraic Functions
- Use of Rates and Differentials
- Differentials of Trigonometric Functions
- Velocity, Acceleration and Derivatives
- Interpretation of Functions and Derivatives by Means of Graphs
- Maximum and Minimum Values
- Maxima and Minima
- Differentials of Log and Exponential Functions
- Integral Formulas
"Understanding LLMs from Scratch Using Middle School Math"
Neural networks learn to predict text by converting words to numbers and finding patterns through attention mechanisms.
So the network turns words into numbers, then uses attention to decide what's important for predicting the next words.
Nice long blog (40-minute reading time), check the link in the comments.
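A minimal sketch of the "words to numbers, then attention" idea, using only arithmetic and the standard library. Everything here is an assumption for illustration: the 2-number word vectors are made up, and the word vectors are used directly as queries, keys, and values, whereas a real transformer learns separate projections for each. It is not taken from the blog post.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # The output is a weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Step 1: words become numbers (hypothetical, hand-picked embeddings).
embeddings: Dict[str, Vector] = {
    "the": [1.0, 0.0],
    "cat": [0.0, 1.0],
    "sat": [1.0, 1.0],
}
tokens = ["the", "cat", "sat"]
vectors = [embeddings[t] for t in tokens]

# Step 2: the last word scores every word in the sentence and mixes their vectors.
context, weights = attend(vectors[-1], vectors, vectors)
```

`weights` holds one number per word saying how much the last word "looks at" it, and `context` is the blended vector a model would pass on to predict the next word.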
I want to run AI agents to scrape specific URLs and do some data extraction until they find a certain kind of information. Like an AI investigator.
What’s the framework that allows for this kind of cutting-edge stuff? Is there an AI agent project that we should be using?
"Generative AI puts [ad-buying] on steroids: advertisers can provide Meta with broad parameters and brand guidelines and let the black box not just test out a few pieces of creative but an effectively unlimited amount.
Critically, this generative AI application has a verification function: did the generated ad generate more revenue or less?"
🚨This week’s top AI/ML research papers:
- GPT-4o System Card
- Are LLMs Better than Reported?
- Can Language Models Replace Programmers?
- CLEAR
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking
- SelfCodeAlign
- Mixture of Parrots
- Unpacking SDXL Turbo
- A prescriptive theory for brain-like inference
- Modular Duality in Deep Learning
- Learning Video Representations without Natural Videos
- CORAL
- Task Vectors are Cross-Modal
- Mind Your Step (by Step)
- ShadowKV
- MarDini
- COAT
- Fast Best-of-N Decoding via Speculative Rejection
- Continuous Speech Synthesis using per-token Latent Diffusion
- Teach Multimodal LLMs to Comprehend Electrocardiographic Images
- FasterCache
- Read-ME
- VibeCheck
- HoPE
- In-Context LoRA for Diffusion Transformers
- Knowledge Graph Enhanced Language Agents for Recommendation
- $100K or 100 Days
- On Memorization of Large Language Models in Logical Reasoning
- Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis
- Grounding by Trying
- Relaxed Recursive Transformers
- Combining Induction And Transduction For Abstract Reasoning
overview for each + authors' explanations
read this in thread mode for the best experience
🚨This week’s top AI/ML research papers:
- Sparse Crosscoders
- Rethinking Softmax
- Mechanistic Unlearning
- Decomposing The Dark Matter of Sparse Autoencoders
- ZIP-FIT
- Automatically Interpreting Millions of Features in Large Language Models
- Breaking the Memory Barrier
- Can Knowledge Editing Really Correct Hallucinations?
- Framer: Interactive Frame Interpolation
- Beyond position
- A Hitchhiker's Guide to Scaling Law Estimation
- Scaling up Masked Diffusion Models on Text
- Why Does the Effective Context Length of LLMs Fall Short?
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models
- Improve Vision Language Model Chain-of-thought Reasoning
- PyramidDrop
- FrugalNeRF
- SAM2Long
- SeerAttention
- FiTv2
overview for each + authors' explanations
read this in thread mode for the best experience
Imagine being Sundar Pichai now:
- you had the largest continually updated data set of any company to train AI on (the Google Index)
- you invented the underlying technology of LLMs like ChatGPT in 2017, called Transformers
- you had complete search dominance: all you had to add was AI and you'd own the market
And yet:
- you managed to completely fumble your massive head start and were late to everything
- you made your APIs so hard to use that nobody seriously integrated them into their apps, and people instead went to Anthropic and OpenAI
- you now see your search dominance quickly slipping away to Perplexity and ChatGPT Search, which launched yesterday
This will be a business case studied in universities for decades
How do brains “infer” the world’s state from noisy sensory data—and do so “dynamically?”
Our new theoretical framework bridges these two perspectives in a brain-inspired model👉🧵[1/n]
w/ amazing co-lead @dekelgalor & polymath mentor @jcbyts
📜preprint: https://t.co/xmjFLintZb