Yehan Wasura @rezsat - Twitter Profile

rezsat retweeted

Elon Musk

@elonmusk

4 months ago

Clawdbot 😂

3K

81K

6K

11K

18M

rezsat retweeted

Ghoshan

@ghostyfreak

5 months ago

Cafe Cursor Colombo #Cursor

2

7

2

0

305

rezsat retweeted

Rasal Jayasinghe

@rasaljaya

5 months ago

Cafe Cursor Colombo 🇱🇰

Wrapped up in style!

Colombo was filled with builders on Saturday!

See you all again! In another Cursor event!

Thank you @benln & @ftnabeelah & @cursor_ai for all the immense support! https://t.co/0AakRMl1wP

5

53

8

0

5K

Yehan Wasura @rezsat

5 months ago

@Uditha4930 😆 thanks

0

1

Yehan Wasura @rezsat

5 months ago

I just published: I Turned an EXE Into Music (and it actually tells you something) https://t.co/pDoFbqFyhf

#ReverseEngineering #Python #CreativeCoding #CyberSecurity https://t.co/NdCgN6YEYZ

1

2

0

17

rezsat retweeted

anshuman

@athleticKoder

7 months ago

ML concepts every data scientist should know for interviews:

Bookmark this.

1. Bias-Variance Tradeoff
2. Cross-Validation Strategies
3. Regularization (L1, L2, Elastic Net)
4. Class Imbalance & Sampling Techniques
5. Feature Engineering & Selection
6. Overfitting vs Underfitting
7. Evaluation Metrics (beyond accuracy)
8. Hyperparameter Tuning
9. Train-Test Data Leakage
10. Ensemble Methods
11. Dimensionality Reduction
12. Model Interpretability (SHAP, LIME)
13. Gradient Descent Variants
14. Activation Functions & Neural Networks
15. Imbalanced Dataset Handling
16. Production Model Monitoring

20

1K

115

2K

60K

rezsat retweeted

Avi Chawla

@_avichawla

7 months ago

I have been fine-tuning LLMs for over 2 years now!

Here are the top 5 LLM fine-tuning techniques, explained with visuals:

First of all, what's so different about LLM finetuning?

Traditional fine‑tuning is impractical for LLMs (billions of params; 100s GB).

Since this kind of compute isn't accessible to everyone, parameter-efficient finetuning (PEFT) came into existence.

Before we go into details of each technique, here's some background that will help you better understand these techniques:

LLM weights are matrices of numbers adjusted during finetuning.

Most PEFT techniques involve finding a lower-rank adaptation of these matrices, a smaller-dimensional matrix that can still represent the information stored in the original.

Now with a basic understanding of the rank of a matrix, we're in a good position to understand the different finetuning techniques.

(refer to the image below for a visual explanation of each technique)

1) LoRA

- Add two low-rank trainable matrices, A and B, alongside weight matrices.
- Instead of fine-tuning W directly, train only these low-rank matrices; their product represents the update to W.

Even for the largest of LLMs, LoRA matrices take up a few MBs of memory.

2) LoRA-FA

While LoRA significantly decreases the total trainable parameters, it requires substantial activation memory to update the low-rank weights.

LoRA-FA (FA stands for Frozen-A) freezes matrix A and only updates matrix B.

3) VeRA

- In LoRA, low-rank matrices A and B are unique for each layer.
- In VeRA, A and B are frozen, random, and shared across all layers.
- Instead of training A and B, it learns layer-specific scaling VECTORS (b and d).

4) Delta-LoRA

- It tunes the matrix W as well, but not in the traditional way.
- Here, the difference (or delta) between the product of matrices A and B in two consecutive training steps is added to W.

5) LoRA+

- In LoRA, both matrices A and B are updated with the same learning rate.
- Authors of LoRA+ found that setting a higher learning rate for matrix B results in better convergence.
____
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
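
To make the LoRA idea in the thread above concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the thread author's code or any library's API: the helper names (`matmul`, `lora_param_counts`, `lora_effective_weight`), the `scale` argument, and the layer sizes are all hypothetical. It assumes the common convention that B has shape d_out x rank and starts at zero while A has shape rank x d_in and starts with small random values, so the adapted weight initially equals the frozen W.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for one weight matrix: (full, LoRA, LoRA-FA)."""
    full = d_out * d_in               # every entry of W is trained
    lora = rank * (d_out + d_in)      # B (d_out x rank) plus A (rank x d_in)
    lora_fa = d_out * rank            # A is frozen, only B is trained
    return full, lora, lora_fa


def lora_effective_weight(W, B, A, scale=1.0):
    """Return W + scale * (B @ A). W itself is never modified."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


random.seed(0)
d_out, d_in, rank = 4, 6, 2  # toy sizes chosen for illustration only

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]     # frozen weights
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # small random init
B = [[0.0] * rank for _ in range(d_out)]                                  # zero init

W_eff = lora_effective_weight(W, B, A)
full, lora, lora_fa = lora_param_counts(4096, 4096, 8)
print(f"full={full:,}  lora={lora:,}  lora_fa={lora_fa:,}")
```

Because B starts at zero, `W_eff` equals `W` before any training step. For the assumed 4096 x 4096 layer at rank 8, the adapter trains 256 times fewer parameters than full fine-tuning, which is the kind of saving the thread is describing.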

44

2K

322

2K

128K

rezsat retweeted

Sebastian Raschka

@rasbt

7 months ago

This interesting week started with DeepSeek V3.2!

I just wrote up a technical tour of the predecessors and components that led up to this:

🔗 https://t.co/JSAd9cx2s6

- Multi-Head Latent Attention
- RLVR
- Sparse Attention
- Self-Verification
- GRPO Updates https://t.co/5f965hR70I

38

1K

232

810

90K

Yehan Wasura @rezsat

7 months ago

I just published The Truth About Python’s Truthiness https://t.co/a8wGZwSlCB #python #programming #article #learn #medium

2

1

0

14

Yehan Wasura @rezsat

9 months ago

I just published CPython Object Model and Reference Counting — Trying to make sense of PyObject and PyTypeObject https://t.co/0pM6J8A5Av #python #lowlevel #cpython #pythonprogramming #python3 #programming #LEARN

0

11

Yehan Wasura @rezsat

about 1 year ago

Swapping Variables in Python: Behind the scenes of a,b = b,c : https://t.co/9sE8Olk8ja

0

21

Yehan Wasura @rezsat

about 1 year ago

Swapping Variables in Python: Behind the scenes of a,b = b,c https://t.co/l0NTnT3G7F

0

11

Yehan Wasura @rezsat

over 1 year ago

@minchoi Isn't this Laixi Screen Monitoring Software? It's nothing new. Maybe it's a little more intelligent at doing tasks now, who knows, but this is really old.

0

23

Yehan Wasura @rezsat

over 1 year ago

@karpathy Completed. Just Wow. Well it's God damn freaking Andrej Karpathy after all. Thank you so much. Can't remember watching a video more than 45 min on YouTube before.

0

2
