Soft Labels =>
Each class is assigned a probability between 0 and 1, and the probabilities across all classes sum to 1
Example:
Next word prediction: "My teddy bear is..."
😍 Cute → 60%
🆕 New → 25%
🧸 Soft → 15%
✅ Captures uncertainty
✅ Common in NLP and knowledge distillation
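A minimal sketch of how a soft label can be used as a training target, assuming a cross-entropy loss. The `predicted` numbers are made up for illustration, and `cross_entropy` is a hypothetical helper, not a library function:

```python
import math

# Soft label from the example above: one probability per candidate word.
soft_label = {"cute": 0.60, "new": 0.25, "soft": 0.15}

# Hypothetical model output over the same words (made-up numbers).
predicted = {"cute": 0.50, "new": 0.30, "soft": 0.20}


def cross_entropy(target, prediction):
    """Cross-entropy: -sum over classes of target[c] * log(prediction[c])."""
    return -sum(p * math.log(prediction[word]) for word, p in target.items())


# A valid soft label is a probability distribution, so it sums to 1.
assert abs(sum(soft_label.values()) - 1.0) < 1e-9

loss = cross_entropy(soft_label, predicted)
# Lowest achievable value: the prediction matches the soft label exactly.
best_case = cross_entropy(soft_label, soft_label)
print(f"loss: {loss:.4f}  best case: {best_case:.4f}")
```

The loss is smallest when the predicted distribution matches the soft label, which is how the target's uncertainty gets passed on to the model.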
#AI#Machinelearning#LLM
A Classification Label is the ground truth a model tries to predict.
Two common types:
Hard Labels =>
Each class is either present or absent
Example:
🧸 Teddy Bear = 1
🚗 Car = 0
🏐 Volleyball = 0
✅ Clear, binary answers
✅ Common in image classification
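A hard label is often stored as a one-hot vector. A small sketch, assuming a fixed class list; `one_hot` is a hypothetical helper written for this example:

```python
classes = ["teddy bear", "car", "volleyball"]


def one_hot(label, classes):
    """Return a hard label: 1 for the true class, 0 for every other class."""
    if label not in classes:
        raise ValueError(f"unknown class: {label!r}")
    return [1 if c == label else 0 for c in classes]


hard_label = one_hot("teddy bear", classes)
print(dict(zip(classes, hard_label)))
```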
#AI#MachineLearning
Low Loss → Predictions are close to the target values.
High Loss → Predictions are far from the target values.
In simple terms: the loss function tells the model how wrong it is so it can learn to become more accurate.
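To make low and high loss concrete, here is a sketch that uses mean squared error as one possible loss function. The choice of MSE and all the numbers are assumptions for illustration:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


targets = [3.0, 5.0, 7.0]
close_predictions = [2.9, 5.1, 7.2]  # near the targets
far_predictions = [0.0, 9.0, 1.0]    # far from the targets

low_loss = mean_squared_error(targets, close_predictions)
high_loss = mean_squared_error(targets, far_predictions)
print(f"low loss: {low_loss:.3f}  high loss: {high_loss:.3f}")
```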
#AI#MachineLearning#DeepLearning#NeuralNetworks
A Loss Function measures how far a model's predictions are from the true target values.
Input → Model → Prediction (ŷ)
⬇
Loss Function L(y, ŷ)
⬇
Prediction Error
Goal:
Quantify model performance during training.
#AI#MachineLearning#DeepLearning#NeuralNetworks
👣 Steps per Epoch (s)
Number of iterations needed to see all training examples.
Relationship:
N = b × s (when b divides N evenly; otherwise s = ⌈N / b⌉, with a smaller final batch)
Example:
If N = 10,000 and b = 100, then s = 100 steps per epoch.
After 1 epoch, the model has seen every training example exactly once.
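The relationship between N, b and s can be sketched in a few lines. `steps_per_epoch` and its `drop_last` flag are hypothetical names chosen for this example; the flag covers the case where N is not a multiple of b:

```python
import math


def steps_per_epoch(num_examples, batch_size, drop_last=False):
    """Number of iterations needed to pass over the dataset once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # Skip the final, smaller batch.
        return num_examples // batch_size
    # Keep the final, smaller batch so every example is seen.
    return math.ceil(num_examples / batch_size)


print(steps_per_epoch(10_000, 100))  # the example above: N = 10,000, b = 100
print(steps_per_epoch(10_050, 100))  # N is not a multiple of b
```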
#AI#LLM#Machinelearning
An Epoch is one complete pass through the entire training dataset during model training.
Key terms:
📊 Training Size (N)
Total number of training examples.
📦 Batch Size (b)
Number of examples processed at once.
#AI#MachineLearning#LLM
🔹 He Initialization =>
Best for: ReLU, Leaky ReLU, ELU, GELU
Goal: Preserve signal strength in deep networks and improve gradient flow.
➡️ Poor initialization can lead to vanishing or exploding gradients, making training difficult or even impossible.
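A rough sketch of He initialization, assuming the common normal-distribution variant with variance 2 / fan_in. `he_normal` is a hypothetical helper built on the standard library, not a framework API:

```python
import math
import random


def he_normal(fan_in, fan_out, seed=0):
    """Sample a fan_out x fan_in weight matrix from N(0, 2 / fan_in)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


weights = he_normal(fan_in=256, fan_out=128)
flat = [w for row in weights for w in row]
# The mean is close to 0, so the mean of squares approximates the variance.
empirical_var = sum(w * w for w in flat) / len(flat)
print(f"target variance: {2.0 / 256:.5f}  empirical: {empirical_var:.5f}")
```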
#AI#MachineLearning
Weight Initialization is the first step before training a neural network.
Every weight starts with an initial value, and choosing these values wisely can significantly affect:
⚡ Training speed
📈 Convergence stability
🎯 Final model performance
#AI#LLM#MachineLearning
Popular initialization methods:
🔹 Xavier (Glorot) =>
Best for: Sigmoid & Tanh activations
Goal: Keep activations from becoming too large or too small as they flow through the network.
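A rough sketch of Xavier (Glorot) initialization, assuming the uniform variant with limit sqrt(6 / (fan_in + fan_out)). `xavier_uniform` is a hypothetical helper for this example, not a framework call:

```python
import math
import random


def xavier_uniform(fan_in, fan_out, seed=0):
    """Sample a fan_out x fan_in weight matrix from U(-limit, limit)."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


fan_in, fan_out = 256, 128
weights = xavier_uniform(fan_in, fan_out)
flat = [w for row in weights for w in row]
# U(-a, a) has variance a^2 / 3, which here equals 2 / (fan_in + fan_out).
empirical_var = sum(w * w for w in flat) / len(flat)
print(f"target variance: {2.0 / (fan_in + fan_out):.5f}  empirical: {empirical_var:.5f}")
```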
The Output Layer is the final stage of a neural network, where predictions are produced.
Two common types of outputs:
➡️ Classification answers "Which category?"
➡️ Regression answers "What value?"
#AI#MachineLearning#DeepLearning#NeuralNetworks#LLM
Classification =>
The network predicts probabilities for each class using the Softmax function:
pᵢ = exp(zᵢ) / Σⱼ exp(zⱼ), where zᵢ is the network's raw score (logit) for class i
All probabilities sum to 1
The class with the highest probability becomes the prediction
Example:
🐱 Cat: 0.85
🐶 Dog: 0.10
🐰 Rabbit: 0.05
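Softmax can be sketched in a few lines. The logits below are made-up numbers, so the resulting probabilities will not match the example above exactly. Subtracting the largest logit first is a common numerical-stability trick and does not change the result:

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # avoids overflow in exp() for large scores
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["cat", "dog", "rabbit"]
logits = [2.0, 0.0, -1.0]  # hypothetical raw scores from the network

probs = softmax(logits)
predicted = classes[probs.index(max(probs))]  # class with the highest probability
print({c: round(p, 3) for c, p in zip(classes, probs)}, "->", predicted)
```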
#AI#LLM
ELU
A(z) = z, if z > 0
       α(eᶻ − 1), if z ≤ 0
Range: (−α, ∞)
Use: Faster convergence and improved learning stability.
Shape: Smooth negative curve, linear positive side.
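The ELU definition translates directly into code. A minimal sketch, assuming α = 1.0 as the default:

```python
import math


def elu(z, alpha=1.0):
    """ELU: z for positive inputs, alpha * (e^z - 1) otherwise."""
    if z > 0:
        return z
    # expm1(z) computes e^z - 1 accurately for z close to 0.
    return alpha * math.expm1(z)


for z in (-5.0, -1.0, 0.0, 2.0):
    print(f"elu({z}) = {elu(z):.4f}")
```

Negative inputs approach −α smoothly, while positive inputs pass through unchanged.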
ReLU is the default in many CNNs, while GELU is widely used in modern Transformer architectures.
#AI#DeepLearning#LLM
Hidden Layers are where a neural network learns patterns from data.
Each hidden layer uses:
⚖️ Weights: determine the importance of inputs
➕ Biases: shift the computation
⚡ Activation Functions: introduce nonlinearity, allowing the model to learn complex relationships
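Putting the three pieces together, one hidden layer computes activation(W·x + b). A toy sketch with made-up weights and ReLU as the activation; `dense_layer` is a hypothetical helper for this example:

```python
def relu(z):
    """ReLU activation: negative values become 0."""
    return max(0.0, z)


def dense_layer(inputs, weights, biases, activation=relu):
    """One hidden layer: a weighted sum plus bias per neuron, then the activation."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z))
    return outputs


x = [1.0, 2.0]                   # input vector
W = [[0.5, -1.0], [1.0, 1.0]]    # one row of weights per hidden neuron
b = [0.0, -1.0]                  # one bias per hidden neuron

hidden = dense_layer(x, W, b)
print(hidden)  # [0.0, 2.0]
```

Without the activation the layer would stay a purely linear transformation, no matter how many layers were stacked.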
Examples:
📝 Text → words/tokens converted into vectors
🎵 Audio → sound represented by extracted features
🖼️ Images → pixels represented by RGB values
Different data types, same goal: transform real-world information into numbers that neural networks can understand.
#AI
Input is the starting point of a Feed-Forward Neural Network (FFNN).
The network receives data as a vector of numbers (an embedding), which represents information in a machine-readable format.
#AI#MachineLearning#DeepLearning#NeuralNetworks