🚀 Want to write production-grade ML code that’s clean, modular & maintainable?
Here’s a simple CPO (components, pipelines, orchestrators) pattern example to structure your code.
Let's break this down with a training pipeline example👇
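A minimal sketch of one way to read the CPO split, in plain Python. The toy data, the `SlopeModel` class, and every function name here are assumptions made up for illustration, not a fixed recipe.

```python
from dataclasses import dataclass
from statistics import mean


# --- Components: small, single-purpose steps that can be tested on their own ---
def load_data():
    """Return toy (feature, target) rows; a real component would read from storage."""
    return [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8), (6.0, 12.2)]


def split_data(rows, holdout_size):
    """Keep the last `holdout_size` rows aside for evaluation."""
    cut = len(rows) - holdout_size
    return rows[:cut], rows[cut:]


@dataclass
class SlopeModel:
    slope: float

    def predict(self, x):
        return self.slope * x


def train_model(train_rows):
    """Fit y = slope * x by least squares through the origin."""
    numerator = sum(x * y for x, y in train_rows)
    denominator = sum(x * x for x, _ in train_rows)
    return SlopeModel(slope=numerator / denominator)


def evaluate_model(model, test_rows):
    """Mean absolute error on the held-out rows."""
    return mean(abs(model.predict(x) - y) for x, y in test_rows)


# --- Pipeline: wires the components together in a fixed order ---
def training_pipeline(holdout_size):
    rows = load_data()
    train_rows, test_rows = split_data(rows, holdout_size)
    model = train_model(train_rows)
    return model, evaluate_model(model, test_rows)


# --- Orchestrator: decides which pipeline runs and with what configuration ---
def run(config):
    model, mae = training_pipeline(holdout_size=config["holdout_size"])
    return {"slope": model.slope, "mae": mae}


result = run({"holdout_size": 2})
```

Swapping the model or the data source then only touches one component; the pipeline and orchestrator stay the same.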
#MachineLearning #DataScience #MLOps
When we make a request to an API, a response object is usually returned that contains attributes and methods relating to the request.
For example, if we request some data, it comes back in the response body, often as JSON.
The response also exposes attributes such as the status code and how long the request took.
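A small sketch of this using only the standard library. The local `DemoHandler` server and its payload are stand-ins invented for the example so it runs without internet access. With the popular `requests` library the same information lives on `response.json()`, `response.status_code`, and `response.elapsed`.

```python
import http.client
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Tiny local API returning a fixed JSON payload (hypothetical stand-in for a real API)."""

    def do_GET(self):
        body = json.dumps({"message": "hello"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_json(host, port, path="/"):
    """Send a GET request and return (status code, elapsed seconds, parsed JSON)."""
    start = time.perf_counter()
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()        # the response object
        status = response.status             # status code attribute
        data = json.loads(response.read())   # JSON body parsed into a dict
    finally:
        conn.close()
    elapsed = time.perf_counter() - start    # how long the round trip took
    return status, elapsed, data


def demo():
    """Start the local server on a free port, call it once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_json("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```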
LLMs are great at writing generic code, but when it comes to platform-specific stuff like AWS or Azure, they often hallucinate services, configs, or CLI commands. Always double-check against the docs.
#AI #Cloud #LLMs
🔍 Which features matter most in your ML model?
Here’s a ready-made function to plot feature importance for XGBoost & Random Forest.
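The original function was not included with the post, so this is a sketch of what such a helper could look like. It assumes a fitted model that exposes `feature_importances_`, which scikit-learn's random forests and XGBoost's scikit-learn wrappers both do. The function names and defaults are illustrative choices.

```python
def rank_feature_importance(model, feature_names, top_n=None):
    """Return (name, importance) pairs sorted from most to least important."""
    importances = [float(v) for v in model.feature_importances_]
    if len(importances) != len(feature_names):
        raise ValueError("feature_names must match the model's feature count")
    ranked = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
    return ranked[:top_n] if top_n else ranked


def plot_feature_importance(model, feature_names, top_n=20, title="Feature importance"):
    """Draw a horizontal bar chart of the top_n features and return the Axes."""
    import matplotlib.pyplot as plt  # imported lazily so ranking works without it

    ranked = rank_feature_importance(model, feature_names, top_n)
    names = [name for name, _ in ranked][::-1]    # reversed so the largest bar is on top
    values = [value for _, value in ranked][::-1]
    fig, ax = plt.subplots(figsize=(8, max(2, 0.3 * len(names))))
    ax.barh(names, values)
    ax.set_xlabel("Importance")
    ax.set_title(title)
    fig.tight_layout()
    return ax
```

Typical use would be `plot_feature_importance(fitted_model, list(X_train.columns))`, where `fitted_model` and `X_train` are your own objects.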
#DataScience #MachineLearning #100DaysOfML
🚨 Does your model have hidden data leakage issues?
One common mistake: performing EDA before splitting off a holdout set.
By analyzing the entire dataset before splitting, you risk using information from the holdout set in your exploration, which can influence your choices when developing your model.
Always split your data first, then explore.
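A rough sketch of that order of operations in plain Python. The toy rows, column names, and the 20% holdout fraction are assumptions for the example. In practice scikit-learn's `train_test_split` does the splitting step.

```python
import random
from statistics import mean


def split_holdout(rows, holdout_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and set aside a holdout set before any analysis."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Toy dataset (made up for illustration).
rows = [{"age": 20 + i, "income": 1000 * (i % 7)} for i in range(50)]

# Step 1: split first.
train_rows, holdout_rows = split_holdout(rows)

# Step 2: explore the training portion only; the holdout set stays unseen
# until the final evaluation.
train_summary = {
    "n_rows": len(train_rows),
    "mean_age": mean(r["age"] for r in train_rows),
    "max_income": max(r["income"] for r in train_rows),
}
```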
#DataScience #MachineLearning #DataAnalytics #AI #MLTips
The best place to start your GenAI learning journey.
The lessons include:
1. Introduction to Generative AI and LLMs
2. Exploring and Comparing Different LLMs
3. Using Generative AI Responsibly
4. Understanding Prompt Engineering Fundamentals
5. Creating Advanced Prompts
6. Building Text Generation Applications
7. Building Chat Applications
8. Building Search Apps with Vector Databases
9. Building Image Generation Applications
10. Building Low Code AI Applications
11. Integrating External Applications with Function Calling
12. Designing UX for AI Applications
13. Securing Your Generative AI Applications
14. The Generative AI Application Lifecycle
15. Retrieval Augmented Generation (RAG) and Vector Databases
16. Open Source Models and Hugging Face
17. AI Agents
18. Fine-Tuning LLMs
19. Building with SLMs
20. Building with Mistral Models
21. Building with Meta Models
It's free and available on GitHub.
Time Series Cross-Validation: Rolling vs. Expanding Window 📊
When working with time-dependent data, standard K-Fold cross-validation breaks down: shuffling the rows lets the model train on future observations and get tested on the past.
Instead, we can use Time Series Cross-Validation, ensuring data remains in chronological order. For this we have two options:
🔹 Rolling Window: A fixed-size training window moves forward step by step.
🔹 Expanding Window: The training data starts small and grows over time.
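Both schemes can be sketched with one small index generator. The function name and parameters below are made up for illustration. scikit-learn's `TimeSeriesSplit` covers the same ground, with `max_train_size` giving the rolling variant.

```python
def time_series_splits(n_samples, n_splits, test_size, train_size=None):
    """Yield (train_indices, test_indices) pairs in chronological order.

    train_size=None -> expanding window (train on everything before the test block).
    train_size=k    -> rolling window (train on only the k most recent points).
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train_start = 0 if train_size is None else max(0, test_start - train_size)
        train = list(range(train_start, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


expanding = list(time_series_splits(n_samples=10, n_splits=3, test_size=2))
rolling = list(time_series_splits(n_samples=10, n_splits=3, test_size=2, train_size=4))
```

In both cases every test index comes strictly after every train index, so the model never sees the future.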
#MachineLearning #DataScience #DataAnalytics #100DaysOfML #MLOps
I used to train linear models as baselines but now I jump straight to XGBoost🚀
Tweak these 5 parameters for a solid first-pass model:
🔹 max_depth = Controls tree complexity; higher = more complex, but riskier for overfitting. Start with 6.
🔹 subsample = Randomly samples rows per tree; lower this to prevent overfitting. Try values in the 0.5 to 1 range.
🔹 colsample_bytree = Randomly samples features per tree; the column equivalent of subsample.
🔹 min_child_weight = Prevents small, noisy splits; higher = more conservative model. Try 0, 5, 15 and even 200.
🔹 eta = Learning rate; lower values improve stability but need more trees. Try 0.1 or 0.01.
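As a sketch, the five parameters can be written as a starting dict plus a small search space. The 0.8 values for `subsample` and `colsample_bytree` and the starting `min_child_weight` of 5 are assumptions picked from within the suggested ranges, not recommendations from the post. Each resulting dict uses XGBoost's native parameter names, so it could be passed as `params` to `xgboost.train`.

```python
from itertools import product

# Starting point based on the values suggested above (0.8 is an assumed midpoint).
base_params = {
    "max_depth": 6,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "min_child_weight": 5,
    "eta": 0.1,
}

# A small search space around that starting point.
search_space = {
    "subsample": [0.5, 0.75, 1.0],
    "min_child_weight": [0, 5, 15, 200],
    "eta": [0.1, 0.01],
}


def candidate_params(base, space):
    """Yield one full parameter dict per combination in the search space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield {**base, **dict(zip(keys, values))}


candidates = list(candidate_params(base_params, search_space))
```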
#MachineLearning #MLOps #DataScience #DataAnalytics #100DaysOfML
Want a more reliable way to evaluate your model? 📊
A single train-test split can be unreliable because the score depends on which rows land in the test set. K-Fold Cross-Validation gives a more stable performance estimate by splitting the data into k subsets (folds).
The model trains on k-1 folds and tests on the remaining fold—this repeats k times, with each fold used for testing once.
✅ Pros: More reliable than a simple train-test split; reduces the variance of the performance estimate.
❌ Cons: More computationally expensive.
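The loop itself is short enough to sketch in plain Python. The toy data and the mean-predictor "model" are assumptions for illustration. scikit-learn's `KFold` and `cross_val_score` are the usual tools for real work.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices once, then yield (train, test) index lists for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the remaining fold, and repeat k times."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return mean(scores), stdev(scores)


# Toy example: the "model" is just the training mean, scored by mean absolute error.
def fit_mean(train_xs, train_ys):
    return mean(train_ys)


def mae(model, test_xs, test_ys):
    return mean(abs(model - y) for y in test_ys)


xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
mean_score, score_spread = cross_validate(xs, ys, k=5, fit=fit_mean, score=mae)
```

Reporting the spread alongside the mean shows how much the score moves from fold to fold, which a single split cannot tell you.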
#MachineLearning #MLOps #DataScience #DataAnalytics #100DaysOfML
@DSDecoded Agreed. Always better to have a problem and then work out what you need to learn to solve it. It's more practical and helps with motivation.