Daniel @indentfour - Twitter Profile

Pinned Tweet

Daniel

@indentfour

over 1 year ago

🚀 Want to write production-grade ML code that’s clean, modular & maintainable?

Here’s a simple CPO (components, pipelines, orchestrators) pattern example to structure your code.

Let's break this down with a training pipeline example👇
#MachineLearning #DataScience #MLOps https://t.co/efqaatPufU
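The code in the attached image isn't reproduced here, so what follows is only a minimal sketch of how a CPO layout might look, not the original example. It assumes components are small single-purpose functions, a pipeline chains them in a fixed order, and an orchestrator decides which pipelines run. Every name (`load_data`, `fit_slope`, `evaluate`, `training_pipeline`, `orchestrate`) and the toy data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Rows = List[Tuple[float, float]]


# --- Components: small, single-purpose, easy to test (hypothetical names) ---
def load_data() -> Rows:
    """Return (x, y) pairs; a stand-in for reading from real storage."""
    return [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def fit_slope(rows: Rows) -> float:
    """Fit y = w * x by least squares through the origin."""
    numerator = sum(x * y for x, y in rows)
    denominator = sum(x * x for x, _ in rows)
    return numerator / denominator


def evaluate(weight: float, rows: Rows) -> float:
    """Mean absolute error of the fitted line on the given rows."""
    return sum(abs(y - weight * x) for x, y in rows) / len(rows)


# --- Pipeline: wires components together in a fixed order ---
def training_pipeline() -> Dict[str, float]:
    rows = load_data()
    weight = fit_slope(rows)
    return {"weight": weight, "mae": evaluate(weight, rows)}


# --- Orchestrator: decides which pipelines run, and in what order ---
def orchestrate(
    pipelines: Dict[str, Callable[[], Dict[str, float]]],
    names: Sequence[str],
) -> Dict[str, Dict[str, float]]:
    return {name: pipelines[name]() for name in names}


results = orchestrate({"training": training_pipeline}, ["training"])
print(results)
```

The point of the split is that each layer can change on its own: swap a component without touching the pipeline, or add an inference pipeline without touching the orchestrator.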


Daniel

@indentfour

7 months ago

Check back later to see how we can build a simple API that serves a machine learning model using Flask.


Daniel

@indentfour

7 months ago

Understanding APIs is important for building production-grade ML & AI solutions. Let's start with some fundamentals:
#MachineLearning #AIEngineering #DataScience https://t.co/ffJopiUpVH


Daniel

@indentfour

7 months ago

When we make a request to an API, a response object is usually returned, with attributes and methods describing the outcome of the request. For example, if we requested some data, it comes back in the response body, often as JSON. Attributes such as the status code and the time the request took are usually available too.
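A minimal sketch of what that looks like in practice, using only the standard library so it runs without network access. The local server, the `/predict` path and the payload are all made up for illustration; they stand in for a real API.

```python
import json
import threading
import time
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical API that answers every GET with a small JSON body."""

    def do_GET(self):
        body = json.dumps({"model": "demo", "prediction": 0.87}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def call_demo_api() -> dict:
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        start = time.perf_counter()
        conn.request("GET", "/predict")
        response = conn.getresponse()                # the response object
        status = response.status                     # HTTP status code
        payload = json.loads(response.read())        # JSON body as a dict
        elapsed = time.perf_counter() - start        # time the request took
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return {"status": status, "json": payload, "elapsed_seconds": elapsed}


result = call_demo_api()
print(result["status"], result["json"])
```

With the third-party `requests` library the same three pieces of information are exposed as `response.status_code`, `response.json()` and `response.elapsed`.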



Daniel

@indentfour

7 months ago

How did I not realise this trick for unpacking and overriding Python dictionaries before?
#python #datascience #programmingtricks #machinelearning https://t.co/Ujj9Rsrxpb
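The screenshot isn't reproduced here, so this is a guess at the trick rather than the original snippet: presumably the `{**a, **b}` idiom, where later keys win. The dictionary names and values below are made up for illustration.

```python
# Hypothetical default settings and per-run overrides.
defaults = {"max_depth": 6, "eta": 0.1, "subsample": 1.0}
overrides = {"eta": 0.01, "seed": 42}

# Unpack both into a new dict; keys from the right-hand dict win on conflict.
merged = {**defaults, **overrides}

# A single key can also be overridden inline while unpacking.
tweaked = {**defaults, "eta": 0.05}

print(merged)
print(tweaked)
```

Neither original dictionary is modified. On Python 3.9 and later, `defaults | overrides` produces the same merged result.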


Daniel

@indentfour

about 1 year ago

LLMs are great at writing generic code, but when it comes to platform-specific stuff like AWS or Azure, they often hallucinate services, configs, or CLI commands. Always double-check against the docs. #AI #Cloud #LLMs


Daniel

@indentfour

about 1 year ago

🔍 Which features matter most in your ML model?

Here’s a ready-made function to plot feature importance for XGBoost & Random Forest.
#DataScience #MachineLearning #100DaysOfML https://t.co/gqh1eDC3ph
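The function in the image isn't reproduced here. As a rough sketch of one way to write it, the code below assumes a fitted model that exposes `feature_importances_`, which scikit-learn's random forests and XGBoost's scikit-learn wrappers both do. The function names and the `top_n` parameter are my own, and matplotlib is only imported when the plot is actually drawn.

```python
from typing import List, Sequence, Tuple


def ranked_importances(
    model, feature_names: Sequence[str], top_n: int = 20
) -> List[Tuple[str, float]]:
    """Return (feature, importance) pairs, most important first."""
    importances = [float(v) for v in model.feature_importances_]
    if len(importances) != len(feature_names):
        raise ValueError("feature_names does not match the model's feature count")
    pairs = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
    return pairs[:top_n]


def plot_feature_importance(
    model, feature_names: Sequence[str], top_n: int = 20,
    title: str = "Feature importance",
):
    """Draw a horizontal bar chart of the top_n features and return the axes."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    pairs = ranked_importances(model, feature_names, top_n)
    # Reverse so the most important feature ends up at the top of the chart.
    names = [name for name, _ in pairs][::-1]
    values = [value for _, value in pairs][::-1]

    fig, ax = plt.subplots(figsize=(8, max(2.0, 0.3 * len(names))))
    ax.barh(names, values)
    ax.set_xlabel("Importance")
    ax.set_title(title)
    fig.tight_layout()
    return ax
```

Usage would be something like `plot_feature_importance(fitted_model, X.columns)` after fitting, where `fitted_model` and `X` are whatever model and DataFrame you trained with.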


Daniel

@indentfour

about 1 year ago

🚨 Does your model have hidden data leakage issues? One common mistake: Performing EDA before splitting a holdout set. By analyzing the entire dataset before splitting, you risk using information from the holdout set in your exploration, which can influence your choices when developing your model. Always split your data first, then explore. #DataScience #MachineLearning #DataAnalytics #AI #MLTips
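A minimal sketch of "split first, then explore", using only the standard library. The helper name `split_holdout`, the 20% holdout fraction and the toy data are assumptions for illustration.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def split_holdout(
    rows: Sequence[float], holdout_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Randomly set aside a holdout set before any analysis happens."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_holdout = int(round(len(rows) * holdout_fraction))
    holdout_idx = set(indices[:n_holdout])
    train = [row for i, row in enumerate(rows) if i not in holdout_idx]
    holdout = [row for i, row in enumerate(rows) if i in holdout_idx]
    return train, holdout


values = [float(v) for v in range(100)]  # toy stand-in for a real dataset

# 1. Split first.
train, holdout = split_holdout(values)

# 2. Explore the training portion only; the holdout stays untouched until
#    the final evaluation.
train_summary = {
    "mean": statistics.mean(train),
    "stdev": statistics.stdev(train),
}
print(len(train), len(holdout), train_summary)
```

For time-ordered data a random split like this is not appropriate; the holdout should be the most recent slice instead.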


indentfour retweeted

𝗿𝗮𝗺𝗮𝗸𝗿𝘂𝘀𝗵𝗻𝗮— 𝗲/𝗮𝗰𝗰

@techwith_ram

over 1 year ago

5 GitHub Repositories to Master AI Engineering Read it here:👇 https://t.co/mMZ1nrK5U7


indentfour retweeted

Ani

@curiousZeedX

over 1 year ago

Best place to start with your genAI learning journey.

The lessons include:

1. Introduction to Generative AI and LLMs
2. Exploring and Comparing Different LLMs
3. Using Generative AI Responsibly
4. Understanding Prompt Engineering Fundamentals
5. Creating Advanced Prompts
6. Building Text Generation Applications
7. Building Chat Applications
8. Building Search Apps with Vector Databases
9. Building Image Generation Applications
10. Building Low Code AI Applications
11. Integrating External Applications with Function Calling
12. Designing UX for AI Applications
13. Securing Your Generative AI Applications
14. The Generative AI Application Lifecycle
15. Retrieval Augmented Generation (RAG) and Vector Databases
16. Open Source Models and Hugging Face
17. AI Agents
18. Fine-Tuning LLMs
19. Building with SLMs
20. Building with Mistral Models
21. Building with Meta Models

It's free and available on GitHub.


Daniel

@indentfour

over 1 year ago


Time Series Cross-Validation: Rolling vs. Expanding Window 📊

When working with time-dependent data, standard cross-validation breaks down: shuffled or arbitrary folds let the model train on observations that come after the ones it is tested on. Time Series Cross-Validation keeps the data in chronological order, so every test fold comes after its training data. There are two options:

🔹 Rolling Window: A fixed-size training window moves forward step by step.
🔹 Expanding Window: The training data starts small and grows over time.
#MachineLearning #DataScience #DataAnalytics #100DaysOfML #MLOps
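The two window types can be sketched with one small index generator. This is an illustration under my own assumptions, not a library API: the function name and the parameters `initial_train`, `test_size` and `expanding` are hypothetical.

```python
from typing import Iterator, List, Tuple


def time_series_splits(
    n_samples: int, initial_train: int, test_size: int, expanding: bool = True
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) in chronological order.

    expanding=True  -> training always starts at index 0 and grows.
    expanding=False -> training keeps a fixed length of `initial_train`
                       and slides forward (rolling window).
    """
    train_end = initial_train
    while train_end + test_size <= n_samples:
        train_start = 0 if expanding else train_end - initial_train
        train_idx = list(range(train_start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        train_end += test_size  # move forward by one test block


rolling = list(time_series_splits(10, initial_train=4, test_size=2, expanding=False))
expanding = list(time_series_splits(10, initial_train=4, test_size=2, expanding=True))

for name, folds in (("rolling", rolling), ("expanding", expanding)):
    for train_idx, test_idx in folds:
        print(name, "train", train_idx, "test", test_idx)
```

In both modes the test block always sits directly after the training block, so the model never sees the future. scikit-learn's `TimeSeriesSplit` produces expanding windows by default, and its `max_train_size` argument caps the training length to get rolling-style behaviour.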


Daniel

@indentfour

over 1 year ago

@DSDecoded CatBoost is nice too, as is LightGBM. I think all three give you a strong baseline model.


Daniel

@indentfour

over 1 year ago

I used to train linear models as baselines but now I jump straight to XGBoost🚀

Tweak these 5 parameters for a solid first-pass model:

🔹 max_depth = Controls tree complexity; higher = more complex, but more prone to overfitting. Start with 6.

🔹 subsample = Fraction of rows randomly sampled for each tree; lower it to reduce overfitting. Try values in the [0.5, 1] range.

🔹 colsample_bytree = Fraction of features sampled for each tree; the column equivalent of subsample.

🔹 min_child_weight = Prevents small, noisy splits; higher = more conservative model. Try 0, 5, 15 and even 200.

🔹 eta = Learning rate; lower values improve stability but need more trees. Try 0.1 or 0.01.
#MachineLearning #MLOps #DataScience #DataAnalytics #100DaysOfML
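One way to turn the five suggestions above into something runnable is a small grid of candidate parameter sets. This is a sketch under assumptions: the intermediate value 0.75 for the two sampling parameters and `num_boost_round=200` are my own picks, and `candidate_params` and `fit_first_pass` are hypothetical helper names. XGBoost is only imported if you actually train.

```python
from itertools import product
from typing import Dict, List

# Values taken from the suggestions above; 0.75 is an assumed midpoint.
SEARCH_SPACE: Dict[str, list] = {
    "max_depth": [6],
    "subsample": [0.5, 0.75, 1.0],
    "colsample_bytree": [0.5, 0.75, 1.0],
    "min_child_weight": [0, 5, 15, 200],
    "eta": [0.1, 0.01],
}


def candidate_params(space: Dict[str, list]) -> List[dict]:
    """Expand the search space into one params dict per combination."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


def fit_first_pass(X, y, params: dict, num_boost_round: int = 200):
    """Train one booster with the native XGBoost API (requires xgboost)."""
    import xgboost as xgb  # imported lazily so the grid works without it

    dtrain = xgb.DMatrix(X, label=y)
    return xgb.train(params, dtrain, num_boost_round=num_boost_round)


candidates = candidate_params(SEARCH_SPACE)
print(len(candidates), candidates[0])
```

Lower `eta` values generally need a larger `num_boost_round`, so in practice the two are tuned together, ideally with early stopping on a validation set.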


Daniel

@indentfour

over 1 year ago

Want a more reliable way to evaluate your model? 📊

A single train-test split can be unreliable. K-Fold Cross-Validation gives a more reliable performance estimate by splitting the data into k subsets (folds).

The model trains on k-1 folds and tests on the remaining fold. This repeats k times, with each fold used for testing once.

✅ Pros: More reliable than a simple train-test split; averaging over k folds reduces the variance of the estimate.
❌ Cons: More computationally expensive, since the model is trained k times.
#MachineLearning #MLOps #DataScience #DataAnalytics #100DaysOfML
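The procedure above can be sketched in plain Python. The helper names are hypothetical, and the "model" is just a mean predictor so the example stays dependency-free; in practice you would fit a real model inside the loop.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices); each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def cross_val_mae(y: Sequence[float], k: int = 5, seed: int = 0) -> List[float]:
    """Score a mean-predictor baseline with k-fold CV; one MAE per fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        prediction = sum(y[i] for i in train_idx) / len(train_idx)  # "training"
        mae = sum(abs(y[i] - prediction) for i in test_idx) / len(test_idx)
        scores.append(mae)
    return scores


y = [float(v % 7) for v in range(50)]  # toy target values
scores = cross_val_mae(y, k=5)
print(scores, sum(scores) / len(scores))
```

The average of the per-fold scores is the cross-validated estimate, and the spread across folds gives a feel for how much a single split could have misled you.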


Daniel

@indentfour

over 1 year ago

@DSDecoded Agreed. Always better to have a problem and then work out what you need to learn to solve it. It's more practical and helps with motivation.

