Data Lad

Verified account

@DataLad87

Just a lad who likes clean data 📈 | ML, analytics & a bit of frontend | Learning loudly, shipping daily

London

Joined May 2026

15 Following

3 Followers

40 Posts

2 days ago

A scalar subquery returns exactly one value, so you can drop it anywhere SQL expects one: a column, a filter, a HAVING clause, even mid-expression. New guide on BigQuery scalar subqueries: the one-row-one-column rule, every place they fit, and when a JOIN scales better. #BigQuery #SQL #DataAnalytics https://t.co/vnMEk86kYm
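A minimal sketch of three of the placements from the post: as a column, in a filter, and in HAVING. It uses Python's built-in sqlite3 as a stand-in for BigQuery so it runs anywhere. The orders table and its values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery; the orders table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, 'alice', 120.0),
  (2, 'bob', 40.0),
  (3, 'alice', 60.0),
  (4, 'carol', 80.0);
""")

# As a column: compare every order with the overall average.
rows_vs_avg = conn.execute("""
    SELECT order_id, amount,
           amount - (SELECT AVG(amount) FROM orders) AS diff_from_avg
    FROM orders
    ORDER BY order_id
""").fetchall()

# In a filter: keep only orders above the average.
above_avg = conn.execute("""
    SELECT order_id
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY order_id
""").fetchall()

# In HAVING: customers whose total spend beats the average single order.
big_spenders = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > (SELECT AVG(amount) FROM orders)
    ORDER BY customer
""").fetchall()

print(above_avg)     # [(1,), (4,)]
print(big_spenders)  # [('alice', 180.0), ('carol', 80.0)]
```

One dialect difference is worth knowing. If a scalar subquery returns more than one row, BigQuery fails with a runtime error, while SQLite quietly takes the first row. So BigQuery enforces the one-row-one-column rule for you, but SQLite does not.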

3 days ago

Theory only takes you so far. The best way to understand hyperparameter tuning is to watch a default model get beaten by a tuned one on the same data. New code-along in Python: manual search, GridSearchCV, RandomizedSearchCV, and coarse-to-fine, all in scikit-learn, with the honest result at the end. #Python #MachineLearning #DataScience https://t.co/nk5of39puv
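The search strategies themselves are simple enough to sketch without scikit-learn. Here, validation_score is a made-up stand-in for "train the model and score it on held-out data"; it peaks at learning_rate=0.1, depth=6. The helper names are hypothetical. GridSearchCV and RandomizedSearchCV do the same searching, with cross-validation built in.

```python
import itertools
import random

def validation_score(learning_rate, depth):
    # Hypothetical stand-in for "fit on train, score on validation"; higher is better.
    # It peaks at learning_rate=0.1, depth=6.
    return -100 * (learning_rate - 0.1) ** 2 - (depth - 6) ** 2

def grid_search(learning_rates, depths):
    # Exhaustive: score every combination, as GridSearchCV does over its param_grid.
    return max(itertools.product(learning_rates, depths), key=lambda p: validation_score(*p))

def random_search(n_iter, seed=0):
    # Sample n_iter random combinations, as RandomizedSearchCV does.
    rng = random.Random(seed)
    candidates = [(rng.uniform(0.001, 0.5), rng.randint(2, 12)) for _ in range(n_iter)]
    return max(candidates, key=lambda p: validation_score(*p))

# Coarse-to-fine: a wide, sparse grid first...
coarse_best = grid_search([0.001, 0.01, 0.1, 0.5], [2, 5, 9, 12])
lr0, depth0 = coarse_best
# ...then a dense grid around the coarse winner.
fine_best = grid_search(
    [lr0 * f for f in (0.5, 0.75, 1.0, 1.25, 1.5)],
    range(max(1, depth0 - 2), depth0 + 3),
)
random_best = random_search(n_iter=20)

print(coarse_best, fine_best)  # (0.1, 5) (0.1, 6)
```

Coarse-to-fine is just two grid searches. The first finds the right neighbourhood cheaply, and the second spends the budget there.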

3 days ago

A single model has a ceiling. Ensembles break through it by combining models that are wrong in different ways. New article on ensemble methods in Python: voting, bagging, boosting with XGBoost and LightGBM, and stacking, plus how to pick the right one to fix bias or variance. #MachineLearning #Python #DataScience https://t.co/vs3TLK3AWd
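The "wrong in different ways" point is easiest to see with hard voting. In this sketch, three hypothetical models each miss a different sample, and a simple majority vote recovers all five labels. The predictions are invented for illustration. scikit-learn's VotingClassifier with voting="hard" applies the same vote to real fitted models.

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Majority vote across models, given one list of predictions per model."""
    # zip(*...) regroups the lists so each tuple holds every model's vote for one sample.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_model)]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Hypothetical labels and predictions: each model is wrong on a different sample.
truth   = [1, 0, 1, 1, 0]
model_a = [1, 0, 1, 0, 0]  # wrong on sample 3
model_b = [1, 1, 1, 1, 0]  # wrong on sample 1
model_c = [0, 0, 1, 1, 0]  # wrong on sample 0

ensemble = hard_vote([model_a, model_b, model_c])
print(ensemble)                   # [1, 0, 1, 1, 0]
print(accuracy(ensemble, truth))  # 1.0
```

If all three models made the same mistakes, voting would change nothing. That is why diversity between models matters more than adding more of them.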

4 days ago

The Thank You page is where your revenue numbers are born, and one wrong semicolon can corrupt them all. New article on implementing the Adobe Analytics purchase confirmation page: the purchase event, s.products, deduplicating with purchaseID, and stitching in offline data. #AdobeAnalytics #Analytics #Ecommerce https://t.co/NatMeD1ThQ
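To make the semicolon point concrete, here is a small Python helper that assembles an s.products string. It uses the Category;Product;Quantity;Price layout, separates entries with commas, and sets Price to the line total. On a real Thank You page this string is set in JavaScript alongside s.events and s.purchaseID. The helper name and item fields are hypothetical, and the optional per-product events and eVar slots are left out.

```python
def build_s_products(items):
    """Build an s.products string: 'Category;Product;Quantity;Price' per entry, comma-separated.

    Price is the line total (unit price x quantity). Only the first four slots are filled;
    the optional per-product events and merchandising eVar slots are omitted.
    """
    entries = []
    for item in items:
        # A stray ';' or ',' inside a value would shift every field after it, so refuse it.
        for value in (item["category"], item["sku"]):
            if ";" in value or "," in value:
                raise ValueError(f"delimiter inside value: {value!r}")
        line_total = item["unit_price"] * item["quantity"]
        entries.append(f'{item["category"]};{item["sku"]};{item["quantity"]};{line_total:.2f}')
    return ",".join(entries)


order = [
    {"category": "", "sku": "SKU-123", "quantity": 2, "unit_price": 19.99},
    {"category": "", "sku": "SKU-456", "quantity": 1, "unit_price": 5.00},
]
products_string = build_s_products(order)
print(products_string)  # ;SKU-123;2;39.98,;SKU-456;1;5.00
```

An empty category still needs its leading semicolon. Dropping it would shift the product ID into the category slot.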

4 days ago

Before you analyze anything, you have to get the data in, and it never arrives in the format you'd choose. New article on importing data in Python: CSV and Excel, pickle, SAS, HDF5, MATLAB, SQL databases, plus pulling data from the web with requests, BeautifulSoup, and JSON APIs. #Python #DataScience #pandas https://t.co/pXc6SYEgLI
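The article uses pandas, but the underlying steps are visible with the standard library alone:

- parse a CSV,
- convert types by hand,
- decode a JSON API response,
- join the two.

The data below is inlined and made up so the sketch runs offline. In a real pipeline the JSON would come from an HTTP call, for example requests.get(url).json().

```python
import csv
import io
import json

# Hypothetical CSV export and JSON API response, inlined so the sketch runs offline.
csv_text = "customer_id,country,spend\n1,UK,120.5\n2,uk,\n3,France,80\n"
api_text = '{"results": [{"id": 1, "plan": "pro"}, {"id": 3, "plan": "free"}]}'

# csv.DictReader yields one dict per row; every value arrives as a string.
rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    rows.append({
        "customer_id": int(row["customer_id"]),
        "country": row["country"].strip().upper(),                 # 'uk' and 'UK' become one label
        "spend": float(row["spend"]) if row["spend"] else None,  # empty cell -> missing
    })

# json.loads turns the API payload into plain dicts and lists.
plans = {r["id"]: r["plan"] for r in json.loads(api_text)["results"]}

# Join the two sources on customer id; customers without a plan get None.
for row in rows:
    row["plan"] = plans.get(row["customer_id"])

print(rows[1])  # {'customer_id': 2, 'country': 'UK', 'spend': None, 'plan': None}
```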

5 days ago

Cookie loss, ITP, and consent refusals quietly undercount your conversions. Google's bidding algorithms then make worse decisions on incomplete data. New article on #enhancedconversions in GTM: how hashed first-party data recovers lost conversions, fed cleanly through the dataLayer and gated behind consent. #GoogleAds #GTM #GA4 https://t.co/aF8w0TS7CE
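Enhanced conversions rely on the same person producing the same hash on your side and on Google's, so normalisation has to happen before hashing. This is a minimal sketch for an email address. It assumes trim-and-lowercase normalisation and hex-encoded SHA-256. Google documents additional field-specific rules (for phone numbers, for example), so treat this as illustrative rather than complete.

```python
import hashlib

def normalize_email(email: str) -> str:
    # Basic normalisation: strip surrounding whitespace and lowercase.
    # Assumption: Google documents further field-specific rules; check them before relying on this.
    return email.strip().lower()

def sha256_hex(value: str) -> str:
    # Hex-encoded SHA-256 digest of the UTF-8 bytes.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

raw = "  Jane.Doe@Example.com "
hashed = sha256_hex(normalize_email(raw))
# Hashing without normalising gives a different digest, which would never match.
unnormalised = sha256_hex(raw)
print(len(hashed), hashed == unnormalised)  # 64 False
```

Whichever side does the hashing, the inputs need identical normalisation or the match rate drops.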

5 days ago

Most teaching datasets are too clean, so the hardest part of the job never gets practised. I wrote up how I built a simulated churn dataset: planted duplicates, three kinds of missing data, dirty country labels, and a leakage trap that fakes a 0.90 AUC. It's free to download, so you can practise on it yourself. #DataScience #MachineLearning #Python https://t.co/ZsNOrz6PYh
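The post doesn't name the three kinds of missing data, so this sketch assumes the standard trio:

- MCAR (missing completely at random),
- MAR (missingness depends on another observed column),
- MNAR (missingness depends on the value that went missing).

The customers, thresholds and rates are hypothetical, not the ones used in the actual dataset.

```python
import random

rng = random.Random(42)

# Hypothetical customers with an age and a monthly spend.
customers = [
    {"age": rng.randint(18, 80), "spend": round(rng.uniform(10, 200), 2)}
    for _ in range(1000)
]

def with_missing(rows, mechanism, rate=0.2):
    """Return a copy of rows with some 'spend' values blanked out under one mechanism."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the original stays intact
        if mechanism == "MCAR":
            drop = rng.random() < rate  # pure chance, unrelated to any value
        elif mechanism == "MAR":
            drop = row["age"] > 60 and rng.random() < 2 * rate  # driven by another observed column
        elif mechanism == "MNAR":
            drop = row["spend"] > 150 and rng.random() < 2 * rate  # driven by the hidden value itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if drop:
            row["spend"] = None
        out.append(row)
    return out

mcar = with_missing(customers, "MCAR")
mar = with_missing(customers, "MAR")
mnar = with_missing(customers, "MNAR")
print(sum(r["spend"] is None for r in mnar), "spend values missing under MNAR")
```

MNAR is the nasty one. Dropping or mean-imputing those rows biases spend downwards, and nothing in the observed data tells you so.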

5 days ago

Iframe checkout? Your purchase events are landing in a sealed room your GTM can't see into. New article on fixing it with window.postMessage: send events from the iframe, validate the origin on the parent, and fire clean GA4 ecommerce events. #GA4 #GoogleTagManager #Analytics https://t.co/LMAUa9xyBi
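The origin check is the part that's easy to get subtly wrong. In the browser it lives in a window.addEventListener("message", ...) handler that reads event.origin and event.data. This Python mirror just shows the logic. The allowed origin, payload shape and function names are hypothetical.

```python
# Hypothetical allowlist: the exact origin (scheme + host + port) of the checkout iframe.
ALLOWED_ORIGINS = {"https://checkout.example.com"}

def is_trusted(origin: str) -> bool:
    # Exact match only.
    return origin in ALLOWED_ORIGINS

def naive_is_trusted(origin: str) -> bool:
    # The classic mistake, shown for contrast: a prefix check that an attacker's
    # domain such as https://checkout.example.com.attacker.net also passes.
    return origin.startswith("https://checkout.example.com")

def handle_message(origin, data):
    """Mirror of the parent page's 'message' listener.

    Drops anything from an untrusted origin, then accepts only payloads that look like
    the expected purchase event. In the browser, the accepted payload is what you'd push
    to the dataLayer.
    """
    if not is_trusted(origin):
        return None
    if not isinstance(data, dict) or data.get("event") != "purchase":
        return None
    return data

msg = {"event": "purchase", "transaction_id": "T-1001"}
print(handle_message("https://checkout.example.com", msg))               # accepted
print(handle_message("https://checkout.example.com.attacker.net", msg))  # None
```

The iframe side matters too. It should pass the parent's exact origin as postMessage's targetOrigin rather than "*", so the data can't be delivered to an unexpected page.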

5 days ago

Most analytics can tell you a sale happened. Enhanced ecommerce tells you the story behind it: what got viewed, clicked, added, abandoned, and finally bought. New guide on implementing GA4 ecommerce tracking in GTM: the dataLayer contract, standard events, and testing the full funnel. #GA4 #GoogleTagManager #Ecommerce https://t.co/9MsMCRHs3X
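One way to think about the dataLayer contract is as a payload shape you can test before any tag fires. This sketch builds GA4's recommended purchase event as a Python dict, the same shape you'd push to the dataLayer, and runs a few contract checks. It assumes value is the item subtotal excluding tax and shipping, a common convention rather than a GA4 rule. The helper names and items are hypothetical.

```python
import json

def purchase_payload(transaction_id, currency, items, tax=0.0, shipping=0.0):
    # Shape of GA4's recommended 'purchase' event as it would be pushed to the dataLayer.
    # Assumption: value = item subtotal, excluding tax and shipping.
    subtotal = round(sum(i["price"] * i["quantity"] for i in items), 2)
    return {
        "event": "purchase",
        "ecommerce": {
            "transaction_id": transaction_id,
            "value": subtotal,
            "tax": tax,
            "shipping": shipping,
            "currency": currency,
            "items": items,
        },
    }

def check_payload(payload):
    """A few contract checks a test could run before any tag fires; returns a list of problems."""
    ecom = payload["ecommerce"]
    problems = []
    if not ecom.get("transaction_id"):
        problems.append("missing transaction_id")  # needed to deduplicate purchases
    if len(ecom.get("currency") or "") != 3:
        problems.append("currency should be a 3-letter ISO 4217 code")
    for item in ecom.get("items", []):
        if not (item.get("item_id") or item.get("item_name")):
            problems.append("every item needs item_id or item_name")
    expected = round(sum(i["price"] * i["quantity"] for i in ecom.get("items", [])), 2)
    if ecom.get("value") != expected:
        problems.append(f"value {ecom.get('value')} does not match item subtotal {expected}")
    return problems

items = [
    {"item_id": "SKU_1", "item_name": "Mug", "price": 12.5, "quantity": 2},
    {"item_id": "SKU_2", "item_name": "Tee", "price": 20.0, "quantity": 1},
]
payload = purchase_payload("T-1001", "GBP", items)
print(json.dumps(payload, indent=2))
print(check_payload(payload))  # []
```

Google's GTM guidance also recommends pushing { ecommerce: null } before each ecommerce event, so values from a previous event don't leak into the next.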

6 days ago

If you're learning data science and want a project that goes beyond "fit a model on clean data," I built a full churn prediction code-along: deliberately messy dataset, a hidden leakage trap, three missingness mechanisms, and a logistic regression that beats a random forest. Everything is explained line by line, and the notebook plus data are free to download. The fun part: the "obvious" best feature is a trap, and spotting why is half the lesson. Happy to answer questions if anyone works through it. https://t.co/vjnATNkFEe

6 days ago

"Can you add our tags to your site?" doesn't have to be a risk conversation. New article on GTM Zones: link partner containers, scope them with URL conditions, whitelist exactly what can fire, and audit the rest. Tag governance done properly. #GTM #GoogleTagManager #MarTech https://t.co/z52zPVd2ww

6 days ago

A decade on, XGBoost is still the king of tabular data. New practical guide: fit and predict, DMatrix, cross-validation with early stopping, hyperparameter tuning, and building sklearn pipelines that don't leak. #XGBoost #MachineLearning #Python https://t.co/gVDwKskdQW
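Early stopping keeps boosting while the validation metric improves. It stops once a set number of rounds pass without a new best. XGBoost does this for you via early_stopping_rounds. The pure-Python sketch below only mirrors the rule on a made-up validation curve, so you can see which round you end up keeping.

```python
def early_stopping(val_losses, patience):
    """Return (best_round, best_loss, stopped_at) for a lower-is-better metric.

    Stops once `patience` rounds pass without a new best, mirroring the idea behind
    XGBoost's early_stopping_rounds.
    """
    best_round, best_loss = 0, float("inf")
    for rnd, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = rnd, loss
        elif rnd - best_round >= patience:
            return best_round, best_loss, rnd
    return best_round, best_loss, len(val_losses) - 1

# Hypothetical validation log-loss per boosting round: it improves, then starts to overfit.
curve = [0.69, 0.55, 0.47, 0.43, 0.41, 0.405, 0.407, 0.41, 0.415, 0.42, 0.43, 0.44]
print(early_stopping(curve, patience=3))  # (5, 0.405, 8)
```

The model you keep is the one from best_round, not the last round trained. The rounds after it are the overfitting that the validation set caught.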

6 days ago

A class is a cookie cutter. Instances are the cookies. Once that clicks, Python OOP stops being intimidating. New article covering classes, self, __init__, inheritance with super(), dunder methods, and custom exceptions that fail fast. #Python #OOP #100DaysOfCode https://t.co/SNTNyFTZdZ
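Here is a compact sketch that touches each item in the post:

- a class with __init__,
- a subclass that calls super().__init__,
- two dunder methods,
- a custom exception raised early.

The bank-account domain and all the names are hypothetical.

```python
class InvalidAmountError(ValueError):
    """Raised when an amount would put an account into an invalid state."""


class Account:
    def __init__(self, owner: str, balance: float = 0.0):
        # Fail fast: reject bad input when the object is created, not three calls later.
        if balance < 0:
            raise InvalidAmountError(f"balance cannot be negative: {balance}")
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise InvalidAmountError(f"deposit must be positive: {amount}")
        self.balance += amount

    def __repr__(self) -> str:
        # Dunder method: controls how the object shows up in the REPL and in logs.
        return f"{type(self).__name__}(owner={self.owner!r}, balance={self.balance})"

    def __eq__(self, other) -> bool:
        # Dunder method: lets == compare by value instead of by identity.
        if not isinstance(other, Account):
            return NotImplemented
        return (self.owner, self.balance) == (other.owner, other.balance)


class SavingsAccount(Account):
    def __init__(self, owner: str, balance: float = 0.0, rate: float = 0.02):
        super().__init__(owner, balance)  # reuse the parent's validation and setup
        self.rate = rate

    def add_interest(self) -> None:
        interest = self.balance * self.rate
        if interest > 0:
            self.deposit(interest)


acct = SavingsAccount("ana", 100.0, rate=0.1)
acct.add_interest()
print(acct)  # SavingsAccount(owner='ana', balance=110.0)
```

Subclassing ValueError means existing `except ValueError` handlers still catch the new exception. New code can catch the narrower InvalidAmountError.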

6 days ago

The difference between [] and () in Python can be the difference between a script that streams 100 GB on a laptop and one that crashes. New article on iterators, comprehensions, and generators: enumerate, zip, yield, and reading files too big for memory in chunks. #Python #DataScience https://t.co/t5Oc5dO83n
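A small sketch of both ideas. sys.getsizeof only measures the container itself, not the integers inside it, but that's enough to show the gap. The chunked reader uses an in-memory file as a stand-in for a large one. The helper names are hypothetical.

```python
import io
import sys

# [] builds every element up front; () creates a lazy generator that produces one at a time.
squares_list = [n * n for n in range(100_000)]
squares_gen = (n * n for n in range(100_000))
print(sys.getsizeof(squares_list), sys.getsizeof(squares_gen))  # the list is far larger

def read_in_chunks(file_obj, chunk_size=1024):
    """Yield fixed-size chunks until the file is exhausted; only one chunk is held at a time."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:  # an empty string means end of file
            return
        yield chunk

def count_lines(file_obj, chunk_size=1024):
    # Memory use depends on chunk_size, not on the size of the file.
    return sum(chunk.count("\n") for chunk in read_in_chunks(file_obj, chunk_size))

# Small in-memory stand-in for a file too big to load at once.
fake_file = io.StringIO("".join(f"row {i}\n" for i in range(10_000)))
print(count_lines(fake_file, chunk_size=7))  # 10000
```

For CSVs specifically, pandas offers the same pattern via pd.read_csv(path, chunksize=...), which returns an iterator of DataFrames.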

7 days ago

Most tutorials hand you clean data. This one doesn't. A complete churn analysis in one notebook: messy labels, three kinds of missing data, a leakage trap that fakes 0.90 AUC, and a twist: logistic regression beats the random forest. #DataScience #Python #MachineLearning Free notebook + dataset: https://t.co/uwVK3mipEG

7 days ago

Run a large language model on your own laptop. No API keys, no per-token costs, full data privacy. New article on Llama 3 with llama-cpp-python: decoding parameters, prompt engineering, guaranteed-valid JSON output, and building a chatbot that remembers the conversation. #Llama3 #LLM #Python https://t.co/sgd3VQUGTk

7 days ago

Text, images, audio, and video in one workflow. New article on multi-modal models with Hugging Face: zero-shot classification with CLIP, voice conversion, ControlNet image editing, video generation, and scoring it all with CLIP score. #HuggingFace #AI #MachineLearning https://t.co/yu4FMhLXP9

9 days ago

State-of-the-art language models in 3 lines of Python. New article covers the pipeline API, fine-tuning with the Trainer, and every evaluation metric you need: BLEU, ROUGE, perplexity, exact match, toxicity, and more. #LLM #AI #Python https://t.co/9o4DaLsPOB
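Two of those metrics are simple enough to write out by hand:

- Perplexity is the exponential of the average negative log-likelihood per token.
- Exact match is the share of predictions identical to the reference after normalisation.

The normalisation below is one common choice: lowercase, strip punctuation, collapse whitespace. SQuAD-style scripts also drop articles such as "the", which this sketch doesn't. The token probabilities and answers are made up.

```python
import math
import string

def perplexity(token_log_probs):
    """exp of the average negative log-likelihood per token (natural log)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

def normalize(text):
    # Lowercase, remove punctuation, collapse whitespace.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match(predictions, references):
    """Fraction of predictions identical to their reference after normalisation."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

print(perplexity([math.log(0.25)] * 4))  # ~4.0
print(exact_match(["Paris.", "the Eiffel tower"], ["paris", "Eiffel Tower"]))  # 0.5
```

A model that gives every token probability 0.25 has perplexity 4, as if it were choosing uniformly among four options at each step.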

9 days ago

#HuggingFace puts state-of-the-art #AI into 3 lines of #Python. New article: run text classification, zero-shot labeling, summarization, and document QA using pipeline() and the transformers library. https://t.co/rWTEqQZhXB
