GrandStay Hotels had 4 years of operations crammed into one xls file
50+ columns. Duplicate records. Comma-separated everything. No currency codes. Broken formulas.
I replaced the flat excel file with a Fully Normalised SQL Server database from scratch.
Here is what I built ๐งต
REALITY 2: EVERY JOIN IS A RISK UNTIL YOU PROVE THE GRAIN STILL MAKES SENSE.
In a course, joins are usually clean.
Customers join to orders.
Products join to categories.
Dates join to calendars.
The relationship is obvious.
In a real analytics environment, joins are where many expensive mistakes begin.
I have seen technically valid joins create commercially wrong numbers because the analyst did not check the grain.
Example.
You start with an orders table.
One row per order.
Then you join to order_items.
One row per product inside each order.
The query runs.
No error.
But now each order appears multiple times.
If you calculate total revenue from the order table after that join, your revenue can be inflated overnight.
The database won't warn you.
The dashboard won't warn you.
The stakeholder will only notice when your number doesn't match Finance.
That's why experienced analysts don't ask:
"Did the join run?"
They ask:
"What happened to the row count after the join?"
Before and after every important join, I usually check:
SELECT COUNT(*) AS rows,
COUNT(DISTINCT order_id) AS orders
FROM orders;
Then after the join:
SELECT COUNT(*) AS rows,
COUNT(DISTINCT order_id) AS orders
FROM joined_result;
If rows increase but distinct orders stay the same, the grain has changed.
That may be fine.
But you must know it happened.
I also check for duplicate keys before joining:
SELECT customer_id,
COUNT(*) AS records
FROM customer_dim
GROUP BY customer_id
HAVING COUNT(*) > 1;
Because a table called customer_dim does not guarantee one row per customer.
It might contain historical versions.
It might contain duplicates from a CRM migration.
It might contain one record per customer per region.
It might contain multiple status changes per customer.
The name of a table is not proof of its grain.
This is why I always try to define the grain in plain English:
โ One row per order.
โ One row per order item.
โ One row per customer per month.
โ One row per customer status change.
โ One row per session.
โ One row per campaign per day.
Once the grain is clear, the join strategy becomes clearer.
Junior analysts join tables because the keys match.
Senior analysts join tables only after they understand what one row means in each table.
@twitgameboy Dear Future Tee, you tried enduring life ups and downs,
Every ounce of success and progress you are enjoying right now, YOU DESERVE IT AND MORE !!!!!!
@officialladi_T SQL can never go obsolete
And when it comes to getting a job, most companies data lives in databases so you have to know how query and validate
Lastly, Sql is serving as a backend for some analytical tools