I recently wrote a new blog post where I talked about a few things, based on my experience, that a junior data engineer or a fresh graduate entering the world of data can take care of from the beginning of their career, apart from the technical skills.
#dataengineering #data
Try avoiding SELECT * even on single-column tables
One interesting case I ran into with a customer some years ago (2012-2013 I think)
The backend API was stable and responded in single-digit milliseconds, until one day users came in to a slow, sluggish experience.
We checked the commits and nothing was obvious; most changes were benign. Just in case, we reverted all commits, but the app was still slow.
Looking at the diagnostics, we noticed API response times in the logs ranging from 500 ms up to 2 seconds at times.
We knew nothing had changed in the backend that would've caused the slowdown, so we started looking at the database queries.
We found a SELECT * on a table with 3 blob fields that were being returned to the backend API, and those blob fields held very large documents.
It turned out this table originally had only 2 integer columns, and the API was running a SELECT * to return and use the two fields. But later, the admin added 3 blob fields that are used and populated by another application.
While those blob fields were not being returned to the client, the backend API took the hit pulling the extra fields populated by other applications, causing database, network and protocol serialization overhead.
So even if you have a single column on your table, just SELECT that column, because the schema can change from underneath you.
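To make the effect concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the customer's setup: the table name `scores`, the column names, the row count and the blob size are all made up for illustration, and the byte count is a rough estimate rather than a real wire measurement.

```python
import sqlite3

# Hypothetical table standing in for the one in the story: two integer columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (user_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)", [(i, i * 10) for i in range(100)]
)


def payload_bytes(rows):
    """Rough result-set size: blobs by length, every other value as 8 bytes."""
    return sum(
        len(value) if isinstance(value, bytes) else 8
        for row in rows
        for value in row
    )


star_before = payload_bytes(conn.execute("SELECT * FROM scores").fetchall())

# Later, someone else adds three blob columns and another app fills them.
blob = b"x" * 10_000  # assumed document size, purely illustrative
for column in ("doc_a", "doc_b", "doc_c"):
    conn.execute(f"ALTER TABLE scores ADD COLUMN {column} BLOB")
conn.execute(
    "UPDATE scores SET doc_a = ?, doc_b = ?, doc_c = ?", (blob, blob, blob)
)

# The same SELECT * now drags the blobs along; the explicit query does not.
star_after = payload_bytes(conn.execute("SELECT * FROM scores").fetchall())
explicit_after = payload_bytes(
    conn.execute("SELECT user_id, score FROM scores").fetchall()
)

print("SELECT * before schema change:", star_before, "bytes")
print("SELECT * after schema change: ", star_after, "bytes")
print("explicit columns after change:", explicit_after, "bytes")
```

The query text never changed, yet the SELECT * result grows with the new columns while the explicit column list stays the same size.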
There are many other reasons to avoid SELECT * of course, but I’ll talk about them in another post.
—-
if you enjoyed this, check out my fundamentals of database engineering https://t.co/tiObG0HPoT $11.99 DB-AUG2024-B coupon until Aug 13
Check out the rest of my courses here
https://t.co/Qonec4XHEd
Society rewards hyperspecialization, and people just follow the incentives:
that’s how we ended up with people who can uncover the secrets of the universe but can’t figure out their own marriage,
or people who are respected by all except their own kids,
or people who teach others “how to be successful in life” who are actually angry and depressed,
or people who can solve the most complex problems but somehow cannot maintain a single long-term meaningful friendship with anyone.
You may not become “the best” if you have a well-balanced life, but at least,
you will have a shot at sustainable peace of mind,
and you may learn how to appreciate the people around you,
and enjoy a bit of the rewards that you actually worked hard to earn,
instead of living your whole life scared of missing out on the next big opportunity,
instead of chasing the wrong goals and noticing, way too late, that they didn’t matter as much as you thought at the time when you were younger,
when you thought you knew better, because you were simply surrounded by people who were as blind as you were.
A 🧵on Docker.
This is Part 1 - an introductory thread on this topic. Will post more threads on this in future.
Read this thread if you don't know anything about docker, containerisation, etc.
It is a must-know topic for every developer, like Git and Linux.
Thread🧵
(1/14)
Web accessibility: Essential for some, useful to all ✨
Learn how to ensure your website or app is user-friendly for everyone, including those who rely on screen readers.
This is especially helpful for developers like us!
Post by @imhritik_dj
https://t.co/6ggk1h03eS
Attended my first ever Data Engineering Conference at Bangalore. It was such a wonderful experience to meet the greatest minds of the data industry (Joe Reis and many more).
There were some amazing talks and panel discussions.
All in all, a totally great learning experience!
How is an SQL statement executed in the database?
The diagram below shows the process. Note that architectures differ from one database to another; the diagram demonstrates some common designs.
Step 1 - A SQL statement is sent to the database via a transport layer protocol (e.g. TCP).
Step 2 - The SQL statement is sent to the command parser, where it goes through syntactic and semantic analysis, and a query tree is generated afterward.
Step 3 - The query tree is sent to the optimizer. The optimizer creates an execution plan.
Step 4 - The execution plan is sent to the executor. The executor runs the plan to retrieve the data.
Step 5 - Access methods provide the data fetching logic required for execution, retrieving data from the storage engine.
Step 6 - Access methods decide whether the SQL statement is read-only. If the query is read-only (SELECT statement), it is passed to the buffer manager for further processing. The buffer manager looks for the data in the cache or data files.
Step 7 - If the statement is an UPDATE or INSERT, it is passed to the transaction manager for further processing.
Step 8 - During a transaction, the data is in lock mode. This is guaranteed by the lock manager. It also ensures the transaction’s ACID properties.
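The optimizer step can be observed directly in most databases. As a small sketch, assuming SQLite through Python's built-in sqlite3 module (an embedded database, so the transport layer in Step 1 does not apply), EXPLAIN QUERY PLAN asks for the plan the optimizer picked without running the query. The `users` table and the `idx_users_email` index are hypothetical names used only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)


def plan_for(sql, params=()):
    """Return the optimizer's plan as a list of readable detail strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]


query = "SELECT name FROM users WHERE email = ?"

# With no index on email, the only available plan is a full table scan.
plan_without_index = plan_for(query, ("a@example.com",))

# Once an index exists, the optimizer can choose an index search instead.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_with_index = plan_for(query, ("a@example.com",))

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The SQL text is identical in both calls; only the plan changes, which is the part of the pipeline that Step 3 describes. The exact wording of the plan rows varies between SQLite versions.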
–
Subscribe to our weekly newsletter to get a Free System Design PDF (158 pages): https://t.co/FIzCeaWsZV
Women's day mandatory read
Build a courage fund:
6m to 1 yr expenditures in savings/FD
Hypothesis:
1 yr liquid savings gives courage to a woman to stand up to parents, bf, anyone
it really helps
read this article 3x
you'll know what I'm talking about
https://t.co/RKETLnDkly
Hello! Would you be interested in a 2-day online workshop on building & maintaining a #React app w/ Helpshift engineers? Build a Twitter clone, and learn engineering basics not taught in college. #WebDev #Workshop #react #NodeJS https://t.co/4ju36JKOdY
I came across some practice problems in data engineering on Github. I have shared my solution for Exercise 1 in this blog post - Data Engineering Practice 1 — Downloading files https://t.co/v7uJEVTQvq
#dataengineering #dataengineer #learning #Python