Good morning, we are waking up to news of many abductions of citizens the government suspects of being involved in #RejectFinanceBill2024. @LawSocietyofKe you need to sue @KindikiKithure in his personal capacity for the state abductions! Noordin, Koome, & Amin, release them!
The teacher behind this beautiful work is called Owen. Hats off to you Sir for teaching the kids good music.
If Mbilia Bel is still in Kenya, this would make a good excuse to go give a show in Kisumu.
Are you looking for an end-to-end streaming tutorial or a project to understand the foundational skills required to build streaming pipelines? Then this post is for you.
We will use Apache Flink and Apache Kafka for stream processing and queuing.
https://t.co/MUfiA47XKL
#data
Let's do some Math.
A 2.7L petrol J150 costs 5M and consumes on average 8 km/L.
A 2.8L diesel J150 costs 6M and consumes 11 km/L.
Petrol costs 178/L.
Diesel costs 162/L.
Most people do 15,000 km/year and own cars for 5 years.
That's 1,668,750 KSh of petrol and 1,104,545 KSh of diesel.
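The arithmetic above can be checked with a short Python sketch. The numbers come from the post; the helper name `fuel_cost` and the comparison against the purchase-price gap are illustrative assumptions, not part of the original.

```python
# Figures taken from the post; fuel_cost is a hypothetical helper name.
KM_PER_YEAR = 15_000
YEARS = 5


def fuel_cost(km_per_litre: float, price_per_litre: float) -> float:
    """Total fuel spend in KSh over the whole ownership period."""
    total_km = KM_PER_YEAR * YEARS
    return total_km / km_per_litre * price_per_litre


petrol = fuel_cost(km_per_litre=8, price_per_litre=178)
diesel = fuel_cost(km_per_litre=11, price_per_litre=162)
fuel_saving = petrol - diesel
price_premium = 6_000_000 - 5_000_000  # diesel purchase price minus petrol

print(f"Petrol fuel: {petrol:,.0f} KSh")
print(f"Diesel fuel: {diesel:,.0f} KSh")
print(f"Diesel saves {fuel_saving:,.0f} KSh on fuel "
      f"against a {price_premium:,} KSh higher purchase price")
```

Changing `KM_PER_YEAR` or `YEARS` shows how much mileage it would take for the fuel saving to cover the price gap.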
What is GraphQL? Is it a replacement for the REST API?
The diagram below shows a quick comparison between REST and GraphQL.
🔹GraphQL is a query language for APIs developed by Meta. It provides a complete description of the data in the API and gives clients the power to ask for exactly what they need.
🔹GraphQL servers sit in between the client and the backend services.
🔹GraphQL can aggregate multiple REST requests into one query. The GraphQL server organizes the resources in a graph.
🔹GraphQL supports queries, mutations (applying data modifications to resources), and subscriptions (receiving notifications when data changes).
We talked about the REST API in last week's video and will compare REST vs. GraphQL vs. gRPC in a separate post/video.
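To make "ask for exactly what they need" concrete, here is a toy Python sketch of field selection over two combined resources. It is not a GraphQL implementation and uses no GraphQL library; the data, `resolve_user` and `select` are hypothetical names made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical in-memory data standing in for two backend REST resources.
USERS: Dict[int, Dict[str, Any]] = {
    1: {"id": 1, "name": "Ada", "email": "ada@example.com"},
}
POSTS: Dict[int, List[Dict[str, Any]]] = {
    1: [{"id": 10, "title": "Hello", "body": "First post body"}],
}


def resolve_user(user_id: int) -> Dict[str, Any]:
    """Combine both resources into one graph-like node."""
    return {**USERS[user_id], "posts": POSTS.get(user_id, [])}


def select(node: Any, selection: Dict[str, Any]) -> Any:
    """Keep only the requested fields; a nested dict selects sub-fields."""
    if isinstance(node, list):
        return [select(item, selection) for item in node]
    result = {}
    for field_name, sub_selection in selection.items():
        value = node[field_name]
        result[field_name] = select(value, sub_selection) if sub_selection else value
    return result


# Similar in spirit to the query: { user(id: 1) { name posts { title } } }
query = {"name": None, "posts": {"title": None}}
response = select(resolve_user(1), query)
print(response)
```

The client gets the user's name and post titles in one response, without the email or post bodies it did not ask for.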
Over to you:
1). Is GraphQL a database technology?
2). Do you recommend GraphQL? Why/why not?
–
Subscribe to our weekly newsletter to get a Free System Design PDF (158 pages): https://t.co/oxMBsTqaGS
Entrepreneurship culture in America is all messed up and it's a shame.
TechCrunch. Product Hunt. Shark Tank.
It's all about new ideas. Changing the world. Innovation. 0 to 1. Blue ocean. Venture capital and exits and scalability.
And IT'S ALL A LIE.
If you ask the average American who a real entrepreneur is, they'll say Jobs, Musk or Zuck. We read their books and idolize them and hang on to their every word.
So the brightest among us think they need a moat. A new idea. Something revolutionary. We're setting them up for FAILURE.
I took an entrepreneurship course at Cornell in 2011. 24 kids with new ideas. Big plans. Pitch decks looking for series As.
I was #25 with a regular old-fashioned business. When professors asked me what my differentiator was I didn't have an answer.
"We're just going to pick up people's stuff and store it when they go home for the summer. I'll answer the phone, do things a little better and I think I can make some decent money."
I saw a company out there doing sweaty, non-scalable work. They weren't very good at it and yet they made really good money.
I started by trading my time for money. Bought a $1500 cargo van. Storage Squad was born. Used the things I had in my life to make some profit.
I wasn't trying to educate a customer base.
I wasn't following my passion.
I didn't need funding or a network.
I wasn't competing against brilliant folks from Stanford.
I wasn't trying to prove a concept.
I wasn't emotionally attached to anything except adding value.
My customers and my competitors existed. I could study them interacting with each other. I made decisions with my brain, not my heart. I was competing against folks with fax machines, clipboards and paper ledgers.
And the best part... WE WERE PROFITABLE FROM DAY ONE.
Not a single one of those 24 folks in my class succeeded. They all went and got jobs. Their new ideas didn't catch on. They all had dreams of millions of users and an exit. Scalable models that could work anywhere from a computer. But 99% failed to make a single dollar.
We made enough money in a few years to build our first self storage facility. That grew into the 60+ property $100m+ portfolio we own and operate today. We sold the service business in January 2021 for $1.75 million. We had no debt and no silent partners. My business partner and I split the cash.
So who are the real entrepreneurs? Who are the wealthiest people you know? I'm not talking about money. I'm talking about the people who do what they want to do when they want to do it. Who are they?
Now here comes the hard truth. I know a lot of wealthy entrepreneurs. None of them had new ideas. Very few of them raised VC money. None of them were on Shark Tank. They all did common things uncommonly well. Regular old businesses just a little better.
BORING STUFF.
Most of them have a few things in common: They worked really hard doing something not fun for 5+ years. Many times 20+ yrs. They started out trading their time for money. They did things that weren't scalable. Many of them offered services.
They all had to talk to people. Most of the time face to face. They had to sell themselves and their ideas. They didn't take a lot of risk. Most of them hired coders but few of them were coders.
The main point:
Stop buying into the hype. The click bait. The sexy stories of overnight success and mega riches. Entrepreneurship isn't that complicated. Do something with good odds, low risk and moderate rewards. Don't master your craft, master leading other people.
Think with your head, not your heart. It's not about you and what YOU love or what YOU want to be doing. And lastly... Start SMALL.
Biz is about momentum. I started 10 yrs ago carrying boxes up spiral staircases. Now I'm buying millions worth of real estate.
And the best part. When you're successful, experienced, wealthy and you have a killer network...
It's time to change the world with something BIG.
How to ensure Data Quality in Machine Learning Systems?
It is extremely important to ensure Data Quality upstream of ML Training and Inference Pipelines. Trying to do it in the pipelines will cause unavoidable failure when working at scale. Data Contracts can be leveraged for this goal.
A Data Contract is an agreement between Data Producers and Data Consumers about the qualities to be met by the Data being produced.
A Data Contract should hold the following non-exhaustive list of metadata:
👉 Schema Definition.
👉 Schema Version.
👉 SLA metadata.
👉 Semantics.
👉 Lineage.
👉 …
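As a rough illustration, the metadata listed above could be held in a small Python structure. This is only a sketch: the class name `DataContract`, its field names, and the `orders` example are assumptions made for illustration, not a standard contract format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataContract:
    """Hypothetical container for the metadata a Data Contract might hold."""
    name: str
    schema_version: str              # Schema Version
    schema: Dict[str, type]          # Schema Definition: field name -> expected type
    sla: Dict[str, float]            # SLA metadata, e.g. maximum delay in seconds
    semantics: Dict[str, str]        # Semantics: field name -> meaning or unit
    lineage: Tuple[str, ...] = ()    # Lineage: upstream systems producing the data


# Made-up example contract for an "orders" dataset.
orders_v1 = DataContract(
    name="orders",
    schema_version="1.0.0",
    schema={"order_id": str, "amount": float},
    sla={"max_delay_seconds": 300.0},
    semantics={"amount": "order total in USD"},
    lineage=("orders-service",),
)

print(orders_v1.name, orders_v1.schema_version, sorted(orders_v1.schema))
```

In practice a contract like this would live in version control and a registry rather than in application code.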
Example Architecture Enforcing Data Contracts:
1: Schema changes are implemented in version control. Once approved, they are pushed to the Applications generating the Data, the Databases holding the Data and a central Data Contract Registry.
[Important]: Ideally you should be enforcing a Data Contract at this stage, when producing Data. Data Validation steps down the stream are Detection and Prevention mechanisms that don't allow low quality data to reach downstream systems. There might be a significant delay before you can do those checks, causing irreversible corruption or loss of data.
Applications push generated Data to Kafka Topics:
2: Events emitted directly by the Application Services.
👉 This also includes IoT Fleets and Website Activity Tracking.
2.1: Raw Data Topics for CDC streams.
3: A Flink Application(s) consumes Data from Raw Data streams and validates it against schemas in the Contract Registry.
4: Data that does not meet the contract is pushed to a Dead Letter Topic.
5: Data that meets the contract is pushed to a Validated Data Topic.
6: Data from the Validated Data Topic is pushed to object storage for additional Validation.
7: On a schedule, Data in the Object Storage is validated against additional SLAs in Data Contracts and is pushed to the Data Warehouse to be Transformed and Modeled for Analytical purposes.
8: Modeled and Curated data is pushed to the Feature Store System for further Feature Engineering.
8.1: Real Time Features are ingested into the Feature Store directly from the Validated Data Topic (5).
👉 Ensuring Data Quality here is complicated since checks against SLAs are hard to perform.
9: High Quality Data is used in Machine Learning Training Pipelines.
10: The same Data is used for Feature Serving in Inference.
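Steps 3 to 5 can be sketched in plain Python. This stands in for the Flink job and the Kafka topics with ordinary lists and uses neither library; `CONTRACT_REGISTRY`, `violations` and `route` are hypothetical names, and the type check is a deliberately simple notion of "meets the contract".

```python
from typing import Any, Dict, List, Tuple

# Hypothetical contract registry: topic name -> required fields and their types.
CONTRACT_REGISTRY: Dict[str, Dict[str, type]] = {
    "orders": {"order_id": str, "amount": float},
}


def violations(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return the contract violations for a record; empty means it is valid."""
    problems = []
    for name, expected_type in schema.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}")
    return problems


def route(topic: str, records: List[Dict[str, Any]]) -> Tuple[list, list]:
    """Split raw records into (validated, dead_letter), as in steps 4 and 5."""
    schema = CONTRACT_REGISTRY[topic]
    validated, dead_letter = [], []
    for record in records:
        problems = violations(record, schema)
        if problems:
            # Keep the reasons next to the record so producers can debug it.
            dead_letter.append({"record": record, "errors": problems})
        else:
            validated.append(record)
    return validated, dead_letter


raw = [
    {"order_id": "a-1", "amount": 19.99},
    {"order_id": "a-2"},               # missing amount
    {"order_id": 3, "amount": 5.0},    # order_id has the wrong type
]
validated, dead_letter = route("orders", raw)
print(len(validated), "validated,", len(dead_letter), "dead-lettered")
```

Storing the violation reasons with each dead-lettered record is a design choice of this sketch; the post itself only says that failing data goes to a Dead Letter Topic.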
Note: ML Systems are plagued by other Data related issues like Data and Concept Drifts. These are silent failures and while they can be monitored, we don't include them in the Data Contract.
Let me know your thoughts! 👇
---------
Follow me to upskill in #MLOps, #MachineLearning, #DataEngineering, #DataScience and overall #Data space.
Also hit 🔔 to stay notified about new content.
Don't forget to like 💙, share and comment!
Join a growing community of Data Professionals by subscribing to my Newsletter: https://t.co/qgNCnGtF4A
Here are some notes on Writing Data to Kafka.
Kafka is an extremely important Distributed Messaging System to understand as it was the first of its kind and most of the new products are built on the ideas of Kafka.
Some general definitions:
➡️ Clients writing to Kafka are called Producers.
➡️ Clients reading the Data are called Consumers.
➡️ Data is written into Topics that can be compared to Tables in Databases.
➡️ Messages sent to Topics are called Records.
➡️ Topics are composed of Partitions.
➡️ Each Partition behaves like and is a set of Write Ahead Logs.
Writing Data:
➡️ There are two types of records that can be sent to a Topic - Containing a Key and Without a Key.
➡️ If there is no key, then records are written into Partitions in a Round Robin fashion.
➡️ If there is a key, then records with the same keys will always be written to the Same Partition.
➡️ Data is always written to the End of the Partition.
➡️ When written, a record gets an Offset assigned to it which denotes its Order/Place in the Partition.
➡️ Partitions have separate sets of Offsets starting from 0.
➡️ Offsets are incremented sequentially when new records are written.
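The write rules above can be modelled with a toy in-memory topic in Python. This is not the Kafka client or broker: `ToyTopic` is a hypothetical class, it uses `zlib.crc32` where Kafka's default partitioner hashes keys with murmur2, and it follows the round-robin description in these notes even though newer Kafka clients batch keyless records with a sticky partitioner.

```python
import zlib
from itertools import count
from typing import List, Optional, Tuple


class ToyTopic:
    """In-memory stand-in for a Kafka topic, for illustration only."""

    def __init__(self, num_partitions: int) -> None:
        # Each partition is an append-only list acting as a log.
        self.partitions: List[List[Tuple[Optional[str], str]]] = [
            [] for _ in range(num_partitions)
        ]
        self._round_robin = count()

    def send(self, value: str, key: Optional[str] = None) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        n = len(self.partitions)
        if key is None:
            # No key: spread records over partitions in round-robin order.
            partition = next(self._round_robin) % n
        else:
            # Same key -> same hash -> same partition (crc32 is a substitute hash).
            partition = zlib.crc32(key.encode("utf-8")) % n
        log = self.partitions[partition]
        offset = len(log)          # offsets start at 0 in every partition
        log.append((key, value))   # records are always appended at the end
        return partition, offset


topic = ToyTopic(num_partitions=3)
keyless = [topic.send(f"event-{i}") for i in range(4)]
first = topic.send("login", key="user-42")
second = topic.send("logout", key="user-42")
print(keyless)
print(first, second)
```

Both keyed records land in the same partition with consecutive offsets, which is what gives per-key ordering. As in Kafka, that mapping only stays stable while the number of partitions does not change.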
Any insights you would add about writing data to Kafka? Let me know in the comment section 👇
The Twelve-Factor App is a methodology for building software-as-a-service applications, written by Adam Wiggins. We cover how the factors have since evolved, how they changed the status quo of yesteryear, and what we can learn from them today.
https://t.co/BMjBn3D47h
Fallacies of distributed systems are a set of assertions made by L Peter Deutsch and others at Sun Microsystems describing false assumptions that programmers new to distributed applications invariably make.
https://t.co/HesHhubGMQ