I am currently attending a Job Shadowing Session at @LuxDevHQ, @HarunMbaabu, and I really love it. Honestly, I don't need my phone right now, so I'll be selling it to fund my @LuxDevHQ admission.
This GitHub repository documents a comprehensive data engineering project that integrates PySpark and dbt for advanced data processing. It utilizes Databricks as the primary platform to facilitate dynamic data ingestion and stream processing within a multi-layered architecture. Key technical milestones include implementing incremental loading, managing Slowly Changing Dimensions (SCD Type 2) for historical tracking, and performing upsert operations.
The workflow transitions from a bronze storage layer to a refined gold layer, using Python classes and dbt models to automate transformations. Ultimately, the project demonstrates a functional pipeline designed for scalable data modeling and rigorous schema customization.
https://t.co/maap0cbhs8
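For readers new to SCD Type 2 and upserts, here is a minimal sketch of the idea in plain Python. It is not the repository's code (which uses PySpark and dbt); the `DimRow` class, the `scd2_upsert` function, and the sample customer data are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    key: str                          # business key (hypothetical)
    attrs: dict                       # tracked attributes
    valid_from: date
    valid_to: Optional[date] = None   # None means the row is still current
    is_current: bool = True


def scd2_upsert(dim, incoming, as_of):
    """Expire changed rows and append new versions (SCD Type 2)."""
    current = {row.key: row for row in dim if row.is_current}
    for key, attrs in incoming.items():
        existing = current.get(key)
        if existing is not None and existing.attrs == attrs:
            continue  # unchanged record: nothing to do
        if existing is not None:
            # Changed record: close the old version instead of overwriting it.
            existing.valid_to = as_of
            existing.is_current = False
        # New key or new version of a changed key.
        dim.append(DimRow(key, dict(attrs), as_of))
    return dim


dim = [DimRow("c1", {"city": "Nairobi"}, date(2024, 1, 1))]
incoming = {"c1": {"city": "Mombasa"}, "c2": {"city": "Kisumu"}}
scd2_upsert(dim, incoming, date(2024, 6, 1))

for row in dim:
    print(row)
```

The old "Nairobi" row is kept with an end date rather than overwritten, which is what makes historical tracking possible.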
Recently completed an end-to-end Azure project.
It pulls data from an API and connects it to Power BI for visualization.
Tools used:
Azure Data Factory
ADLS Gen 2
Azure Databricks
Azure Synapse Analytics
GitHub link below 👇 ENJOY!
I know a guy in college.
Missed classes.
Studied last minute.
Just enough to pass.
Everyone called him lazy.
He believed it too.
Final year came.
Placements got close.
Reality stopped being polite.
No hacks left.
No “I’ll figure it out later.”
He picked one thing: DSA.
Not motivation reels.
Not aesthetic study vlogs.
Just showing up.
Same desk.
Same notebook.
Every single day.
8–10 hours, even when it sucked.
Graphs made no sense.
DP drained him.
Some days he felt stupid.
He still came back the next day.
A year later,
FAANG offer. ~30 LPA.
Suddenly everyone said,
“Yeah, he was always smart.”
No.
He was scared.
He was focused.
And he finally stayed with one thing long enough.
Talent didn’t change.
His direction did.
That’s the part people skip
when they casually say,
“Just grind DSA.”
Dear Growth Hackers,
As we gear up for the upcoming Night of Code, I’m excited to share the lineup of projects we’ll be working on. These challenges have been designed to engage data analysts, data scientists, and data engineers in hands-on, collaborative problem-solving. Please review the project descriptions below and start thinking about which one aligns best with your interests and skill set.
Project 1: SightSearch – Image-Driven Product Matching
Build a visual product search platform that allows users to upload an image of an item—such as shoes, clothing, or accessories—and instantly discover stores selling identical or similar products. The system will use computer vision to analyze the uploaded photo, generate image embeddings, and match them against a continuously scraped database of product images and metadata from online and local retailers. The goal is to deliver fast, accurate results while ensuring a seamless user experience.
This project will require collaboration across data roles:
Data Engineers: Develop scraping pipelines, data architecture, and storage for product images and embeddings.
Data Scientists: Build and optimize vision models, vector similarity search, and ranking mechanisms.
Data Analysts: Analyze user interaction data, evaluate model performance, and provide insights to refine recommendations.
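To make the embedding-matching step concrete, here is a small sketch of ranking products by cosine similarity. It assumes embeddings are already computed; the three-dimensional vectors, product IDs, and function names are made up for illustration, and a real system would likely use a vision model and a vector index rather than a plain dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_matches(query, catalog, k=3):
    """Return the k catalog items most similar to the query embedding."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical product embeddings (real ones have hundreds of dimensions).
catalog = {
    "sneaker_red": [0.9, 0.1, 0.0],
    "sneaker_blue": [0.8, 0.2, 0.1],
    "handbag": [0.0, 0.1, 0.9],
}

results = top_matches([1.0, 0.0, 0.0], catalog, k=2)
for product_id, score in results:
    print(product_id, round(score, 3))
```

A brute-force scan like this is fine for a hackathon-sized catalog; an approximate nearest-neighbor index becomes worthwhile as the scraped database grows.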
Project 2: Smart Meal Recommendation System
Build an intelligent recommendation engine that suggests meals tailored to a user’s dietary preferences, health goals, and nearby restaurant options. The system should use NLP to understand user intent, predictive modeling to rank meal choices, and web scraping to collect menus, nutritional data, and pricing information.
Project roles include:
Data Engineers: Build and maintain scraping pipelines and ensure clean, structured datasets.
Data Scientists: Develop the recommendation logic, including embeddings, nutrition modeling, and ranking algorithms.
Data Analysts: Assess user behavior patterns and identify improvements in model accuracy and user satisfaction.
Project 3: Real-Time Campus Safety Alert Dashboard
Build a platform that aggregates real-time data—weather updates, traffic reports, and public safety incidents—to generate actionable safety alerts for students on campus. The system will scrape public feeds, integrate live APIs, and use ML models to classify event severity. The dashboard should clearly display incidents, forecasted risk levels, and recommended actions.
Team responsibilities:
Data Engineers: Integrate real-time data pipelines and ensure reliable ingestion services.
Data Scientists: Build classification, anomaly detection, and severity prediction models.
Data Analysts: Review historical patterns, validate model accuracy, and refine alert logic.
Project 4: Automated Data Cleaning & Structuring Pipeline Using Airflow (Google Sheets → PostgreSQL)
Build an automated ETL pipeline that extracts unstructured data from Google Sheets, cleans and structures it, and loads it into a PostgreSQL database using Apache Airflow. The workflow should standardize formats, handle missing values, validate fields, and ultimately store well-organized, analytics-ready tables.
Roles involved:
Data Engineers: Build the Airflow DAG, define the ETL workflow, and design the PostgreSQL schema.
Data Scientists: Contribute automated data-cleaning logic or anomaly-detection modules.
Data Analysts: Validate the cleaned dataset, ensure schema usability, and compare raw vs. structured outputs.
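As a starting point for the cleaning step, here is a sketch of a transform function that could be called from an Airflow task. It uses only the standard library; the column names (`name`, `amount`, `signup_date`), the accepted date formats, and the sample rows are assumptions for illustration, and the Google Sheets extract and PostgreSQL load are left out.

```python
from datetime import datetime


def clean_row(raw):
    """Normalize one raw sheet row (dict of strings) into typed fields."""
    # Standardize header names and trim whitespace from values.
    row = {
        key.strip().lower().replace(" ", "_"): (value or "").strip()
        for key, value in raw.items()
    }
    cleaned = {"name": row.get("name") or None}

    # Parse numeric amounts, treating blanks and junk as missing.
    amount = row.get("amount", "").replace(",", "")
    try:
        cleaned["amount"] = float(amount) if amount else None
    except ValueError:
        cleaned["amount"] = None

    # Accept a couple of date formats (assumed), else mark as missing.
    cleaned["signup_date"] = None
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            cleaned["signup_date"] = datetime.strptime(row.get("signup_date", ""), fmt).date()
            break
        except ValueError:
            continue
    return cleaned


def clean_rows(raw_rows):
    """Clean every row and drop those that fail validation (no name)."""
    cleaned = [clean_row(raw) for raw in raw_rows]
    return [row for row in cleaned if row["name"] is not None]


raw_rows = [
    {" Name ": " Amina ", "Amount": "1,200.50", "Signup Date": "03/02/2024"},
    {"Name": "", "Amount": "n/a", "Signup Date": ""},
    {"Name": "Brian", "Amount": "", "Signup Date": "2024-02-10"},
]

rows = clean_rows(raw_rows)
for row in rows:
    print(row)
```

Keeping the transform as a pure function makes it easy to unit test before wiring it into the DAG.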
If you have any questions or would like guidance selecting a project, feel free to reach out. Looking forward to an exciting and productive Night of Code!
Thanks!