posh the engine. @poshlovesdata - Twitter Profile

nice architectural plan bro 👏 some advice i can give regarding this: when you pull from an external api, your first move should always be dumping the raw response into an object store like s3 or even a local json file. why is that? tbh if you try to clean the data on the fly and your logic fails on record 9,999 out of 10,000, you have to hit their api server again. on a paid api that costs you money, because you're re-requesting the same data every time an error comes up in your code. so always decouple your extraction from your transformation. get the data safe on your storage first, then run your transformation. extract -> load raw -> transform
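A minimal sketch of extract -> load raw -> transform, using a local JSON file as the raw store. The URL, the `id` and `name` fields, and the function names are hypothetical, and it assumes the API returns a JSON array of objects.

```python
import json
import urllib.request
from pathlib import Path


def extract(url, raw_path):
    """Call the API once and save the response body untouched."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        raw_path.write_bytes(resp.read())
    return raw_path


def transform(raw_path):
    """Clean records from the saved file; rerunning this never touches the API."""
    records = json.loads(raw_path.read_text(encoding="utf-8"))
    return [
        # "id" and "name" are made-up fields, only here for illustration
        {"id": r["id"], "name": r["name"].strip().title()}
        for r in records
        if r.get("name")  # skip records with a missing or empty name
    ]
```

If `transform` blows up on a bad record, you fix the logic and rerun it against the same file. `extract` only runs again when you actually want fresh data.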

OLAJIDE @jickson234

3 months ago

I’ve mostly worked with static datasets in Excel and Power BI. Now I’m taking the next step, working with real-time data.

My next focus:
API → Python → SQL → Power BI

Instead of downloading datasets, I want to:
- Pull data directly from APIs
- Clean and transform it using Python
- Store and query it with SQL
- Build dashboards in Power BI

The goal is to move closer to how data is actually handled in real-world scenarios.

Still learning, but excited to build this end-to-end workflow.

If you’ve worked with APIs before, I’d appreciate any tips or resources. @chidirolex @_VictorUgwu @Smanmalik83 @iam_daniiell @ObohX
#DataAnalytics #PowerBI #Python #SQL


posh the engine.

@poshlovesdata

3 months ago

@humayun_x fr, a model is only as good as the data feeding it anyway


poshlovesdata retweeted

ABC

@Ubunta

3 months ago

You should never vibe code mission critical Data Engineering applications.
- Not the pipeline that feeds your regulatory submission.
- Not the transformation that calculates patient dosing.
- Not the reconciliation logic your finance team signs off on.

Use AI to build it. Absolutely. But do not use AI to review it for you. That's the human expert's job, and it's non-negotiable. The code runs. The tests pass. The output looks plausible. That's the danger. Let AI accelerate the build. But the review? That's where domain expertise earns its keep.


posh the engine.

@poshlovesdata

3 months ago

@Ubunta ran into this exact thing building a flight ticketing pipeline. used AI to write some dbt tests for NUC amounts based on standard industry logic. tests passed, but the company’s internal logic was different. silent failures are the worst 😭


posh the engine.

@poshlovesdata

3 months ago

@JDataCraft @DabereNnamani @Dare_0x @TDataImmersed @ThePSF @JA_Olaoye well done bro 👏🏾


posh the engine.

@poshlovesdata

3 months ago

@temivalentine_ good stuff 👏🏾 design’s sleek for a streamlit app


posh the engine.

@poshlovesdata

3 months ago

how do I tell you that airbuds just removed the restriction on Nigerians. These guys are crazyyy

vergil’s yamato @sinennn000

3 months ago

i need you guys to promise me something
this thing’s hard to build. like really hard. and yes it’s gonna be completely free, idc.
i just need y’all to promise me you’re gonna use it and put your friends on and use it. not download and keep, actually use and post about it
thanks https://t.co/gx5bhNbH7V


poshlovesdata retweeted

Mide💜 @_midee1

3 months ago

i’m a Social Media Manager and Data Analyst currently open to remote opportunities. i’d appreciate any referrals or recommendations.🫢


posh the engine.

@poshlovesdata

3 months ago

Phase 3:🥳

To make sure the data gets uploaded to where it can be accessed from any location once an internet connection comes up at the remote health clinic, what did I do?

> Provisioned an S3 bucket for the data uploads
> Created a Python script that checks for an internet connection every 10 seconds
> Once a connection is available, it pushes the parquet file from the outbox/ folder to the S3 bucket.
> Then it moves the file from outbox/ to uploaded/ locally once it's sure that the data is now available on the S3 bucket.

And there you have it, a simple, reliable offline-first data pipeline that works even with intermittent connectivity.

Will be documenting this project and pushing it to GitHub next. If you're interested in the workflow, you can access it there.
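A rough sketch of what an uploader like this could look like, not the actual script from the project. It assumes boto3 is installed and configured with credentials. The connectivity check (a TCP connection to a public DNS server), the function names, and the `head_object` confirmation are my own assumptions about how "sure the data is on S3" might be checked.

```python
import shutil
import socket
import time
from pathlib import Path


def is_online(host="8.8.8.8", port=53, timeout=3.0):
    """Return True if a TCP connection to a public DNS server succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def s3_upload(path, bucket):
    """Upload one file with boto3, then confirm the object exists."""
    import boto3  # lazy import: only needed for real uploads

    s3 = boto3.client("s3")
    s3.upload_file(str(path), bucket, path.name)
    s3.head_object(Bucket=bucket, Key=path.name)  # raises if the object is missing


def push_outbox(outbox, uploaded, bucket, upload=s3_upload):
    """Upload every parquet file in outbox/, moving each to uploaded/ on success."""
    uploaded.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(outbox.glob("*.parquet")):
        upload(path, bucket)  # an exception here leaves the file in outbox/
        shutil.move(str(path), str(uploaded / path.name))
        moved.append(path.name)
    return moved


def run_forever(outbox, uploaded, bucket, interval=10):
    """Poll for connectivity every `interval` seconds and push when online."""
    while True:
        if is_online():
            try:
                push_outbox(outbox, uploaded, bucket)
            except Exception as exc:
                print(f"upload failed, will retry: {exc}")
        time.sleep(interval)
```

Because a file only leaves outbox/ after the upload call returns without raising, a dropped connection mid-upload just means the same file gets retried on the next poll.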

posh the engine.

@poshlovesdata

3 months ago

Phase 2:👀

To prepare the data in the SQLite DB to be sent in a compressed format (parquet), what did I do?
1. Created a Python script that:
> Checks the table for unsynced records (records where sync_status is 'pending')
> Converts just those records into parquet using Pandas
> Then updates the sync_status of the converted records to 'synced' in the DB (using the record_id).

Why did I do this?
So when the script runs again, it only looks for records that haven't been synced yet.

Now I have lightweight data that can be sent over a minimal internet connection.

What's next?
> I'll be creating an uploader script that polls for an internet connection every 10 seconds. Once a connection is confirmed, it uploads the parquet file into an already provisioned S3 bucket.

#DataEngineering #ETL #Python #DataAnalytics
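One way the export step could be sketched, under assumptions. The table name `patient_vitals` and the `record_id` and `sync_status` columns come from the thread; the function names are hypothetical. The default writer needs pandas plus a Parquet engine such as pyarrow, and it is passed in as a parameter so the sync logic can be exercised without them.

```python
import sqlite3


def write_parquet(rows, columns, path):
    """Default writer: needs pandas plus a Parquet engine such as pyarrow."""
    import pandas as pd  # lazy import so the sync logic works without pandas

    pd.DataFrame(rows, columns=columns).to_parquet(path, index=False)


def export_pending(conn, out_path, writer=write_parquet):
    """Write pending rows to out_path, then mark exactly those rows as synced."""
    cur = conn.execute(
        "SELECT * FROM patient_vitals WHERE sync_status = 'pending'"
    )
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    if not rows:
        return 0  # nothing to export, so no file is written

    writer(rows, columns, out_path)

    # Flip only the rows that went into the file, matched by record_id.
    id_pos = columns.index("record_id")
    ids = [(row[id_pos],) for row in rows]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "UPDATE patient_vitals SET sync_status = 'synced' WHERE record_id = ?",
            ids,
        )
    return len(rows)
```

The status update runs after the file is written, so a crash during the write leaves the rows as 'pending' and they get picked up on the next run.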


poshlovesdata retweeted

Kaxil Naik

@kaxil

3 months ago

The Apache Airflow Registry is live: a searchable catalog of 98 providers and 1,600+ modules (operators, hooks, sensors, triggers, transfers). Cmd+K instant search, connection builder, JSON API, auto-updates on new releases. https://t.co/brXpfXkNAi


posh the engine.

@poshlovesdata

3 months ago

@_justmba crazy translation 😭😂


posh the engine.

@poshlovesdata

3 months ago

phase 2: https://t.co/AcBsvCMgcv


posh the engine.

@poshlovesdata

3 months ago

Proof of Concept:😶‍🌫️
So I started working on this as a project. What I've done:

1. Initialized a SQLite database and created a table called patient_vitals to store patients' vitals.

2. Built a Python script to generate and load data into the SQLite database.

3. Automated the data generation and loading with a cron job, which currently runs every minute (I'll eventually change it to 10 mins) to simulate actual data entry in a health facility.

What next?
To prepare the data as highly compressed payloads (parquet) so it's ready the millisecond internet is available in the health facility.
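A small sketch of what the first two steps might look like. Only the table name `patient_vitals` comes from the thread. The columns, value ranges, and function names are assumptions made up for illustration, including the `sync_status` column with a 'pending' default.

```python
import random
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: only the table name is from the original posts.
SCHEMA = """
CREATE TABLE IF NOT EXISTS patient_vitals (
    record_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    patient_id  TEXT NOT NULL,
    heart_rate  INTEGER NOT NULL,
    temperature REAL NOT NULL,
    recorded_at TEXT NOT NULL,
    sync_status TEXT NOT NULL DEFAULT 'pending'
)
"""


def init_db(path):
    """Open (or create) the SQLite file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def insert_fake_vitals(conn, n=5, rng=random):
    """Insert n randomly generated vitals rows to simulate data entry."""
    now = datetime.now(timezone.utc).isoformat()
    rows = [
        (
            f"P{rng.randint(1, 50):03d}",      # fake patient id, e.g. P007
            rng.randint(55, 120),              # beats per minute
            round(rng.uniform(35.5, 40.0), 1), # degrees Celsius
            now,
        )
        for _ in range(n)
    ]
    with conn:  # commits the batch
        conn.executemany(
            "INSERT INTO patient_vitals"
            " (patient_id, heart_rate, temperature, recorded_at)"
            " VALUES (?, ?, ?, ?)",
            rows,
        )
    return n
```

For step 3, a crontab entry such as `* * * * * python3 /path/to/generate.py` would run a script every minute; the path here is a placeholder.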

posh the engine.

@poshlovesdata

3 months ago

I'll definitely use an offline-first approach, and here's how I'd do it:
1. Store the data locally in a lightweight DB like SQLite.
2. Run a cron job to regularly batch and compress that data into Parquet.
3. Have another script poll for an internet connection, so once a connection comes up, it uploads the parquet into remote storage like an S3 bucket.
4. From there, S3 event notifications can trigger the ingestion pipeline to deduplicate and model the data.
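For step 4, here is a hedged sketch of the two pieces an ingestion trigger would likely need: reading the bucket and key out of an S3 event notification, and dropping repeated records. The `record_id` key and the function names are assumptions. Object keys in S3 event payloads arrive URL-encoded, hence the `unquote_plus` call.

```python
from urllib.parse import unquote_plus


def objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    return [
        (rec["s3"]["bucket"]["name"], unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
    ]


def dedupe(records, key="record_id"):
    """Keep the first record seen for each key value, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique
```

Deduplication matters here because a retried upload after a flaky connection can deliver the same batch twice.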

