I've been a backend engineer for 12+ years. Today, I'm a Principal Engineer at Atlassian.
I've designed systems that handle millions of requests. Sat on both sides of system design interviews.
Reviewed more architecture docs than I can count.
Starting today, I'm breaking down the fundamentals of scaling for the next 25 days.
If you're learning system design, bookmark this thread. You're going to learn a lot from it.
SQL has levels to it:
- level 1
SELECT, FROM, WHERE, GROUP BY, HAVING, LIMIT
Master these basic keywords and you’ll be well on your way to mastering SQL.
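Here is a minimal sketch of the level 1 keywords working together in one query. It runs against an in-memory SQLite database through Python's built-in `sqlite3` module. The `orders` table and its rows are made up for illustration, and `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical sample data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ana", 120.0, "paid"),
        ("ana", 80.0, "paid"),
        ("bo", 40.0, "paid"),
        ("bo", 500.0, "refunded"),
        ("cy", 300.0, "paid"),
    ],
)

top_customers = conn.execute(
    """
    SELECT customer, SUM(amount) AS total   -- SELECT: columns to return
    FROM orders                             -- FROM: table to read
    WHERE status = 'paid'                   -- WHERE: filters rows before grouping
    GROUP BY customer                       -- GROUP BY: one output row per customer
    HAVING SUM(amount) >= 100               -- HAVING: filters groups after aggregation
    ORDER BY total DESC
    LIMIT 2                                 -- LIMIT: caps the number of rows returned
    """
).fetchall()

print(top_customers)
```

The thing to notice is the split between WHERE (filters rows) and HAVING (filters groups). The refunded order is dropped before the sums are computed, and `bo` is dropped afterwards because the paid total is under 100.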
- level 2
Mastering JOINs:
Most common JOINs: INNER and LEFT
Less common JOINs: FULL OUTER
JOINs you should almost always avoid: RIGHT and CROSS JOIN
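To make the INNER vs LEFT difference concrete, here is a small sketch using Python's built-in `sqlite3` module. The `users` and `orders` tables are hypothetical. One user has an order and one does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'ana'), (2, 'bo');
    INSERT INTO orders VALUES (10, 1, 25.0);
    """
)

# INNER JOIN keeps only users that have at least one matching order.
inner_rows = conn.execute(
    """
    SELECT u.name, o.amount
    FROM users u
    INNER JOIN orders o ON o.user_id = u.id
    ORDER BY u.id
    """
).fetchall()

# LEFT JOIN keeps every user and fills the order columns with NULL when there is no match.
left_rows = conn.execute(
    """
    SELECT u.name, o.amount
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    ORDER BY u.id
    """
).fetchall()

print(inner_rows)  # only 'ana'
print(left_rows)   # 'ana' plus 'bo' with amount None
```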
Mastering common table expressions (CTEs).
The WITH keyword defines a CTE which you can imagine as a “variable” that you can query later.
Using variables like this, you can master algorithm techniques like recursion, breadth-first search, and more!
CTEs also make your SQL much more readable and make your coworkers hate you less compared to nested subqueries.
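A rough sketch of the recursion idea: a recursive CTE that walks a graph level by level, which is essentially breadth-first search. The `edges` table and the start node `'a'` are invented for this example, and the sketch assumes the graph has no cycles (a cyclic graph would need an extra stop condition).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE edges (src TEXT, dst TEXT);
    INSERT INTO edges VALUES ('a','b'), ('a','c'), ('b','d'), ('c','d'), ('d','e');
    """
)

reachable = conn.execute(
    """
    WITH RECURSIVE reach(node, depth) AS (
        SELECT 'a', 0                      -- anchor row: the start node
        UNION
        SELECT e.dst, r.depth + 1          -- recursive step: follow one more edge
        FROM reach r
        JOIN edges e ON e.src = r.node
    )
    SELECT node, MIN(depth) AS hops        -- shortest hop count per node
    FROM reach
    GROUP BY node
    ORDER BY hops, node
    """
).fetchall()

print(reachable)
```

The CTE named `reach` behaves like the "variable" described above: it is defined once with WITH and then queried by the final SELECT.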
- level 3
Mastering window functions
Window functions have 3 pieces:
The function (i.e. SUM, RANK, AVG)
The OVER clause to start the window
The window definition which has 3 pieces:
- how to split the window up with PARTITION BY
- how to order the window with ORDER BY
- how to restrict the window size with ROWS clause (useful for rolling monthly averages)
Understand RANK vs DENSE_RANK vs ROW_NUMBER. I have been asked this in interviews a million times.
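Here is a small sketch of both ideas: the three ranking functions on tied scores, and a ROWS clause used for a rolling average. It assumes the SQLite bundled with your Python is version 3.25 or newer, since that is when window functions were added. The `scores` and `sales` tables are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scores (name TEXT, score INTEGER);
    INSERT INTO scores VALUES ('ana', 90), ('bo', 90), ('cy', 80), ('di', 70);
    CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 1, 10), ('north', 2, 20), ('north', 3, 30), ('north', 4, 40),
        ('south', 1, 100), ('south', 2, 200);
    """
)

# ana and bo tie at 90, which is where the three functions disagree.
ranks = conn.execute(
    """
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,   -- always unique: 1,2,3,4
           RANK()       OVER (ORDER BY score DESC)       AS rnk,       -- ties share, then skip: 1,1,3,4
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk  -- ties share, no gap: 1,1,2,3
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

# PARTITION BY splits per region, ORDER BY sorts by month,
# ROWS limits the window to the current row plus the two before it.
rolling = conn.execute(
    """
    SELECT region, month,
           AVG(amount) OVER (
               PARTITION BY region
               ORDER BY month
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_avg
    FROM sales
    ORDER BY region, month
    """
).fetchall()

print(ranks)
print(rolling)
```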
- level 4
You understand table scans, b-tree indexes, and partitioning schemes to increase performance.
Doing something like COUNT(CASE WHEN) is much better than doing multiple queries with a UNION ALL, because it reads the table once instead of once per query. UNION ALL is terrible for all sorts of reasons that I don’t want to get into in this post.
B-tree indexes allow for efficient lookups on the columns you filter by in the WHERE clause.
Use explain plans to understand if an index is actually being used or not!
Partitioning is similar to indexing, except it’s a “poor man’s” index. It just keeps data in specific folders and skips the folders that don’t include the data in question.
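A quick sketch of two of the level 4 ideas: counting several categories in one pass with COUNT(CASE WHEN), and checking whether an index is actually used. It uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for whatever explain command your database has. The exact plan text differs between engines and versions, and the `events` table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 10, "click" if i % 2 == 0 else "view") for i in range(100)],
)

# One scan of the table produces both counts.
counts = conn.execute(
    """
    SELECT COUNT(CASE WHEN kind = 'click' THEN 1 END) AS clicks,
           COUNT(CASE WHEN kind = 'view'  THEN 1 END) AS views
    FROM events
    """
).fetchone()


def plan(sql: str) -> str:
    """Return SQLite's query plan as text (the last column of each plan row)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM events WHERE user_id = 7"
plan_before = plan(query)   # expect a full table scan
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(query)    # expect a search that uses idx_events_user

print(counts)
print(plan_before)
print(plan_after)
```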
What else did I miss for mastering SQL?
✨ I just replaced Mapbox on all my sites with OpenFreeMap by @hyperknot and my map bill is now $0
Mapbox's pricing is getting increasingly extortionary (which is fine, it's capitalism) but at some point you have to think, $857/month for what? A map? Really? A map is that expensive? How can loading a map be that expensive? It's just some PNG tiles you host somewhere? Why?
@OpenFreeMapOrg is 100% free and all you do is point your AI to openfreemap(dot)org and tell it to replace Mapbox with that
5 minutes and you save thousands $$$ per year!
Apparently @Cloudflare sponsors its bandwidth which is very cool and keeps it online!
If I had a 9–5 and had to start my indie hacker journey from scratch, this would be my exact strategy:
• Build a better version of a proven SaaS/app (1 core feature)
• Cursor codes while you work on your 9–5
• Ship in 2–3 weeks
• Grow on X + SEO in parallel
• No code changes until first $ → only marketing
• No $ in 2 weeks? Fix marketing strategy
• No $ in 4 weeks? Pivot
Repeat until it works
Building Data Pipelines has levels to it:
- level 0
Understand the basic flow: Extract → Transform → Load (ETL) or ELT
This is the foundation.
- Extract: Pull data from sources (APIs, DBs, files)
- Transform: Clean, filter, join, or enrich the data
- Load: Store into a warehouse or lake for analysis
You’re not a data engineer until you’ve scheduled a job to pull CSVs off an SFTP server at 3AM!
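The three steps above can be sketched in a few lines of Python. This is a toy version under clear assumptions: the "source" is an inline CSV string instead of an SFTP server, the "warehouse" is an in-memory SQLite database, and the column names are invented.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: messy names, one row with a missing amount.
RAW_CSV = """order_id,customer,amount
1, Ana ,19.99
2,bo,
3,CY,5.00
"""


def extract(text: str) -> list[dict]:
    """Extract: parse the raw CSV into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows without an amount, normalize names, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(row["amount"]))
        )
    return cleaned


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

In ELT the order flips: you would load the raw rows first and run the cleanup as SQL inside the warehouse.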
- level 1
Master the tools:
- Airflow for orchestration
- dbt for transformations
- Spark or PySpark for big data
- Snowflake, BigQuery, Redshift for warehouses
- Kafka or Kinesis for streaming
Understand when to batch vs stream. Most companies think they need real-time data. They usually don’t.
- level 2
Handle complexity with modular design:
- DAGs should be atomic, idempotent, and parameterized
- Use task dependencies and sensors wisely
- Break transformations into layers (staging → clean → marts)
- Design for failure recovery. If a step fails, how do you re-run it? From scratch or just that part?
Learn how to backfill without breaking the world.
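One common way to get an idempotent, parameterized task is "delete the partition, then rewrite it" inside a single transaction. The sketch below shows that pattern in plain Python and SQLite, not in any specific orchestrator. The `daily_sales` table, the `SOURCE` rows, and the function names are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical upstream data: (day, sku, qty).
SOURCE = [
    ("2024-01-01", "a", 1),
    ("2024-01-01", "b", 2),
    ("2024-01-02", "a", 5),
]


def run_daily_load(conn: sqlite3.Connection, run_date: date, source_rows: list) -> None:
    """Rebuild exactly one day's partition. Re-running it for the same date is safe."""
    day = run_date.isoformat()
    with conn:  # single transaction: the delete and the insert commit together or not at all
        conn.execute("DELETE FROM daily_sales WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO daily_sales (day, sku, qty) VALUES (?, ?, ?)",
            [row for row in source_rows if row[0] == day],
        )


def backfill(conn: sqlite3.Connection, start: date, end: date, source_rows: list) -> None:
    """Backfill is just the same task run once per date in the range."""
    current = start
    while current <= end:
        run_daily_load(conn, current, source_rows)
        current += timedelta(days=1)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT, sku TEXT, qty INTEGER)")

backfill(conn, date(2024, 1, 1), date(2024, 1, 2), SOURCE)
backfill(conn, date(2024, 1, 1), date(2024, 1, 2), SOURCE)  # re-run: no duplicates

row_count = conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
print(row_count)
```

Because the task only touches the partition for its own `run_date`, a failed day can be re-run by itself instead of restarting the whole pipeline.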
- level 3
Data quality and observability:
- Add tests for nulls, duplicates, and business logic
- Use tools like Great Expectations, Monte Carlo, or built-in dbt tests
- Track lineage so you know what downstream will break if upstream changes
Know the difference between:
- a late-arriving dimension
- a broken SCD2
- and a pipeline silently dropping rows
At this level, you understand that reliability > cleverness.
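The null, duplicate, and business-logic tests mentioned above can start as simple as the hand-rolled sketch below. It is not the API of Great Expectations, Monte Carlo, or dbt, just plain Python to show the idea. The column names and the "amount must not be negative" rule are assumptions. The row count check is one cheap way to notice a pipeline silently dropping rows.

```python
from collections import Counter


def check_quality(rows: list[dict], expected_count: int) -> list[str]:
    """Return a list of human-readable failures. An empty list means all checks passed."""
    failures = []

    # Null check on the key column.
    if any(r["order_id"] is None for r in rows):
        failures.append("null order_id")

    # Duplicate check on the key column.
    id_counts = Counter(r["order_id"] for r in rows if r["order_id"] is not None)
    if any(n > 1 for n in id_counts.values()):
        failures.append("duplicate order_id")

    # Business-logic check (hypothetical rule: amounts are never negative).
    if any(r["amount"] is not None and r["amount"] < 0 for r in rows):
        failures.append("negative amount")

    # Reconciliation: compare against the row count reported by the source.
    if len(rows) != expected_count:
        failures.append(f"row count {len(rows)} != expected {expected_count}")

    return failures


sample = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 1, "amount": 12.0},
    {"order_id": None, "amount": 5.0},
    {"order_id": 4, "amount": -3.0},
]
failures = check_quality(sample, expected_count=5)
print(failures)
```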
- level 4
Build for scale and maintainability:
- Version control your pipeline configs
- Use feature flags to toggle behavior in prod
- Push vs pull architecture
- Decouple compute and storage (e.g. Iceberg and Delta Lake)
- Data mesh, data contracts, streaming joins, and CDC are words you throw around because you know how and when to use them.
What else belongs in the journey to mastering data pipelines?
Our intern just asked me why we don't use Kubernetes.
I said because we don't need Kubernetes.
He said everyone uses Kubernetes.
I said everyone TALKS about using Kubernetes. Most companies are running Docker containers on three servers and calling it a day.
We have 40 employees. Our entire infrastructure runs on AWS with auto-scaling groups. It works fine.
Kubernetes is designed for companies running thousands of services across hundreds of servers. We have twelve services.
But he read that Kubernetes is "industry standard" so now he thinks we're behind.
This is what happens when people learn from tech Twitter instead of actual experience.
They think every company is Google-scale and needs Google-scale solutions.
We don't need Kubernetes. We need our MySQL database to stop running out of connections because someone wrote a query that doesn't close properly.
But that's not exciting. Nobody writes blog posts about "I fixed a connection leak."
They write about "How we migrated to Kubernetes and saved millions" even though the migration cost more than they saved.
I told the intern he should learn why tools exist before learning the tools themselves.
He looked disappointed. He wanted to put Kubernetes on his resume.
@bil0090 @steipete Crazy journey, but the lesson is clear: build fast, listen to users, and don’t quit when it hurts.
Execution + timing + obsession with real demand wins every time.
Offer it for free. See what problems they experience with it. Create a fix for each of those problems.
Now create a feature that lets people use it for free for a month if 2 of their friends sign up, either for the free version (by sending it to two of their friends) or for the paid version.
I mean this product needs to grow as fast as possible.
Caveat: Only do this last part with a product that has proven it delivers value, otherwise it will backfire quickly.
My CISO called me at 3 AM last Tuesday.
"We caught someone."
I asked, "Caught them doing what?"
He said, "Typing."
Let me explain.
We have an employee in IT. Great worker. Always online. Never complained. Perfect Slack etiquette.
One problem.
His keystrokes were arriving 110 milliseconds late.
One hundred and ten milliseconds.
That's 0.11 seconds.
The average American remote worker has 20-40ms of latency.
This guy? 110ms. Every. Single. Keystroke.
My security team ran the numbers.
That latency doesn't come from a bad router in Ohio.
That latency comes from Pyongyang.
Our "Senior DevOps Engineer" was a North Korean operative.
Running his work laptop through a laptop farm.
In America.
While he worked from a government building.
In North Korea.
He passed the interview. He passed the background check. He passed the vibe check.
He did not pass the speed of light.
Here's what people don't understand about physics:
Light travels 186,000 miles per second.
But it still has to go through China.
And China adds latency.
Since April, Amazon has caught 1,800 of these attempts.
Eighteen hundred.
I called an emergency meeting with my board.
I said, "We need to implement Keystroke Velocity Auditing across all remote employees."
They said, "That sounds invasive."
I said, "You know what else is invasive? The Democratic People's Republic of Korea in your Jira tickets."
They approved the budget.
We now monitor keystroke timing to the microsecond.
If your latency exceeds 60ms, you get a call from HR.
If it exceeds 100ms, you get a call from the FBI.
We've already flagged 47 employees.
Turns out 44 of them just have bad Wi-Fi.
3 of them are "still under investigation."
The lesson?
You can fake a resume.
You can fake a background check.
You can fake an American accent on Zoom.
But you cannot fake the speed of light.
Physics is the ultimate background check.
Hire accordingly.
🛡️ React Native Security Rule #6:
AsyncStorage is NOT secure storage.
If you save tokens, passwords, or credentials there, you’re storing them in plaintext. On rooted/jailbroken devices, attackers can dump it in seconds.
AsyncStorage is fully readable.
Treat it as public, not private.
Never store:
• auth tokens
• passwords
• API secrets
Use instead:
• expo-secure-store
• react-native-keychain
• native Keychain / Keystore
One thing I’ve noticed about people who say they “don’t code much anymore.”
Most of them have been doing this for YEARS.
They already understand systems.
They already understand databases.
They already understand how things break.
So when they use AI, they know exactly what to ask and what to fix.
The problem is beginners see this and think they can skip the fundamentals.
That’s where it goes wrong.
AI doesn’t replace understanding.
It amplifies it.
If you don’t know how things work, you won’t even know when the AI is wrong.
You’re building an app.
You don’t want to manage servers.
You don’t want to deal with auth.
You don’t want to write APIs.
You choose Supabase.
Postgres is ready.
Auth works.
Storage works.
Realtime works.
You ship fast.
It feels like the backend is gone.
Then the app grows.
More users.
More data.
More features that actually matter.
And production starts behaving oddly.
Some queries return empty results with no obvious error.
Row Level Security blocks data you expect to see.
Realtime subscriptions feel slower as usage increases.
One inefficient query suddenly affects the entire app.
What’s actually happening isn’t mysterious.
You didn’t remove the backend.
You moved it.
Supabase is not “no backend”.
It’s:
• A managed PostgreSQL database
• Exposed directly to the client
• With access control enforced inside the database
That design choice changes how everything works.
In a traditional setup:
Client → Server → Database
Your server handles auth, validation, and business logic.
With Supabase:
Client → Database
The database takes on those responsibilities.
That means your database becomes:
• The API your frontend talks to
• The place where permissions are enforced
• The layer where business rules live
• The main factor in performance
Security now lives in Row Level Security policies.
Every query is filtered by those rules.
If a policy is wrong or inefficient, queries fail or slow down quietly.
Logic moves into SQL, functions, and triggers.
Instead of fixing a route handler, you’re debugging database behavior.
Performance still works the same way it always has.
Indexes matter.
Joins matter.
Schema design matters.
Supabase removes infrastructure work.
You don’t manage servers or deployments.
But it doesn’t remove database work.
You still have to:
• Understand how Postgres executes queries
• Design tables and relationships carefully
• Think about access patterns from the client
• Watch for slow or expensive queries
Supabase makes starting easier.
The rest is still on you.
@oprydai Skill alone isn’t enough—visibility, connections, and collaboration amplify talent. Build obsessively, yes, but don’t let the world miss out on what you can create.