https://t.co/Lgrd48mLWi (an official U.S. government website) launched in 2013. 250,000 users hit it simultaneously.
Every request was processed synchronously. The system could handle about 1,000 users at a time.
It crashed within minutes. Stayed broken for weeks. Cost $600 million in emergency fixes.
One architectural pattern could have prevented the worst of it. 👇
I've started a 25-day series on Scaling and Architecture.
One topic per day.
After 12+ years as a backend engineer, I can tell you the most overlooked scaling technique:
It's partitioning. The thing you do to a 500 million row table that makes queries 50x faster without changing a single line of application code.
Day 10: Partitioning.
Thread below 👇
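To make the idea concrete, here is a minimal sketch of range partitioning with partition pruning, written in plain Python rather than against a real database. The `PartitionedTable` class, its month-based partition key, and the sample rows are all hypothetical. A real database does this inside the storage engine, and the actual speedup depends on the workload.

```python
from collections import defaultdict
from datetime import date


class PartitionedTable:
    """Toy table that stores rows in one bucket per (year, month)."""

    def __init__(self):
        # Partition key -> list of rows. Each row is a (date, amount) tuple.
        self.partitions = defaultdict(list)

    def insert(self, day, amount):
        self.partitions[(day.year, day.month)].append((day, amount))

    def total_for_month(self, year, month):
        """Pruned query: only the matching partition is read."""
        rows = self.partitions.get((year, month), [])
        return sum(amount for _, amount in rows), len(rows)

    def total_for_month_full_scan(self, year, month):
        """Same query without pruning: every row in every partition is read."""
        scanned = 0
        total = 0
        for rows in self.partitions.values():
            for day, amount in rows:
                scanned += 1
                if day.year == year and day.month == month:
                    total += amount
        return total, scanned


# Hypothetical data: 28 rows per month for one year, each worth 10.
table = PartitionedTable()
for month in range(1, 13):
    for day in range(1, 29):
        table.insert(date(2024, month, day), 10)

pruned_total, pruned_scanned = table.total_for_month(2024, 3)
full_total, full_scanned = table.total_for_month_full_scan(2024, 3)
print("pruned:", pruned_total, "rows scanned:", pruned_scanned)
print("full scan:", full_total, "rows scanned:", full_scanned)
```

Both queries return the same total, but the pruned one reads a single partition. The caller's query is the same either way, which is the sense in which application code does not have to change.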
In 12+ years of backend engineering, I've seen databases crash, data disappear, and teams scramble at 12 AM trying to figure out why the replica is 47 seconds behind the primary.
Day 1: Load Balancing
Day 2: CDN
Day 3: Caching
Day 4: Cache Invalidation
Day 5: Rate Limiting
Day 6: API Gateway
Day 7: CAP Theorem
Day 8: Sharding
Today is Day 9: Replication.
Your database lives on one machine. That machine will die someday. Replication is how you make sure your data survives it.
Thread below 👇
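As a rough illustration of the replica lag mentioned above, here is a toy model of asynchronous replication. The `Primary` and `Replica` classes are hypothetical, and lag is counted in unapplied log entries rather than seconds. Real systems ship a write-ahead log or binlog over the network.

```python
class Primary:
    """Accepts writes and records them in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Holds a copy of the data, built by replaying the primary's log."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, primary):
        """Apply every log entry the replica has not seen yet."""
        pending = primary.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def lag(self, primary):
        """Number of writes the replica is behind the primary."""
        return len(primary.log) - self.applied


primary = Primary()
replica = Replica()

primary.write("balance", 100)
replica.catch_up(primary)

primary.write("balance", 50)          # not yet replicated
stale_read = replica.data["balance"]  # replica still serves the old value
lag_before = replica.lag(primary)

replica.catch_up(primary)
fresh_read = replica.data["balance"]
lag_after = replica.lag(primary)
print(stale_read, lag_before, fresh_read, lag_after)
```

The copy on the replica is what survives if the primary's machine dies. The stale read in the middle is the price of replicating asynchronously.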
Day 1: Load Balancing
Day 2: CDN
Day 3: Caching (the 5 layers)
Day 4: Cache Invalidation
Day 5: Rate Limiting
Day 6: API Gateway
Day 7: CAP Theorem
Today is Day 8: Sharding.
This is the scaling decision you can't easily undo. Get the shard key wrong and you'll be resharding everything 18 months later. Get it right and your database scales horizontally for years.
Thread below 👇
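One reason resharding hurts can be sketched in a few lines. This assumes a simple hash-modulo scheme, which is only one of several ways to place data on shards. The `shard_for` helper and the `user-N` keys are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(keys, old_shards, new_shards):
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


user_ids = [f"user-{i}" for i in range(10_000)]
moved = fraction_moved(user_ids, old_shards=4, new_shards=5)
print(f"{moved:.0%} of keys change shards when going from 4 to 5 shards")
```

With a uniform hash, roughly four in five keys land on a different shard after adding just one shard, so most of the data has to be copied. Schemes such as consistent hashing exist to reduce that movement.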
Day 1: Load Balancing
Day 2: CDN
Day 3: Caching (the 5 layers)
Day 4: Cache Invalidation
Day 5: Rate Limiting
Day 6: API Gateway
Today is Day 7: CAP Theorem.
The most quoted and most misunderstood concept in distributed systems. Most people get it wrong in interviews. I'm going to explain it the way I wish someone explained it to me 10 years ago.
Thread below 👇
Day 1: Load Balancing
Day 2: CDN
Day 3: Caching (the 5 layers)
Day 4: Cache Invalidation
Day 5: Rate Limiting
Today is Day 6: API Gateway.
If load balancing is how you distribute traffic and rate limiting is how you control it, the API gateway is the front door that decides what gets in, how it gets routed, and what happens before your backend ever sees it.
Thread below 👇
Day 1: Load Balancing
Day 2: CDN
Day 3: Caching (the 5 layers)
Day 4: Cache Invalidation
Today is Day 5: Rate Limiting.
Your system has a breaking point. Every system does. Rate limiting is how you find it on your terms instead of your users finding it for you.
Three algorithms. Four layers. And the story of how our own batch job took down our API.
Thread below 👇
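The post does not name its three algorithms here, so as an assumption this sketch shows the token bucket, one commonly used rate-limiting algorithm. The `TokenBucket` class and its parameters are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, then refills at a steady rate."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now                # time of the last refill, in seconds

    def allow(self, now):
        """Return True if a request arriving at time `now` may pass."""
        elapsed = now - self.last
        # Add tokens for the time that has passed, capped at capacity.
        self.tokens = min(
            self.capacity, self.tokens + elapsed * self.refill_per_second
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=3, refill_per_second=1)

# Five requests arrive at the same instant: the first three drain the bucket.
burst = [bucket.allow(now=0.0) for _ in range(5)]

# Two seconds later the bucket has refilled enough to accept traffic again.
later = bucket.allow(now=2.0)
print(burst, later)
```

The capacity sets how large a burst is tolerated and the refill rate sets the sustained limit, so the breaking point becomes a number you pick rather than one you discover in an outage.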
Day 3 of the Scaling & Architecture series.
Today: Caching.
I'll discuss the actual 5 layers where caching happens in a production system.
Most engineers know maybe 2 of them. The other 3 are doing work behind the scenes that you've probably never configured or thought about.
One of them is invisible. Your OS is doing it right now without you asking.
Thread below 👇
I've been a backend engineer for 12+ years. Today, I'm a Principal Engineer at Atlassian.
I've designed systems that handle millions of requests. Sat on both sides of system design interviews.
Reviewed more architecture docs than I can count.
Starting today, I'm breaking down the fundamentals of scaling for the next 25 days.
If you're learning system design, bookmark this thread. You're going to learn a lot from it.
As a Principal Backend Engineer with 12+ years of building systems at scale, I want to break down every concept I wish someone explained to me earlier in my career.
Day 1 was Load Balancing.
Today is Day 2: CDN.
Follow along if you're serious about system design. This will be worth your time.
Before going into an LLD (low-level design) round, never forget to prepare how to:
Design a Parking Lot
Design an elevator system
Design an API rate limiter
Design a logging system
Design a hotel management system
Design a movie ticket booking system
Building a collaborative real-time text editor is an engineering trap.
Most developers think it is just passing messages back and forth over WebSockets. Then two users type at the same millisecond, the document state diverges, and you are stuck in a distributed systems nightmare.
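The divergence is easy to reproduce. In this minimal sketch, two clients each apply their own edit first and the other client's edit second, with no conflict handling at all. The `insert` helper and the sample edits are hypothetical.

```python
def insert(text, position, chars):
    """Insert `chars` into `text` at a character position."""
    return text[:position] + chars + text[position:]


base = "cat"

# Both users insert at position 1 at the same moment.
op_a = (1, "X")
op_b = (1, "Y")

# Client A applies its own edit, then receives B's edit.
doc_a = insert(insert(base, *op_a), *op_b)

# Client B applies its own edit, then receives A's edit.
doc_b = insert(insert(base, *op_b), *op_a)

print(doc_a, doc_b, doc_a == doc_b)
```

Both clients saw the same two edits, yet they end up with different documents because the edits were applied in a different order. Any real solution has to make the result independent of arrival order.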
Here is how modern apps handle concurrency without losing data:
Most engineers learn system design backwards.
They jump to Kubernetes before they understand what a network packet even does.
Here’s the order that actually makes you dangerous:
1. Networks first
HTTP. TCP. DNS. Latency vs throughput.
This is the part nobody studies.
Skipping it is like trying to bench 300 lbs without learning to squat.
2. Databases second
SQL vs NoSQL, indexes, replication, and partitioning.
If you can’t reason about data -> you can’t reason about scale.
3. Caching
Redis, CDNs, TTLs, eviction policies.
70% of scaling wins come from avoiding queries.
4. Queues & Streams
Kafka, RabbitMQ, SQS.
This is how you decouple timelines and handle spikes without blowing up servers.
5. Load Balancing
Round robin vs least connections vs consistent hashing.
This is how you learn to scale horizontally without chaos.
6. Build 5 classic designs yourself
- URL shortener
- Rate limiter
- Chat app
- Feed system
- Notifications
7. Read real-world post-mortems
Real learning is failure exposure.
You see what broke. You see WHY.
You don’t become good at system design by memorizing diagrams.
You become good by understanding the physics of distributed systems.
Latency. Durability. Throughput. Availability. Cost.
Those 5 forces rule everything.
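As a small illustration of item 5 in the list above, here is a sketch comparing round robin with least connections. The `RoundRobin` and `LeastConnections` classes and the server names are hypothetical. Production load balancers also handle health checks, weights, and timeouts.

```python
import itertools


class RoundRobin:
    """Hands out servers in a fixed rotation, ignoring their load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Sends each request to the server with the fewest active requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # Ties go to the server listed first, because dicts keep insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a request finishes so the server's count drops."""
        self.active[server] -= 1


servers = ["a", "b", "c"]

rr = RoundRobin(servers)
rr_picks = [rr.pick() for _ in range(4)]

lc = LeastConnections(servers)
first_three = [lc.pick() for _ in range(3)]
lc.release("b")        # server b finishes its request early
next_pick = lc.pick()  # so b is now the least loaded
print(rr_picks, first_three, next_pick)
```

Round robin is stateless about load, which is fine when requests cost about the same. Least connections tracks in-flight work, which helps when some requests run much longer than others.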
Raul is one of the most knowledgeable experts on system design & software architecture on X.
Highly recommend following him if you're looking to improve your knowledge.
Today we reduced headcount by 22%. The business is the strongest it's ever been. So I think it's important to be direct about what I'm seeing and why.
First, I made this decision and I own it. I did it because the way to operate at the highest level of productivity is changing, and to win the future, ClickUp needs to change with it.
Second, this wasn't about cutting costs. Most savings from this change will flow directly back into the people who stay. We'll be introducing million-dollar salary bands. If you create outsized impact using AI, you'll be paid outside of traditional bands.
Most importantly, I have the deepest gratitude for those affected. We're doing this from a position of strength specifically so we can take care of people properly. Everyone affected receives a package aimed at honoring their contributions and easing the transition.
I only see two options: wait for this to play out gradually in the market or be honest about what I'm seeing and act proactively.
THE 100X ORGANIZATION
The primary change is that we're restructuring around what I call the 100x org. The goal is 100x output. The roles required to build at the highest level are fundamentally different than they were a year ago.
Incremental improvements to existing systems won't get us there. We need new ones. That means creating enough disruption to rebuild rather than iterate on what's already broken.
The common narrative is that AI makes everyone more productive. It doesn't. Many of the workflows of today, if left unchanged, create bottlenecks in AI systems.
These roles will evolve. But waiting for that to happen naturally means falling behind now.
The 100x org is actually heavily dependent on people - infinitely more than today. This is only possible with 10x people that have embraced and adopted new ways of working.
THE BUILDERS, AGENT MANAGERS, AND FRONT-LINERS
— THE BUILDERS: 10X ENGINEERS
I don't think most companies have internalized what's actually happening with AI in engineering. The common narrative is that AI makes all engineers more productive. That may be true in isolation, but at an organization level - that is the farthest thing from reality.
Here's what we've validated recently at ClickUp: the great engineers, the ones who can orchestrate, architect, and review, are becoming 100x engineers. They're not writing code. They're directing agents that write code. The skill is judgment.
AI makes the best engineers wildly more productive, and everyone else using AI slows these engineers down.
Think about it - the bottlenecks are (1) orchestration - telling AI what to do, and (2) reviewing - checking what AI did. Everything else is leapfrogged and no longer needed.
So who do you want orchestrating and reviewing code?
And how do you want your best engineers to spend their time?
If your best engineers are spending time reviewing other people's code, then this is inherently an inefficient bottleneck. These engineers can review their agent's code much faster than reviewing human code.
The new world is about enabling your 10x engineers to become 100x.
The wrong strategy is to push every engineer to use infinite tokens. Companies doing this are celebrating 500% more pull requests. But customer outcomes don't match the volume of code being generated.
I call this the great reckoning of AI coding, and every company will face this soon if not already.
More code is just another bottleneck to the best engineers, and ultimately to your company's impact as well.
— THE BUILDERS: 10X PRODUCT MANAGERS
Product management and design roles are merging.
Designers that have customer focus become more like product managers.
And product managers that have intuition for UX become more like designers.
The bottleneck of user research is gone. It takes us just one mention of an agent to kick off research and analyze results.
The bottleneck of product <> design iteration is also gone. The product builder iterates on their own, along with agents and skills that ensure alignment with quality and strategy.
Also controversial today - I believe that the wrong strategy is to have your PMs shipping code - that just introduces another bottleneck that the best engineers will waste their time on.
To be clear, PMs should be coding but they should do this in a playground to iterate, validate, and scope. That code should not go to production.
Everything outside of managing systems, orchestrating AI, and reviewing output becomes a bottleneck.
That's why the other critical roles alongside these are the systems managers (to reduce bottlenecks) and the people who own a bottleneck you can't replace: customer meeting time.
— THE SYSTEM MANAGERS
Ironically, the people that automate their jobs with AI will always have a job. They become owners of the AI systems - agent managers. We have many examples of these people at ClickUp.
The underlying systems in which we operate are absolutely critical to get right. I think most companies are delusional to think they can iterate on existing systems and compete in this new world.
You must create enough disruption so that old systems are deprecated entirely. If there's any definition for 'AI native' that's what it is.
— THE FRONT-LINERS
In a world that will become saturated with AI communication, the human touch will matter more than anything to customers.
This is a bottleneck that you shouldn't replace - even when agents are high enough quality to do video meetings.
One-on-one meeting time with customers is something that shouldn't be automated. The systems around the meetings should be - so that front-liners spend nearly 100% of their time with customers.
REWARDING 100X IMPACT
In a world where companies are able to do so much more with less, where does that excess money go?
In our case, much of the savings in this new operating model will flow directly back to those that enabled it.
We must reward people that create productivity accordingly. This aligns incentives on both sides. Plus, in a world where your best people create 100x impact, you can't afford to lose them.
You should aim to retain these employees for decades. The context they have and their ability to efficiently orchestrate and review will be nearly impossible to replace.
Compensation bands of today should be thrown out the door. We're introducing $1 million cash/year salary bands with a path available to nearly everyone in the company if they produce 100x impact by creating or managing AI systems.
THE FUTURE
Nearly every company will make changes like these. The ones that do it proactively will define what comes next.
The future is not fewer people. It's different work, new roles, and better rewards for those who embrace it. We're already seeing entirely new roles emerge, like Agent Managers, that didn't exist a year ago.
ClickUp is positioning to lead this shift, not just internally, but for our customers too. I've never been more certain about where we're headed.