I get this weird feeling that I’ve lost control of my code when I use tools like @v0 or @Copilot Agent.
They’re great for prototyping, but I prefer starting from scratch and implementing the features myself.
ANNOUNCING THE FIRST EVER CLONEATHON
Build an open source clone of T3 Chat, win up to $5,000
Deadline is next Wednesday. Good luck and have fun, nerds 🫡
Coming from Golang, backend development with JavaScript just feels unnecessarily complicated in some areas.
I remember my professor always saying: “Go is all about ease of use” — and now I really get it.
@NetworkChuck It depends on how they use it. Students who treat AI as a learning tool will absolutely get smarter. But if it’s just a shortcut to avoid thinking, they’ll probably get dumber.
@theo A status page is needed; it was the Discord community that alerted me to the outage. I was receiving an annoying popup asking me to wait while my data was migrating to the new db.
T3 Chat has recovered and is now working again.
Now for ACCOUNTABILITY POSTING 2.0.
Outages like this are unacceptable. Tens of thousands of people rely on T3 Chat every day, and we need to make sure our service is reliable. We also need better safety nets in place for when issues occur.
I'm going to talk about the upstream provider that triggered the issue. Please treat this as TRANSPARENCY and not BLAME SHIFTING. Blame us!!!
We've been working hard to move over to Convex as our data layer and sync engine for T3 Chat. This might seem like a "database swap" but it goes much deeper. It's effectively a full rewrite of T3 Chat.
After a lot of effort and 3 failed migrations, we finally had a successful move at around 8pm last night. That was during our lowest traffic window (~40% of our peak traffic).
All looked good. I was pumped. A month of effort, finally shipped. I literally slept for 12 hours. Woke up to utter chaos.
The tl;dr is that a traffic spike took down Convex's WebSocket connection layer, and some bad client code in their React package caused a reconnect loop that effectively DDoS'd the Convex endpoint.
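For anyone wondering how a reconnect loop turns into a DDoS: if every client retries immediately (or all on the same fixed timer), a recovering server gets hit by the whole user base at once. The usual mitigation is exponential backoff with random jitter. Below is a minimal sketch of that idea, not Convex's actual client code; the function name `reconnectDelayMs`, the `BackoffOptions` type, and the 500 ms / 30 s defaults are assumptions made up for illustration.

```typescript
// Hypothetical helper: names and defaults are illustrative only,
// not taken from the Convex React package.
interface BackoffOptions {
  baseMs: number; // delay ceiling for the first retry
  maxMs: number;  // upper bound on any single delay
}

// "Full jitter" backoff: pick a random delay in
// [0, min(maxMs, baseMs * 2^attempt)) so clients spread out their retries.
function reconnectDelayMs(
  attempt: number,
  opts: BackoffOptions = { baseMs: 500, maxMs: 30_000 },
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

A client would schedule its next connection attempt with something like `setTimeout(connect, reconnectDelayMs(attempt))`, incrementing `attempt` on each failure and resetting it to 0 after a successful connection (`connect` is a placeholder here).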
Convex will have a detailed write up in the near future, but I want to talk about what we're doing going forward.
1. Actual status updates and reporting in-app
Right now, outages are reported by me on Twitter. We're a real app now. You shouldn't have to follow me to know what's going on.
We'll be introducing a status page soon to make things clearer.
2. Paging system for when outages occur
Right now, we're too reliant on the community for tracking outages. I love that y'all DM me when issues occur, but that doesn't help when I am asleep.
We need better methods to report outages so Mark and I get woken up and can fix things faster.
Side note: I hate PagerDuty, so I'd love suggestions on what we can use instead.
3. Automated "refresh to latest" flow on client
A lot of the issue we had today was caused by a bad client-side package DDoSing Convex. Even when we pushed a fix, lots of users were still on the old version, and would stay on it until they refreshed.
We have a "please refresh" button, but that's not enough. If an old client can connect, we need the ability to disconnect it. This will be an annoying tech overhaul with a lot of potential edge cases, but it is necessary for us to ensure stability.
4. Evaluate all upstream providers to make sure they are prepared for T3 Chat's load
I deeply love Convex and I know they're the right database for what we're building with T3 Chat. Outages like this still scare the shit out of me. I need to seriously evaluate them and everyone else we rely on to make sure we won't have more problems as we continue to scale up.
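On point 3, the core of a "refresh to latest" flow is a version gate: the server knows the oldest client build it is willing to talk to, and anything older gets told to reload instead of being allowed to connect. Here is a rough sketch under the assumption that every client build carries a monotonically increasing build number. `checkClientBuild`, `ClientAction`, and the build-number scheme are hypothetical, not how T3 Chat or Convex actually handle this.

```typescript
// Hypothetical version gate: assumes each deployed client bundle
// reports an integer build number that only ever increases.
type ClientAction = "ok" | "force-reload";

function checkClientBuild(
  clientBuild: number | undefined, // undefined = client too old to report one
  minSupportedBuild: number,
): ClientAction {
  // Missing or malformed build numbers are treated as outdated.
  if (clientBuild === undefined || !Number.isInteger(clientBuild)) {
    return "force-reload";
  }
  return clientBuild >= minSupportedBuild ? "ok" : "force-reload";
}
```

The server would run a check like this during the connection handshake and close the connection on `"force-reload"`, while the client treats that signal as an instruction to reload the page. The edge cases mentioned above (unsent drafts, in-flight requests, reload loops if the new bundle is also rejected) are out of scope for this sketch.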
Anyways...
This sucks. Seriously. I hate outages so much. You guys use T3 Chat because it's the best chat app ever. Outages make it the worst chat app ever.
Know we're taking this as seriously as possible. I expect to have a few more sleepless nights as we get everything in order to be more resilient.
I'm sorry.