your startup is just an openai wrapper
openai is just an nvidia wrapper
nvidia is just a tsmc wrapper
tsmc is just an asml wrapper
asml is just a sand wrapper
This is weird to say out loud, but I'm actually kinda an expert in rate limiting, so I'm gonna explain some stuff.
About half of incidents in large-scale production systems involve having more requests than you can serve. There are two categories of this kind of incident: