Overload control is usually built around a bad assumption. Most systems watch global signals like queue length or tail latency and react at the front door by throttling new arrivals or dropping random requests. This works when CPU or network is the bottleneck. It fails when the real problem sits deeper inside the application. In practice, many overload incidents come from one or two requests that monopolize an internal logical resource like a buffer pool, a lock, or a thread-pool queue. These rogue whales distort the system. One ill-timed dump query can thrash the buffer pool and cut throughput in half. One backup thread plus a heavy table scan can stall writes in MySQL. None of this shows up in CPU metrics.
The Atropos paper (SOSP’25) offers a simple fix. Instead of punishing victims at admission time, Atropos watches how tasks actually use these internal resources and cancels the ones causing the collapse.
The key observation is that most real systems already have safe cancellation hooks. What’s been missing is a principled way to decide whom to cut. Atropos supplies that missing piece and shows that killing a single rogue whale can unblock the whole system.
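To make the idea concrete, here is a minimal sketch of "find the whale, then cancel it". It is not Atropos's actual policy, which the summary above does not spell out. It assumes each task reports how many units of an internal resource it holds, and it uses a deliberately naive rule: when the resource is nearly exhausted, cancel the task holding the largest share, but only if that share dominates. The names `Task`, `ResourceMonitor`, `overload_at`, and `hog_share` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Task:
    task_id: str
    cancel: Callable[[], None]  # the application's own cancellation hook
    held: Dict[str, float] = field(default_factory=dict)  # resource -> units held


class ResourceMonitor:
    """Tracks per-task use of internal resources (hypothetical helper)."""

    def __init__(self, capacity: Dict[str, float],
                 overload_at: float = 0.9, hog_share: float = 0.5) -> None:
        self.capacity = capacity        # resource -> total units available
        self.overload_at = overload_at  # fraction of capacity treated as overload
        self.hog_share = hog_share      # share of current usage that marks a culprit
        self.tasks: Dict[str, Task] = {}

    def register(self, task: Task) -> None:
        self.tasks[task.task_id] = task

    def record(self, task_id: str, resource: str, units: float) -> None:
        # Store the task's latest reported holding of this resource.
        self.tasks[task_id].held[resource] = units

    def in_use(self, resource: str) -> float:
        return sum(t.held.get(resource, 0.0) for t in self.tasks.values())

    def relieve(self, resource: str) -> Optional[str]:
        """Cancel the dominant holder if the resource is overloaded.

        Returns the cancelled task's id, or None if nothing was cancelled.
        """
        used = self.in_use(resource)
        if not self.tasks or used < self.overload_at * self.capacity[resource]:
            return None  # no overload on this resource
        culprit = max(self.tasks.values(),
                      key=lambda t: t.held.get(resource, 0.0))
        if culprit.held.get(resource, 0.0) < self.hog_share * used:
            return None  # load is spread out; no single task to blame
        culprit.cancel()
        del self.tasks[culprit.task_id]  # its holdings no longer count
        return culprit.task_id


if __name__ == "__main__":
    cancelled = []
    monitor = ResourceMonitor({"buffer_pool": 100.0})
    for name, pages in [("dump_query", 70.0), ("lookup_1", 10.0), ("lookup_2", 12.0)]:
        monitor.register(Task(name, cancel=lambda n=name: cancelled.append(n)))
        monitor.record(name, "buffer_pool", pages)
    print(monitor.relieve("buffer_pool"))  # dump_query
    print(monitor.relieve("buffer_pool"))  # None
```

The victims (`lookup_1`, `lookup_2`) are never throttled here. Only the task responsible for most of the pressure is cancelled, and the second call finds nothing left to do.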
After six wonderful years at UIUC @siebelschool, I’m thrilled to announce that I will join the University of Toronto ECE @eceuoft as an assistant professor in July 2026. I’m actively looking for PhD students in fall 2026. Drop me an email if you are interested in working with me.
🔍 Meet TrainCheck, an open-source tool from #UMich CSE led by Prof. @ryanphuang that catches silent errors in deep learning: bugs that quietly wreck model quality. It detected 18 of 20 real-world errors within 1 iteration and found 6 previously unknown bugs.
▶️ https://t.co/yoVsxA3krS
#TrainCheck
Congratulations to the authors of Basilisk: Using Provenance Invariants to Automate Proofs of Undecidable Protocols on receiving a ⭐ Best Paper Award ⭐ at OSDI 2025! 👏
🔗 Read more: https://t.co/ix3AdCTaLk
#OSDI2025 #SystemsResearch #BestPaper #CSEMichigan
🚨 New resource for ML systems folks!
We release a curated reading list on ML reliability, including silent errors, testing, fault tolerance, and more.
If you’re building more robust, debuggable ML systems, this list is for you 👇
🔗 https://t.co/zVQA3O9sAv
PRs are welcome!
🔍 T2C: Turns system tests into semantic checkers to detect failures in distributed systems.
* [Paper](https://t.co/Wk7NXx6w2l), [Code](https://t.co/6LKxc6KoSz)
#DistributedSystems
🚨 Excited to share that our group will present two papers at OSDI '25 next week!
🎯 TrainCheck: Automatically catches silent errors during deep learning training by inferring and enforcing training invariants.
* https://t.co/JJwSecWiql, [Code](https://t.co/qjWgqqEs7h)
Come join the Artifact Evaluation Committee for #osdi25 & @usenixtatc25, and help promote reproducibility in our Systems community.
Ryan (@ryanphuang) and Tianyin (@tianyin_xu) will make this process fun, smooth, and lightweight (1-2 artifacts per reviewer).
We are seeking members to join the Artifact Evaluation Committee for #OSDI25 & #USENIXATC25.
Help promote reproducibility, and engage with cutting-edge systems research!
Please apply by April 17th via https://t.co/OqY9nFLhmI.
#SystemsResearch #Reproducibility #OSDI #ATC
EuroSys'23 program is now out! Preview the exciting lineup of talks at https://t.co/XwCUGir59Z.
Early bird registration ends on April 15th. More details - https://t.co/SwxZWky4yH
Wow, @UMichCSE's recent faculty hires are on 🔥🔥!
They got 7⃣—count 'em, seven!—NSF CAREER awards this cycle.
Big congratulations to @mahdi_tcs, @royaensafi, @pag_crypto, Euiwoong Lee, @neurocy, @eig, and @xwangsd.
https://t.co/t8kz2ZRpr9
Thank you @SloanFoundation for recognizing my lab research. A heartfelt thanks to my students, mentors, friends in civil society, FOCI and @OpenTechFund communities for their support and inspiration which have allowed me to push the limits without any fear. #SloanFellow
Congratulations to @UCSanDiego @ucsd_cse cybersecurity expert Stefan Savage, who was elected to @theNAEng! His work looks at everything from cars, to spam email, to cryptocurrencies: https://t.co/gU50Wl1qzi