@li_ruihao presenting our work on Oneiros at SoCC'25!
KV cache drives multi-tenant LLM performance. Offloading it to the CPU is costly. Oneiros flips the script—repurposing parameter memory from inactive models into KV cache and exploiting today’s fast CPU–GPU bandwidth.
Blunt truth: LLM deployments are quietly bleeding money. A wrong config can make the same model 8× slower and 25× pricier.
With MaverIQ, we fix it by auto-deploying LLMs to meet each user’s intent, at a fraction of today’s cost.
@dimliak99 @prasoon_sinha25
Tianrui Hu
@NeerajaJY
My second Ph.D. student, Ruihao Li (co-advised with Lizy John), defended successfully! Ruihao is headed to Meta Research soon.
Congrats @li_ruihao, really proud of you :)
@UT_SysML @UTAustin @utexasece
A proud moment as my first co-advised PhD student, Erika Alcorta, graduates 🎓
It was an honor to hood her with Prof. Gerstlauer (@AGerstlauer). She’s now advancing ML for Systems at Ampere Computing. Excited to see all that lies ahead! 🚀
Congrats, Susy! @esalcorta @UT_SysML