Presenting in short!
👉🏼 Mol-MoE: leveraging model merging and RLHF for test-time steering of molecular properties.
📆 today, 11:15am to 12:15pm
📍 Poster session #1, GEM Bio Workshop
@gembioworkshop@sparseLLMs#ICLR#ICLR2025
We are presenting “Prefix and output length-aware scheduling for efficient online LLM inference” at the ICLR 2025 (@iclr_conf) Sparsity in LLMs workshop (@sparseLLMs).
🪫 Challenge: LLM inference in data centers benefits from data parallelism. How can we exploit patterns in requests – like shared prefixes and variable decode length – to optimally assign requests to GPU workers?
💡 Idea: both prefix and output length-aware scheduling!
We build on Preble (ICML 2025, @vikranth22446, @yiying__zhang), which was the first distributed LLM serving system to exploit prompt sharing (see https://t.co/BpHl8PbbfQ).
In our proof-of-concept work, we carefully benchmark Preble vs. prefix-unaware schedulers to identify opportunities for performance improvement.
⭐️ By adding output length-aware scheduling to Preble, we reduce latency by 14.31% at 64 RPS and 28.89% at 128 RPS. ⭐️
📖 Full paper here: https://t.co/1iW6hlfK7g
Thank you to co-authors @InakiArango, @YepHuang, @rana_shahout, and @minlanyu at @hseas. Thank you also to the Preble authors for their groundbreaking work!
our workshop on sparsity in LLMs is starting soon in Hall 4.7! we’re starting strong with an invited talk from @DAlistarh and an exciting oral on scaling laws for MoEs!
Our ICLR 2025 Workshop on Sparsity in LLMs (@sparseLLMs) kicks off with a talk by @DAlistarh on lossless (~1% perf drop) LLM compression using quantization across various benchmarks.
a PACKED hall for @tydsh‘s talk at our sparsity in LLMs workshop -not surprising! we have another oral right after this, and then we’ll have the first of 2 poster sessions before lunch! @iclr_conf
Sparse LLM workshop will run on Sunday with two poster sessions, a mentoring session, 4 spotlight talks, 4 invited talks and a panel session.
We'll host an amazing lineup of researchers: @DAlistarh@vithursant19@tydsh @ayazdanb @gkdziugaite Olivia Hsu @PavloMolchanov Yang Yu
I will travelling to Singapore 🇸🇬 this week for the ICLR 2025 Workshop on Sparsity in LLMs (SLLM) that I'm co-organizing!
We have an exciting lineup of invited speakers and panelists including @DAlistarh, @gkdziugaite, @PavloMolchanov, @vithursant19, @tydsh and @ayazdanb.
Check out this post that has information about research from Apple that will be presented at ICLR 2025 in 🇸🇬 this week.
I will be at ICLR and will be presenting some of our work (led by @samira_abnar) at SLLM @sparseLLMs workshop.
Happy to chat about JEPAs as well!
🔹 Mentor sign-up: fill out https://t.co/wY3rTvnlgj (Extended deadline: April 11th, 2025)
🔹Mentee sign-up: fill out https://t.co/0tmZJ8uKvE (Deadline: April 11th, 2025)
🔊Exciting Mentorship Opportunity @iclr_conf!
We’re thrilled to announce that mentee applications are now open for our mentorship program at the Sparse-LLM workshop at @iclr_conf!
Details 👇
This is a unique chance for young researchers to connect with senior mentors for guidance on research challenges, collaborations, publishing, and more.
📅 Dedicated mentorship session during the workshop + networking opportunities over coffee & lunch breaks!
Our QuEST paper was selected for Oral Presentation at ICLR @sparseLLMs workshop!
QuEST is the first algorithm with Pareto-optimal LLM training for 4bit weights/activations, and can even train accurate 1-bit LLMs.
Paper: https://t.co/ulbX5D5LjD
Code: https://t.co/OYfjs4jdDp
📅 Dedicated mentorship session during the workshop + networking during coffee & lunch breaks!
Attending in person? Join us!
🔹Mentor sign-up: fill out this form by March 14th, 23:59 AoE: https://t.co/wY3rTvmNqL
🔹Mentee sign-up: TBA — stay tuned! 👀
🚨 Exciting Mentorship Opportunity at ICLR '25 🚨
We’re organizing a mentorship program during our Sparse-LLM workshop @iclr_conf to connect young researchers with senior mentors! This is a great chance to seek guidance on research challenges, collaborations, publishing and more.