[1/5] 🚨 How do you prove that an #LLM company actually deleted your private or copyrighted data upon your request? 🕵️♂️
Spoiler: right now, you really can't! 😱 Existing LLM unlearning metrics are not applicable in the real world. 😬
Thrilled to present WaterDrum (#ICLR2026)! We use text #watermarks to finally measure what an LLM has truly "forgotten."💧🥁
This work was co-led by @lululu0082@xinyuan3142@greglau in collaboration with @nhungbui1299@RachaelSim2 John Russell Himawan, Fanyu Wen, Chuan-Sheng Foo, See-Kiong Ng, @bryanklow.
📄 Paper:
https://t.co/jo08kPyMQT
Catch our poster @iclr_conf on 23 Apr 10:30AM Pavilion 4 P4-#4210! 🇧🇷
See the thread below 🧵👇
[1/4] You want to fine-tune an #LLM.
1️⃣ What's the best data mixture?
2️⃣ What LoRA config. to use?
The answers to the above depend on each other. 🐔🥚
Our team (with @ZhiliangChen94@alfredleongwl@ShaoYongOng@apivich_h@greglau et al.) calls this the Chicken-and-Egg Dilemma of LLM fine-tuning — and we just solved it.
📄Paper: https://t.co/jMqXvVMbYc
📍Poster: @iclr_conf #ICLR2026 DATA-FM Workshop, 26 Apr, Riocentro Room 203 A+B
[1/3] 🤔An interesting and practical question:
How can we find the optimal #LLM training data mixture that maximizes a free-form downstream task metric?
For instance, what fine-tuning data mixture should we use to maximize same-demographic user ratings ⭐ across our chatbots?
Our #ICLR2026 work (with @ZhiliangChen94@greglau Chuan-Sheng Foo) called DUET interleaves #BayesianOptimization and #DataSelection to automatically discover the best data mixture that maximizes any free-form downstream feedback, without manually searching through countless combinations.
📄Paper: https://t.co/2fS7elnb0W
📅Catch us at @iclr_conf 🇧🇷Poster Session 3 Fri Apr 24 10:30AM Pavilion 3 P3-#305.
More below👇.
When a company claims that your personal data has been removed from their model, have you ever wondered whether they've indeed done so? 🤔
If a new paper on arXiv claims that its proposed #MachineUnlearning algorithm can unlearn your personal data from an #LLM, how do we know if it can indeed do so?
🛢️WaterDrum is the first data-centric LLM unlearning metric based on watermarking that is calibrated, requires no retraining, works for blackbox models and when forget/retain sets have similar data.
Let the (Water)Drums roll at Rio! @iclr_conf #ICLR2026
Given a single model, how do we improve an #LLM’s reasoning performance with limited resources 💻 and inference time ⌛️? Can a smaller 1.5B model outperform a 7B model without incurring long inference time from sequential queries?
In the work of @_Hu_Wenyang@greglau et al., we introduce the framework called Dipper to create #LLMs ensembles from an optimized set of diverse reasoning prompts to improve performance. Dipper runs queries in parallel with a prompt optimization method inspired by Determinantal Point Processes (DPP), making it super fast ⏩ and effective. Furthermore, Dipper can work with LLM APIs without model access 📦!
With Dipper, we demonstrated how a small ensemble of just three 1.5B models can outperform a 7B model on a range of math and non-math reasoning tasks, while taking almost the same inference time and just < 3x compute for a normal query thanks to accelerated batch inference methods 😱 !
Find out more at @emnlpmeeting #EMNLP2025 (Poster 4492) at Hall C, Session 11, Nov 6 at 16:30!
Paper: https://t.co/AMPwoCCjFj
When a company claims that your personal data has been removed from their model, have you ever wondered whether they've indeed done so? 🤔
If a new paper on arXiv claims that its proposed #MachineUnlearning algorithm can unlearn your personal data from an #LLM, how do we know if it can indeed do so?
🛢️WaterDrum is the first data-centric LLM unlearning metric based on watermarking that is calibrated, requires no retraining, works for blackbox models and when forget/retain sets have similar data.
Find out more about Waterdrum from @greglau at our poster today (18 July) in the #ICML2025 Machine Unlearning for #GenerativeAI (MUGen @icmlconf) workshop during 3-3.45pm in West Meeting Room 202-204!
Joint work with @lululu0082 xinyuan @greglau nhung @RachaelSim2 fanyu chuansheng seekiong.
🤔How can we reliably use #Multimodal Large Language Models (#MLLMs) in practical settings? With multiple modalities come more challenges in managing uncertainty and mitigating errors where responses may seem plausible but are incorrect 😱.
Introducing ⚖️UMPIRE, an #UncertaintyQuantification framework for MLLMs that estimates task instance uncertainties at inference-time without external tools or additional training. Inspired by #DeterminantalPointProcess (#DPP), the UMPIRE metric is based on a global measure of the semantic volume formed by sampled responses, adjusted by local measures of each sample’s coherence with the multimodal query.
Find out more about UMPIRE from @greglau at poster tomorrow (19 July) in the @icmlconf #ICML2025 #R2-FM Reliable and Responsible Foundation Models workshop during 12-1pm and 4-5pm in West Ballroom C!
Joint work with @greglau and hieu.
Paper: https://t.co/XTHL55CYA0
Obtaining 🔎 interpretable symbolic equations is critical for scientific understanding and decision making in high-stakes environment. In fact, many such real-world applications (e.g., interactive/adaptive experiments, real-time decision support) demand rapid results within ⏳seconds.
Hence, we ask❓: how can we rapidly discover accurate and parsimonious symbolic equations from data?
In 📖README: Rapid #EquationDiscovery using #MultimodalEncoders, we develop an efficient #Multimodal foundation model for rapid equation discovery with 3 key insights:
1. Use image plots rather than raw numerical data to efficiently extract key trends;
2. Make use of pre-trained image and text encoders to efficiently train foundation model for equation discovery;
3. Develop an query-efficient optimization approach for efficient inference (Grey Wolf Algorithm x #BayesianOptimization).
Find out more about README from @greglau at our poster today (18 July) in the @icmlconf #ICML2025 AI4MATH workshop during 10:50am-12.20pm in West Ballroom C!
Joint work with @greglau yueran zi-yu @apivich_h@ruthchewing.
https://t.co/Dkw3c6k0e4 (https://t.co/XBWReWZ8nA) will be @icmlconf#ICML2025: @ray_qiaorui@JingtanW@greglau@_Hu_Wenyang@ruthchewing!
More details on each work later.
Interested to apply for a faculty position or Ph.D. in AI at @NUSComputing , ping me on Whova to meet up.
See you soon!
#DataSelection #DataCentricAI #LLMs #LLM #FederatedLearning #BayesianOptimization #MachineUnlearning #RLHF #SymbolicRegression
Excited to be visiting @UniofOxford at @OxfordStats@OxCSML over the summer, hosted by @desirivanova! Just arrived, and already having many interesting discussions. Looking forward to meeting people, exchanging ideas and engaging in research collaborations!
Introducing WaterDrum🛢️, the first data-centric #LLM#unlearning metric that leverages robust text #watermarking💧to provide an effective, practical, and resilient way to evaluate LLM unlearning performance😉! (1/n)
#MachineUnlearning#LLMs
🧪 A key challenge to solving #InverseProblems, especially in science and engineering settings, is limited data and compute for simulations.
🥧 The @iclr_conf work of @apivich_h@greglau , PIED, is an #ExperimentalDesign framework for #PDE inverse problems that efficiently optimizes for the best observation inputs.
📌 PIED utilizes #PINNs to perform forward simulation and to solve the inverse problems, which allows for higher efficiency over numerical simulators and allow for meshless and differentiable simulations.
🤖 Using PINNs along with differentiable observation selection criteria, PIED selects the optimal design parameters for one-shot deployment, while exploitation parallel computation and gradient-based optimization methods.
📈 These benefits allow PIED to outperform existing ED benchmarks in IPs for both finite-dimensional and function-values inverse parameters, as demonstrated in various settings including on real experimental data.
Check out our #ICLR2025 Poster #28 on 26 Apr in Poster Session 6 Sat 3pm Hall 3 + Hall 2B.
Paper: https://t.co/9IMhSEpnp9
GitHub: https://t.co/Qa4ivyY4i0
@NeurIPSConf We showed how a small ensemble of 3 1.5B models can outperform a 7B model on MATH with almost the same inference time & <3x compute for a normal query due to accelerated batch inference methods 😱 !
C u @NeurIPSConf#NeurIPS2024 Workshop on Foundation Model Interventions (n/n)
@NeurIPSConf Introducing Dipper: a framework to create #LLMs ensembles from an optimized set of diverse reasoning prompts to improve perf.
Unlike sequential inference time methods, Dipper runs queries in parallel, making it super fast ⏩ and effective. (2/n)
@NeurIPSConf#NeurIPS2024
Given a single model, how do we improve an #LLM’s reasoning performance with limited resources 💻 and inference time⌛️? Can a smaller 1.5B model outperform a 7B model without incurring long inference time from sequential queries? (1/n)
@NeurIPSConf#NeurIPS2024#LLMs