At #ICML2026, @XXiaoyu44761 and I will present "Unlearning Isn't Deletion" (https://t.co/WsEUNVoW8d), on reversibility risks in LLM unlearning.
Xiaoyu is also at #ACL2026, presenting BiForget: automated forget-set generation from seed queries.
Around Seoul/San Diego? Say hi!
🚀 Excited to present our EMNLP 2025 work "OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models"!
I'll be at my poster session on Wednesday, Hall C3, 11:00-12:30 Session 2.
Come chat if you're interested in LLM unlearning or privacy! Check out our paper: https://t.co/yj1K4ogZGq
#EMNLP2025 #LLM #Unlearning #Privacy
@xwang_lk From the USENIX Security '25 PC chairs' message: "the extremes are concerning: six individuals appear as co-authors on 20 or more submissions, with two authors appearing on 36 and 39 submissions respectively. At such volume, it becomes difficult not to question the nature and depth.
People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that improving math reasoning should transfer to broader capabilities in other domains. But is that actually true?
In our study (https://t.co/UhfB0ByLqi), we evaluated over 20 open-weight reasoning models and found that:
➡️Only models trained with RL exhibit broad transfer of math reasoning skills to other tasks.
➡️Models trained with SFT show limited or no transfer—especially to non-reasoning domains.
To quantify this, we introduce the Transferability Index (TI), which measures how much of a model's gain on math carries over to other domains. A positive score indicates effective transfer; a negative one suggests a loss of general capability.
We evaluate the models on three benchmark categories:
- Math reasoning: MATH-500, AIME24/25, Olympiad
- Other reasoning: GPQA-D (Science), LiveCodeBench2 (Code), ACPBench (Agent Planning), HeadQA (Medical)
- Non-reasoning: CoQA (Conversational QA), IFEval (Instruction Following), HaluEval (Hallucination), MC-TACO (Commonsense)
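For intuition, here is a rough sketch of how a TI-style score could be computed from per-benchmark scores of a base model and its fine-tuned version. The exact formula is in the paper; this sketch assumes TI is the mean relative gain on the non-math benchmarks divided by the mean relative gain on the math benchmarks, scaled by 100. The function names and all scores are hypothetical, for illustration only.

```python
from statistics import mean


def relative_gain(base, tuned):
    """Mean relative change (tuned - base) / base over the benchmarks in `base`."""
    return mean((tuned[name] - base[name]) / base[name] for name in base)


def transferability_index(base_math, tuned_math, base_other, tuned_other):
    """Assumed TI: relative gain on other tasks per unit of relative math gain, x100.

    Positive means the math gain came with gains elsewhere;
    negative means other capabilities dropped.
    """
    math_gain = relative_gain(base_math, tuned_math)
    if math_gain <= 0:
        # Assumption: the ratio is only interpreted when math actually improved.
        raise ValueError("TI is only meaningful when the math score improved")
    return 100.0 * relative_gain(base_other, tuned_other) / math_gain


# Hypothetical scores, not results from the paper.
base_math = {"MATH-500": 50.0}
tuned_math = {"MATH-500": 75.0}
base_other = {"IFEval": 80.0}
tuned_other = {"IFEval": 72.0}

ti = transferability_index(base_math, tuned_math, base_other, tuned_other)
print(f"TI = {ti:.1f}")
```

With these made-up numbers the math score rises while the instruction-following score falls, so the sketch returns a negative TI, matching the "loss of general capability" reading above.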
Our findings challenge the blind pursuit of leaderboard performance in math reasoning via SFT. Simply creating more math-like SFT data may inadvertently harm a model’s broader generalization. Instead, RL appears to be key for truly transferable reasoning development.
@niloofar_mire A simple Q:
It's a bit "weird" that the final step works on a weaker, much smaller LM; why not let a weaker LM output a "plan" as an input to a stronger LM?
Join the ACM CCS 2025 Artifact Evaluation Committee and play a vital role in advancing high-quality research artifacts and fostering reproducibility in security research. Nominate yourself or a colleague today using this form: https://t.co/9WHE3HJb9e.
@acm_ccs
🌟 ML Security & Privacy Researchers: I'm seeking exceptional Visiting Students, Postdoctoral Fellows, or Research Associates to work in our group at MBZUAI in Abu Dhabi.
Interested? You can find more information here: https://t.co/vV1TOqEc2i