@AlexanderSpangh@jchudnov yes! If you take a look at Fig 2 that's exactly what it shows. The longer you train the model, the more the gradients induced by semantically identical documents align.
Here's hoping for better luck at ICLR 2026!
https://t.co/rm4izHEOnV
If you want to read the paper without R7Hk's endorsement: https://t.co/NmJAOnCq72
@DjDvij also made a colab where you can try the attack out for yourself: https://t.co/7U6u9lOM5D
So exuberant to announce that our paper "No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms" has been rejected from NeurIPS 2025 with an average score of 4! 💪🔥🔥💯
@DjDvij@RylanSchaeffer@sanmikoyejo@ChrisCundy@AbhayPuri98
3. Writing the majority of your review using a language model. It did such a great job!
Thanks also to the AC for ignoring us when we reported this review for violating the @NeurIPSConf guidelines against LM reviewing.
@casper_hansen_@RylanSchaeffer There's no contradiction. We don't claim that min-p is better or worse than other logit processors-- we contend only that the evidence in Minh et. al. does not meet scientific standards to claim superiority.
A bit late to the party, but our paper on predictable inference-time / test-time scaling was accepted to #icml2025 🎉🎉🎉
TLDR: Best of N was shown to exhibit power (polynomial) law scaling (left), but maths suggest one should expect exponential scaling (center). We show how to ...
1/3
🚨New preprint 🚨
Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models
We examine min-p sampling (ICLR 2025 oral) & find significant problems in all 4 lines of evidence: human eval, NLP evals, LLM-as-judge evals, community adoption claims
1/8
🚨 New Paper 🚨
An Overview of Large Language Models for Statisticians
📝: https://t.co/oklTYEAMvH
- Dual perspectives on Statistics ➕ LLMs: Stat for LLM & LLM for Stat
- Stat for LLM: How statistical methods can improve LLM uncertainty quantification, interpretability, trustworthiness & more.
- LLM for Stat: How LLMs can enhance statistical workflows: from data collection, synthesis, annotation to statistical modeling, with applications to medical research
Presents key LLM advances: Architecture, Training, Reasoning, and Self-Alignment:
(1) 🧠Evolution of LLM architectures with Transformers and Self-Attention
(2) LLM training pipeline from pre-training, SFT, to RLHF and Preference Optimization.
(3) 💭 System 2 Prompting and Chain-of-Thought for test-time scaling .
(4) 🚀 LLM Self-Alignment for achieving super-human intelligence
Statisticians play a key role in the development of large-scale AI models:
(1) 💡 Statistical insights improve LLM uncertainty quantification & interpretability
(2) 🤖 Watermarking for AI-generated content detection
(3) ⚖️ Privacy & algorithmic fairness to ensure responsible AI adoption
LLMs can also empower statistical science by:
(1) 📈 Scaling up data collection, synthesis, and annotation.
(2) 🖥️ Automating statistical coding & exploratory analysis
(3) 🔬 Facilitating medical research
By bridging statistics & AI, we can:
✅ Improve better LLMs with statistical methodologies.
✅ Leverage LLMs for statistical applications in high-stakes domains
(1/n) Fine tuning APIs create significant security vulnerabilities, breaking alignment in frontier models for under $100!
Introducing NOICE, a fine-tuning attack that requires just 1000 training examples to remove model safeguards. The strangest part: we use ONLY harmless data.