🧵 New preprint! "Towards Understanding Steering Strength" with @MagamedTm and D. Garreau
Activation steering is a popular way to control LLM behavior at inference. But how much should you steer? We provide the first theoretical analysis of the steering strength α.
1/7
TL;DR: Steering strength α has precise, predictable effects on LLM outputs. Understanding these laws may help design better steering strategies.
Paper: https://t.co/qwZAHqoN6L
Code: https://t.co/W0IbUQ7AZe
7/7
@SebAaltonen @IceSolst This is the same shit for every craft out there:
Oh look 500 songs a day
Oh look 1000 video game assets a day
Oh look 15k LOC a day
Oh look 50 videos post-processed a day
@chanwoopark20 Maybe of interest to you: we derive a *local* Lipschitz constant of the softmax in this paper https://t.co/Vdu39cQwDh (Lemma H.6). It is local, but gives more information than the 1/2 that can be quite pessimistic for small perturbations.
To register and for more information: https://t.co/YXELvdEMIP
The plenary speakers are Pierre Ablin, Yann Brenier, Julie Delon, Stéphane Gaubert, Francisco J. Silva Álvarez (J.J. Moreau Prize) and Irène Waldspurger.
Note that the number of places is limited!
Registration for the Journées MODE 2026 in Nice is now open. The event will take place March 18-20 at the Hôtel Saint-Paul.
Registration is open until March 1 (surcharge after February 9). The deadline to submit a talk is **January 15**.
Happy to be at #NeurIPS2025 in San Diego to present our poster ‘Learning Theory for Kernel Bilevel Optimization’ #3005, Fri at 4:30 p.m. Stop by/ping me to chat, especially about statistics, causality, generative models! Let's connect!
Joint w/ E. Pauwels, @vaiter, @MichaelArbel