Just published my first-ever blog post about @ianmgriffith, @JoshHMcDermott, and my recent paper on the cocktail party problem and brain-inspired auditory attention models.
Would love any feedback if you check it out!
(link in replies)
What happens when a former MIT football captain who majored in computation and cognition + minored in design and music comes to McGovern @mitbrainandcog for his PhD? He builds a speaker room that solves the "cocktail party" problem, of course. 💥
We never really knew how to train nonlinear RNNs well… BPTT struggled with vanishing gradients (no long-range memory) and sequential rollout (hard to parallelize).
What if instead an oracle told us the optimal memory state m_t at each step? Then the RNN could do one-step supervised learning on (m_t, x_{t+1}) → m_{t+1} labels.
We call this Supervised Memory Training (SMT): a replacement for BPTT that trains RNNs without unrolling them. SMT is time-parallelizable and solves vanishing gradients.
Website: https://t.co/BvctWJlPad
arXiv: https://t.co/5xR0mUVymp
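A minimal sketch of the one-step idea described above, not the paper's implementation. The post does not say how the oracle memory states are obtained, so `toy_oracle` (a decaying running sum) is a stand-in assumption, and `make_one_step_pairs` and `one_step_loss` are hypothetical helper names. The point it illustrates: once every m_t is given, each (m_t, x_{t+1}) → m_{t+1} pair is an independent supervised example, so nothing needs to be unrolled through time.

```python
def toy_oracle(inputs, decay=0.5):
    """Stand-in oracle (an assumption): m_t = decay * m_{t-1} + x_t, with m_0 = 0."""
    memories = [0.0]
    for x in inputs:
        memories.append(decay * memories[-1] + x)
    return memories


def make_one_step_pairs(memories, inputs):
    """Build ((m_t, x_{t+1}), m_{t+1}) examples.

    memories = [m_0, ..., m_T], inputs = [x_1, ..., x_T], so inputs[t] is x_{t+1}.
    """
    if len(memories) != len(inputs) + 1:
        raise ValueError("need exactly one more memory state than inputs")
    return [((memories[t], inputs[t]), memories[t + 1]) for t in range(len(inputs))]


def one_step_loss(cell, pairs):
    """Mean squared error of a cell on one-step targets.

    Each term depends on a single pair only, so the terms could be
    evaluated in any order or in parallel.
    """
    return sum((cell(m, x) - target) ** 2 for (m, x), target in pairs) / len(pairs)


inputs = [1.0, 2.0, 3.0]
memories = toy_oracle(inputs)
pairs = make_one_step_pairs(memories, inputs)

# A cell that matches the toy oracle's update rule has zero one-step loss.
perfect_cell = lambda m, x: 0.5 * m + x
loss = one_step_loss(perfect_cell, pairs)
```

In a real setup the cell would be a trainable network and the loss would be minimized over all pairs at once.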
"The Truth Lies Somewhere in the Middle (of the Generated Tokens)"
In autoregressive language models, mean pooling hidden states across generation yields better representations than any token alone.
project page: https://t.co/kXddYUir4k
w/ @phillip_isola and @thisismyhat
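For readers unfamiliar with the operation, mean pooling over generated tokens can be sketched as below. This assumes the hidden states are already available as one vector per generated token; `mean_pool` is a hypothetical helper name, and which layer's states to use is not specified in the post.

```python
def mean_pool(hidden_states):
    """Average per-token hidden vectors into one representation.

    hidden_states: list of equal-length vectors, one per generated token.
    """
    if not hidden_states:
        raise ValueError("need at least one token's hidden state")
    n_tokens = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(h[d] for h in hidden_states) / n_tokens for d in range(dim)]


# Two generated tokens with 2-dimensional hidden states (toy values).
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
```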
Can a simple architectural bias produce human-like selective attention?
We trained a network with multiplicative feature gains from an attention cue and found it reproduces many cocktail party effects, and even predicts new ones!
Loved working on this with @ianmgriffith
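A rough sketch of what "multiplicative feature gains" means mechanically, under assumptions. The post does not say how gains are computed from the cue, so the gains are passed in directly here, and `apply_feature_gains` is a hypothetical name rather than anything from the paper.

```python
def apply_feature_gains(features, gains):
    """Scale each feature channel by its own gain (elementwise product).

    A gain above 1 amplifies a channel, a gain below 1 suppresses it.
    """
    if len(features) != len(gains):
        raise ValueError("features and gains must have the same length")
    return [f * g for f, g in zip(features, gains)]


# Toy mixture features; the gains (assumed to come from the attention cue)
# boost channel 0, silence channel 1, and pass channel 2 unchanged.
attended = apply_feature_gains([1.0, 2.0, 3.0], [2.0, 0.0, 1.0])
```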
Excited to announce a new paper from our lab, by Ian Griffith @ianmgriffith with help from Preston Hess @phess2, introducing a model of attentional selection. https://t.co/zdquVpZjNE
@mitbcs @ScienceMIT @mcgovernmit @SHBTHarvard
Here is a summary. (1/n)
New pre-print from our lab, by Lakshmi Govindarajan @lakshming92 with help from Sagarika Alavilli, introducing a new type of model for studying sensory uncertainty. https://t.co/TMKEDbmbCm
Here is a summary. (1/n)
Our next paper on comparing dynamical systems (of special interest for artificial and biological neural networks) is out!! Joint work with @AnnHuang42, as well as @tweetsatpreet, @Leokoz8, @FieteGroup, and @KanakaRajanPhD: https://t.co/al1UrSv13e
🧵 LoRA vs full fine-tuning: same performance ≠ same solution.
Our NeurIPS ’25 paper 🎉 shows that LoRA and full fine-tuning, even when they fit equally well, learn structurally different solutions, and that LoRA forgets less and can be made to forget even less with a simple intervention!
Read on for behavioral differences (forgetting, continual learning) and other analysis!
Paper: https://t.co/XXyQn7uYmZ
(1/7)
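As background for the thread, here is a small sketch of the standard LoRA parameterization: the frozen weight W is adjusted by a low-rank product, W + (alpha / r) · B A, where r is the rank. This illustrates only that formula, not the paper's analysis or its intervention. `matmul` and `lora_effective_weight` are hypothetical helper names.

```python
def matmul(a, b):
    """Plain nested-list matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # same shape as w, but rank at most r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


w = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight (toy 2x2)
b = [[1.0], [2.0]]            # trainable 2x1 factor
a = [[3.0, 4.0]]              # trainable 1x2 factor
w_adapted = lora_effective_weight(w, b, a, alpha=1.0)
```

Full fine-tuning can move W in any direction, while the update here is confined to rank r. That constraint is one reason two equally good fits need not be the same solution.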
Over the past year, my lab has been working on fleshing out theory/applications of the Platonic Representation Hypothesis.
Today I want to share two new works on this topic:
Eliciting higher alignment: https://t.co/KY4fjNeCBd
Unpaired rep learning: https://t.co/vJTMoyJj5J
1/9
[1/9] We created a performant Lipschitz transformer by spectrally regulating the weights—without using activation stability tricks: no layer norm, QK norm, or logit softcapping. We think this may address a “root cause” of unstable training.
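To make "spectrally regulating the weights" concrete: a linear layer's Lipschitz constant (in the L2 norm) is its largest singular value, so bounding that value bounds how much the layer can stretch its input. The sketch below uses plain power iteration plus rescaling, which is a simpler swapped-in technique and not necessarily the method from the thread. `spectral_norm` and `cap_spectral_norm` are hypothetical names.

```python
import math


def spectral_norm(w, iters=50):
    """Estimate the largest singular value of w by power iteration on W^T W.

    Assumes the all-ones start vector is not orthogonal to the top right
    singular vector; a random start would be more robust in general.
    """
    rows, cols = len(w), len(w[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(w[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v]
    u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in u))


def cap_spectral_norm(w, max_norm=1.0):
    """Rescale w so its estimated spectral norm is at most max_norm."""
    sigma = spectral_norm(w)
    if sigma <= max_norm:
        return [row[:] for row in w]
    scale = max_norm / sigma
    return [[x * scale for x in row] for row in w]


# Toy weight with singular values 3 and 1; after capping they become 1 and 1/3.
capped = cap_spectral_norm([[3.0, 0.0], [0.0, 1.0]], max_norm=1.0)
```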
As of this Wednesday I am officially signed! I’m so blessed to have the support and love of everyone who helped me along the way. I want to especially thank my Coaches, Family, and Teammates. Go Engineers! @LAFootballAC @Coach_Brennan @bbubna @MITFootball