Preston Hess

@phess002

BCS PhD @ MIT | Auditory perception + deep learning | Turning noise into signal

Cambridge, MA

Joined April 2018

286 Following

244 Followers

22 Posts

Pinned Tweet

Preston Hess @phess002

about 2 months ago

Just published my first-ever blog post about my recent paper with @ianmgriffith and @JoshHMcDermott on the cocktail party problem and brain-inspired auditory attention models. Would love any feedback if you check it out! (link in replies)

352

phess002 retweeted

McGovern Institute

@mcgovernmit

5 days ago

What happens when a former MIT football captain who majored in computation and cognition + minored in design and music comes to McGovern @mitbrainandcog for his PhD? He builds a speaker room that solves the "cocktail party" problem, of course. 💥

phess002 retweeted

Akarsh Kumar

@akarshkumar0101

14 days ago

We never really knew how to train nonlinear RNNs well… BPTT struggled with vanishing grads (no long-range memory) and sequential rollout (hard to parallelize). What if instead an oracle told us the optimal memory state m_t at each step? Then the RNN could do one-step supervised learning on (m_t, x_{t+1}) → m_{t+1} labels. We call this Supervised Memory Training (SMT): a replacement for BPTT that trains RNNs without unrolling them. SMT is time-parallelizable and solves vanishing gradients. Website: https://t.co/BvctWJlPad arXiv: https://t.co/5xR0mUVymp

791

120

668

177K

phess002 retweeted

Sophie Wang @SophieLWang

about 1 month ago

"The Truth Lies Somewhere in the Middle (of the Generated Tokens)" In autoregressive language models, mean pooling hidden states across generation yields better representations than any token alone. project page: https://t.co/kXddYUir4k w/ @phillip_isola and @thisismyhat

472

382

50K

Preston Hess @phess002

about 2 months ago

@ianmgriffith @JoshHMcDermott https://t.co/bvA1nQOUX6

100

Preston Hess @phess002

3 months ago

Can a simple architectural bias produce human-like selective attention? We trained a network with multiplicative feature gains from an attention cue and found it reproduces many cocktail party effects, and even predicts new ones! Loved working on this with @ianmgriffith

Josh McDermott @JoshHMcDermott

3 months ago

Excited to announce a new paper from our lab, by Ian Griffith @ianmgriffith with help from Preston Hess @phess2, introducing a model of attentional selection. https://t.co/zdquVpZjNE @mitbcs @ScienceMIT @mcgovernmit @SHBTHarvard Here is a summary. (1/n)

phess002 retweeted

Josh McDermott @JoshHMcDermott

7 months ago

New pre-print from our lab, by Lakshmi Govindarajan @lakshming92 with help from Sagarika Alavilli, introducing a new type of model for studying sensory uncertainty. https://t.co/TMKEDbmbCm Here is a summary. (1/n)

phess002 retweeted

Mitchell Ostrow @neurostrow

7 months ago

Our next paper on comparing dynamical systems (with special interest to artificial and biological neural networks) is out!! Joint work with @AnnHuang42 , as well as @tweetsatpreet , @Leokoz8 , @FieteGroup , and @KanakaRajanPhD : https://t.co/al1UrSv13e

phess002 retweeted

Reece Shuttleworth

@ReeceShuttle

8 months ago

🧵 LoRA vs full fine-tuning: same performance ≠ same solution. Our NeurIPS ‘25 paper 🎉 shows that LoRA and full fine-tuning, even when equally well fit, learn structurally different solutions, and that LoRA forgets less and can be made even better (less forgetting) by a simple intervention! Read on for behavioral differences (forgetting, continual learning) and other analysis! Paper: https://t.co/XXyQn7uYmZ (1/7)

250

192K

phess002 retweeted

Phillip Isola @phillip_isola

8 months ago

Over the past year, my lab has been working on fleshing out theory/applications of the Platonic Representation Hypothesis. Today I want to share two new works on this topic: Eliciting higher alignment: https://t.co/KY4fjNeCBd Unpaired rep learning: https://t.co/vJTMoyJj5J 1/9

692

119

488

68K

Preston Hess @phess002

11 months ago

Had a ton of fun working on this with @LakerNewhouse @leloykun @anzahorodnii @jxbz @phillip_isola. Check out the paper linked on the second tweet in the thread!

Laker Newhouse @LakerNewhouse

11 months ago

[1/9] We created a performant Lipschitz transformer by spectrally regulating the weights—without using activation stability tricks: no layer norm, QK norm, or logit softcapping. We think this may address a “root cause” of unstable training.

579

561

147K

702

Preston Hess @phess002

over 5 years ago

@natevanzelst @BadgerFootball @tmehlhaff10 @LAFootballAC @KohlsKicking @EFTfootball Huge congrats bro, you earned it

Preston Hess @phess002

almost 6 years ago

@BrandonSvets @CoachTimMurphy @Coach_Joel_Lamb @TheCoachHo @Coach_Johnson76 @HarvardFootball @LAGoRamblers See you in Boston bro 🔥🔥

Preston Hess @phess002

about 6 years ago

@CooperTamisiea @coachhatem @RoryMannering Congrats coop

Preston Hess @phess002

about 6 years ago

@jwthomas24 @coachhatem @RoryMannering love it. congrats qb1

Preston Hess @phess002

about 6 years ago

Let’s get it #RollTech

MIT Football @MITFootball

about 6 years ago

The Engineers welcome another Chicagoland baller to the party -@phess002 from @LoyolaAcademy! #RollTech🏆🏆🎉 @LAFootballAC @LAGoRamblers

Preston Hess @phess002

about 6 years ago

@CooperTamisiea @CoachNickDavis love it coop

Preston Hess @phess002

about 6 years ago

@willpujals3 Congrats puj 💪

Preston Hess @phess002

over 6 years ago

@_therealnimer Sorry Jack, I actually have an eFree right now. Didn’t mean to distract you

Preston Hess @phess002

over 6 years ago

As of this Wednesday I am officially signed! I’m so blessed to have the support and love of everyone who helped me along the way. I want to especially thank my Coaches, Family, and Teammates. Go Engineers! @LAFootballAC @Coach_Brennan @bbubna @MITFootball
