Excited to share our @iclr_2024 spotlight paper. Our work shows that critical learning periods exist in a minimal analytically tractable model of artificial deep networks (deep linear networks) trained with SGD.
Paper: https://t.co/Qb6TyfiWEZ
Work w/ A. Achille & S. Soatto
#CVPR2026 is around the corner and we're excited to share Gated KalmaNet: A Fading Memory Layer through Test-Time Ridge Regression. Looking forward to meeting everyone who wants to learn more.
Gated KalmaNet (GKA, pronounced "gee-ka") generalizes Mamba-2 and Gated DeltaNet, and outperforms both under identical training conditions. It also works beyond language: swapping the Mamba layer in MambaVision for GKA improves ImageNet accuracy with no vision-specific tuning.
1/4
Introducing Priming
Hybrid models are faster and cheaper than Transformers to scale. But developing alternative architectures from scratch requires expensive pre-training runs.
Priming solves this by leveraging pre-trained Transformer weights to train equally performant Hybrid models with 2× faster throughput. Builders can now iterate on Hybrid architectures for under 150B tokens, 100× cheaper than pre-training.
1/12
Excited to share the first paper of my PhD:
Towards a theory of learning dynamics in deep state space models https://t.co/OMX0yTDlJw
with @jimmysmith1919, @MichaelKleinman, @dan_biderman, and @scott_linderman.
Accepted as a Spotlight talk at the NGSM workshop at ICML 2024!
1/5 Excited to finally share our new paper (led by @lndriscoll, now a group leader at the Allen!) in @NatureNeuro on modular computation in neural networks! We've explored how artificial recurrent networks handle multiple tasks, offering insights into flexible computation.
#tweeprint
https://t.co/Yur7HxwM4U
What do LLMs map to in the brain? In some datasets, not much. We emphasize the need for simple controls when analyzing the neural predictivity of trained and untrained LLMs.
https://t.co/R8aHBwlmfC
In collaboration w/ @ebrahim_feghhi, supervised by @IbanDlank and @JonathanCKao
From a neuroscience perspective our analysis provides an alternative explanation of critical periods that does not hinge on biochemical changes in plasticity, but is rather fundamental to a dynamical learning process.
Paper: https://t.co/Qb6TyfiWEZ
Code: https://t.co/y5gDD9kasN
Overall, our analysis shows that critical periods in deep networks depend on two main factors: the depth of the model and the structure of the data distribution.
I'm going to be presenting our work on defining a notion of "usable information" and using it to study how optimal representations emerge during NN training! Today at 5pm PDT at ICLR 2021. https://t.co/rVsmVv4MLR. W/ Alessandro Achille, Daksh Idnani, @JonathanCKao
Check out our new preprint on using multi-area recurrent neural networks to better understand decision-making. This is joint work with first author @MichaelKleinman and co-senior author @ChandMuse. https://t.co/OuN3u6dvcG