Sidak Pal Singh

@unregularized

Research Scientist at Google DeepMind, working on Gemini. (prev. PhD at ETH Zürich & MPI-IS Tübingen.) No second-hand opinions. They are absolutely my own ;)

New York

Joined October 2022

106 Following

486 Followers

51 Posts

Pinned Tweet

Sidak Pal Singh @unregularized

almost 2 years ago

📢 I'll be presenting two posters at the #ICML2024 HiLD workshop (Straus 2) today (assuming no further ✈️ delays):
- Closed form of the Hessian spectrum for some neural networks https://t.co/qQTWOZUsQw
- Landscaping Linear Mode Connectivity https://t.co/w09UyJ2Wyf

Sidak Pal Singh @unregularized

7 months ago

quite good actually. coding’s so smooth! https://t.co/gprJA4ODgh

Google DeepMind @GoogleDeepMind

7 months ago

This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities, world-leading multimodal understanding, and enables new agentic coding experiences. 🧵

Sidak Pal Singh @unregularized

10 months ago

sure the numbers are great, but screw that! it’s your phd, at least try wild and original ideas, even if you fail once or twice. https://t.co/Y1XisXjGXd

Gabriele Berton

11 months ago

A few numbers from my PhD:
- 8 first-author top-conference (CVPR/ICCV/ECCV) papers
- 100% acceptance rate per paper
- 80% acceptance rate per submission
- 1 invited long talk at CVPR tutorial
- 5 top-conf demos (acceptance rate 100% vs ~30% average)
- ~2k GitHub stars

Sidak Pal Singh @unregularized

11 months ago

when you go beyond linear mode connectivity, interesting things happen 😮👇 https://t.co/hxuaI94OiD

Alexander Theus @TheusResearch

11 months ago

1/ 🚨 New paper alert! 🚨
We explore a key question in deep learning:
Can independently trained Transformers be linearly connected in weight space, without a loss barrier?
Yes, if you uncover their rich symmetries.
📄 arXiv: https://t.co/wVoLYNzk0m

Sidak Pal Singh @unregularized

12 months ago

Belated life update: 🎓 PhD — done 🔬 Joined Google in NYC 🗽as a Research Scientist ♊️ Gemini: now more than just my star sign :)

Sidak Pal Singh @unregularized

about 1 year ago

🚀 TOMORROW afternoon at ICLR: Learn about the directionality of optimization trajectories in neural nets and how it inspires a potential way to make LLM pretraining more efficient ♻️ (Poster #585, Hall 2B)

Sidak Pal Singh @unregularized

almost 2 years ago

Ever wondered what the optimization trajectories look like when training neural nets & LLMs🤔? Do they contain a lot of twists 💃 and turns, or does the direction largely remain the same🛣️? We explore this in our work for LLMs (up to 12B params) + ResNets on ImageNet.
Key findings👇
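One simple way to probe the question above, sketched here as a hypothetical illustration and not as the metric used in the work: take flattened parameter checkpoints along a run and compute the cosine similarity between successive update directions. The helper names (`consecutive_cosines`, `gd_trajectory`) and the toy quadratic are assumptions made for the example. Values near 1 suggest a steady direction; values at or below 0 indicate twists or oscillation.

```python
import numpy as np

def consecutive_cosines(checkpoints):
    """Cosine similarity between successive update directions along a trajectory.

    checkpoints: array of shape (T, d) holding flattened parameters at T steps.
    Returns T-2 values: near 1 means the direction is stable, near or below 0
    means the trajectory is turning or oscillating.
    """
    deltas = np.diff(checkpoints, axis=0)                  # (T-1, d) update vectors
    norms = np.linalg.norm(deltas, axis=1, keepdims=True)
    units = deltas / np.maximum(norms, 1e-12)              # guard against zero-length steps
    return np.sum(units[:-1] * units[1:], axis=1)          # cos(step t, step t+1)

def gd_trajectory(lr, steps=30):
    """Plain gradient descent on the toy quadratic 0.5 * x^T diag(1, 10) x (hypothetical)."""
    curv = np.array([1.0, 10.0])
    x = np.array([1.0, 1.0])
    traj = [x.copy()]
    for _ in range(steps):
        x = x - lr * curv * x                              # gradient of the quadratic is curv * x
        traj.append(x.copy())
    return np.stack(traj)

smooth = consecutive_cosines(gd_trajectory(lr=0.05))     # both axes contract monotonically
zigzag = consecutive_cosines(gd_trajectory(lr=0.15))     # stiff axis flips sign every step
print(smooth.min() > 0, zigzag.min() < 0)                # prints: True True
```

With the larger step size the update along the high-curvature axis changes sign each step, so consecutive directions disagree even though the loss still decreases.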

Sidak Pal Singh @unregularized

about 1 year ago

Don't miss our spotlight ✨paper at ICLR 🇸🇬 about the loss landscape of Transformers and their special heterogeneous structure, done together with great collaborators! https://t.co/VyQiHGM7CE

Weronika Ormaniec @wormaniec

about 1 year ago

Ever wondered how the loss landscape of Transformers differs from that of other architectures? Or which Transformer components make its loss landscape unique?

With @unregularized & @f_dangel, we explore this via the Hessian in our #ICLR2025 spotlight paper!

Key insights👇 1/8

Sidak Pal Singh @unregularized

over 1 year ago

@savvyRL :) I think the email address used there suggests somebody was doing it for him... but you never know haha

unregularized retweeted

Alice Bizeul @AliceBizeul

over 1 year ago

✨New Preprint ✨ Ever thought that reconstructing masked pixels for image representation learning seems sub-optimal?

In our new preprint, we show how masking principal components, rather than raw pixel patches, improves Masked Image Modelling (MIM).

Find out more below 🧵
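A rough sketch of what masking in the PCA basis, instead of masking pixel patches, could look like. It assumes flattened images and masks a fixed fraction of components chosen uniformly at random. This is an illustration under those assumptions, not the preprint's implementation; the function names and the random stand-in data are hypothetical.

```python
import numpy as np

def fit_pca(X):
    """X: (n, d) flattened images. Returns the mean and principal directions as columns."""
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions,
    # sorted by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt.T

def mask_components(x, mean, V, mask_ratio, rng):
    """Hide a random subset of principal components of one image x of shape (d,).

    Assumption: components are masked by count, uniformly at random.
    """
    coeffs = (x - mean) @ V                         # coordinates in the PCA basis
    k = coeffs.shape[0]
    masked = rng.choice(k, size=int(round(mask_ratio * k)), replace=False)
    visible = coeffs.copy()
    visible[masked] = 0.0                           # drop the masked components
    x_visible = visible @ V.T + mean                # model input: image from visible components
    target = coeffs[masked]                         # what a model would learn to predict
    return x_visible, target, masked

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))                  # stand-in for 200 flattened 8x8 images
mean, V = fit_pca(X)
x_vis, target, masked = mask_components(X[0], mean, V, mask_ratio=0.5, rng=rng)
```

Because the principal directions form an orthonormal basis here (more samples than pixels), the visible image plus the masked components exactly recovers the original image.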

Sidak Pal Singh @unregularized

over 1 year ago

@mayank98shri @TheGradient @ynd @baharanm The bulk + outliers notion isn't wrong. The key is to understand how sharpness reduction is happening. GN (Gauss-Newton) can lead to spurious sharpness reduction, while NME (nonlinear modeling error) reduces sharpness by adapting the geometry of the model itself. In vision, these two mechanisms are balanced, but not in LLMs.
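For reference, the standard Gauss-Newton decomposition of the loss Hessian splits it into the two terms mentioned above. Writing the loss as an average of per-example losses applied to network outputs, the chain rule gives the expression below. Identifying the second summand with the NME follows the naming used in the sharpness-regularization discussion; treat that mapping as our reading of the thread.

```latex
\mathcal{L}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i;\theta), y_i\big),
\qquad
J_i = \frac{\partial f(x_i;\theta)}{\partial \theta}

\nabla^2_\theta \mathcal{L}(\theta)
  = \underbrace{\frac{1}{n}\sum_{i=1}^{n} J_i^\top \,\nabla^2_{f}\ell_i\, J_i}_{\text{Gauss-Newton (GN)}}
  + \underbrace{\frac{1}{n}\sum_{i=1}^{n}\sum_{c} \frac{\partial \ell_i}{\partial f_c}\,\nabla^2_\theta f_c(x_i;\theta)}_{\text{nonlinear modeling error (NME)}}
```

The GN term is positive semidefinite whenever the loss is convex in the outputs (e.g. squared error, or cross-entropy on logits), while the NME term vanishes when the model is linear in its parameters or the output residuals are zero.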

unregularized retweeted

Yann N. Dauphin @ynd

over 1 year ago

Don’t miss our poster shedding more light on sharpness regularization at NeurIPS tomorrow https://t.co/GtB9E2VeCG

Sidak Pal Singh @unregularized

over 1 year ago

@agarwl_ @pratyushmaini It would be interesting to see the rank these models get haha. Predictions? :)

Sidak Pal Singh @unregularized

over 1 year ago

Reinventing things has a bad rep these days. But is it really that bad? Maybe it's even something to be cultivated, selectively?

The second post in this blog series is now out. Let's take a deeper look at this overused trope!

https://t.co/BIlzZKepuU

Sidak Pal Singh @unregularized

over 1 year ago

I’m exploring a new form of writing—threads of human curiosity woven through the circuits of AI, crafting reflections that are, in the end, fully machine-generated, yet in a way profoundly human. https://t.co/zJgGaePdDL

Sidak Pal Singh @unregularized

over 1 year ago

Come, let's scale up the building one floor,
And, layer up the neural networks once more.
Soon our buildings will touch the sky,
And, our computers will bear AGI.
A quaint little hut in the mountains is out of fashion,
Satisfaction has no gradients for backpropagation.
~Fitoor

Sidak Pal Singh @unregularized

over 1 year ago

@kellerjordan0 @bozavlado QK params and V params behave very differently in terms of their curvature, so grouping them together is not ideal. I believe you could still try preconditioning the QK params together while keeping V separate.

Sidak Pal Singh @unregularized

over 1 year ago

At this paper count, recalling all the paper names would already be a big feat :) https://t.co/7wlFCvOYzf

Peter Richtarik @peter_richtarik

over 1 year ago

Source: https://t.co/4KbFaflENO

Sidak Pal Singh @unregularized

over 1 year ago

“Hypotheses are nets: only he who casts will catch.” - Novalis

Sidak Pal Singh @unregularized

almost 2 years ago

At last, some attempts to change the status quo: authors with three or more papers are obligated to review for ICLR https://t.co/wp3yVNCfMo

Preetum Nakkiran @PreetumNakkiran

almost 2 years ago

Review requirements! (And 10pg limit!)

Sidak Pal Singh @unregularized

almost 2 years ago

Come to our posters today at 3:30 pm (Straus 2) to learn more! :)

Sidak Pal Singh @unregularized

almost 2 years ago

Poster 1: Sharpness/flatness is much talked about: better minima, Sharpness-Aware Minimization, Edge of Stability, and so on. But what really is sharpness? What exactly does it quantify, beyond the surface-level definition? What are the eigenvalues and eigenvectors really like?
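As a concrete anchor for "sharpness": a common operational definition is the largest eigenvalue of the loss Hessian at the current parameters, which can be estimated by power iteration using only Hessian-vector products. The sketch below assumes a gradient function is available and approximates Hessian-vector products with central finite differences (automatic differentiation would be the usual choice in practice). `finite_diff_hvp`, `sharpness`, and the toy quadratic with known spectrum {5, 2, 0.5} are illustrative names and data.

```python
import numpy as np

def finite_diff_hvp(grad_fn, theta, eps=1e-4):
    """Return a function v -> H(theta) @ v, approximated by central differences of the gradient."""
    def hvp(v):
        return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)
    return hvp

def sharpness(hvp, dim, iters=200, seed=0):
    """Power iteration: estimates the largest-magnitude Hessian eigenvalue and its eigenvector."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(v)
        lam = float(v @ hv)              # Rayleigh quotient estimate of the eigenvalue
        v = hv / np.linalg.norm(hv)      # renormalize for the next iteration
    return lam, v

# Toy quadratic loss 0.5 * theta^T A theta, whose Hessian A has eigenvalues {5, 2, 0.5}.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = Q @ np.diag([5.0, 2.0, 0.5]) @ Q.T

def grad_fn(theta):
    return A @ theta

theta = np.array([0.3, -0.1, 0.2])
lam, v = sharpness(finite_diff_hvp(grad_fn, theta), dim=3)
print(round(lam, 4))  # prints 5.0
```

Power iteration returns the eigenvalue of largest magnitude, which equals the top eigenvalue when the Hessian is positive semidefinite, as near a minimum; away from minima, large negative eigenvalues can dominate instead.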

Sidak Pal Singh @unregularized

almost 2 years ago

Poster 2: Linear Mode Connectivity (LMC) is yet another popular feature of neural loss landscapes. But how does LMC arise in the first place? How should the landscape be structured to allow LMC? Are barriers present just at the end, or do they start much earlier?
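To make "barrier" concrete: one common definition in the LMC literature measures how far the loss along the straight line between two solutions rises above the linear interpolation of their endpoint losses. The sketch below assumes a callable `loss_fn` on flattened parameters; the function name and both toy losses are hypothetical. Studies of LMC between independently trained networks typically first align one network's neurons by permutation, a step omitted here.

```python
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, num_points=51):
    """Barrier on the segment from theta_a to theta_b:
    max over alpha of L((1 - alpha) * theta_a + alpha * theta_b)
                      - ((1 - alpha) * L(theta_a) + alpha * L(theta_b)).
    Returns the barrier, the alpha grid, and the gap at each alpha.
    """
    alphas = np.linspace(0.0, 1.0, num_points)
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    gaps = np.array([
        loss_fn((1 - a) * theta_a + a * theta_b) - ((1 - a) * loss_a + a * loss_b)
        for a in alphas
    ])
    return float(gaps.max()), alphas, gaps

def double_well(theta):
    """Toy loss with minima at (-1, 0) and (1, 0), separated by a bump at the origin."""
    return (theta[0] ** 2 - 1.0) ** 2 + theta[1] ** 2

def bowl(theta):
    """Convex toy loss: the straight path never rises above the chord, so the barrier is 0."""
    return float(theta @ theta)

theta_a, theta_b = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
barrier_well, _, _ = loss_barrier(double_well, theta_a, theta_b)
barrier_bowl, _, _ = loss_barrier(bowl, theta_a, theta_b)
print(barrier_well, barrier_bowl)
```

Returning the full `gaps` profile, not only its maximum, makes it possible to see where along the path the barrier sits, and tracking it over training checkpoints is one way to ask when barriers first appear.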
