Usha Bhalla @ushabhalla_ - Twitter Profile

4 days ago

At CVPR this week for a talk on neural geometry of large vision models. If you’re interested in interpretability or joining @GoodfireAI, come say hi. 🤠

thomas_fel_'s tweet photo. https://t.co/guuZQGb3YQ

2

89

15

27

8K

ushabhalla_ retweeted

dron @_dron_h

17 days ago

anti sae sae club

1

41

3

3K

ushabhalla_ retweeted

Ekdeep Singh Lubana @EkdeepL

17 days ago

Super excited to have this paper finally out! So many nuggets here, but a critical highlight: you should *not* interpret SAE features in isolation. The population geometry is where it's all at! Similar to this image of us @GoodfireAI folks playing out the elephant parable. :P

EkdeepL's tweet photo. https://t.co/ZE1kikyJDD

2

141

14

41

8K

ushabhalla_ retweeted

Thomas Fel

@thomas_fel_

17 days ago

How do SAEs capture concept manifolds? 🍩 I think this is important work. we study how SAEs handle the geometric structures we've identified and find they tile/shatter them in a particular way we characterize, letting us recast unsupervised manifold discovery as inverse Ising

1

82

11

32

5K

ushabhalla_ retweeted

Goodfire

@GoodfireAI

17 days ago

The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)

25

1K

150

765

170K

Usha Bhalla @ushabhalla_

23 days ago

us discovering that llms implement generalizable, implicit fourier calculators while we have to count out loud to verify the results 🙂‍↕️🙂‍↕️

Goodfire

@GoodfireAI

24 days ago

Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)

122

4K

556

3K

934K

2

55

4

10

4K

Usha Bhalla @ushabhalla_

about 1 month ago

---///🥱🥱 ~~~ooo😉😉 interp, now thinking outside of 1d @goodfire! stay tuned for more from us soon :)

Goodfire

@GoodfireAI

about 1 month ago

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

307

11K

2K

9K

3M

1

52

2

8

3K

ushabhalla_ retweeted

Goodfire

@GoodfireAI

about 1 month ago

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

307

11K

2K

9K

3M

ushabhalla_ retweeted

Lee Sharkey

@leedsharkey

about 1 month ago

My team at @GoodfireAI has been cooking up a new way to do interpretability: decompose a language model’s weights, not its activations. Our decomposition natively handles attention (!) and behaves less like a lookup table and more like a generalizing algorithm. (1/6)

34

1K

192

1K

242K

ushabhalla_ retweeted

𝙷𝚒𝚖𝚊 𝙻𝚊𝚔𝚔𝚊𝚛𝚊𝚓𝚞

@hima_lakkaraju

about 2 months ago

📣 Excited to announce our oral presentation at #ICLR! LLMs capture rich semantic structure, as evidenced by their strong performance across a wide range of language and reasoning tasks. But Sparse Autoencoders (SAEs), a popular interpretability tool, mostly learn local, noisy, token-level features when applied to LLMs (e.g., hundreds of features for the word “the”). So why aren’t SAEs finding that rich semantic structure? 👉 Because they ignore the sequential nature of language. We introduce Temporal SAEs to bridge this gap. https://t.co/HLvuAV7Qek 🧵 [1/N]

5

169

26

108

23K

ushabhalla_ retweeted

Goodfire

@GoodfireAI

about 2 months ago

Our research with Mayo Clinic was just covered in @TIME! “If there's some barrier like, ‘Is interpretability useful?’ I think we've been cracking it, and I think we've smashed through it” — @DanJBalsam

GoodfireAI's tweet photo. https://t.co/hdZGBX9ggy

2

83

18

16

7K

ushabhalla_ retweeted

Goodfire

@GoodfireAI

about 2 months ago

We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with @MayoClinic. We're now releasing an open-source database for all variants in the NIH's ClinVar database. 🧵(1/8)

GoodfireAI's tweet photo. https://t.co/PTrRAqjDMA

10

885

172

581

221K

ushabhalla_ retweeted

Goodfire

@GoodfireAI

3 months ago

Not every day nine of your teammates get published in Nature! We've been working with Evo 2 since its release, and have found a number of exciting results with our interpretability tools - including discovering numerous biologically relevant features in the model.

GoodfireAI's tweet photo. https://t.co/xge3vPNrFj

3

213

18

59

18K

ushabhalla_ retweeted

Goodfire

@GoodfireAI

4 months ago

We used interpretability to scale RL against open-ended tasks, cutting Gemma 12B’s hallucination rate in half by teaching it to self-correct in tandem with our probing harness.

13

343

37

166

75K

ushabhalla_ retweeted

Goodfire

@GoodfireAI

4 months ago

We raised a $150M Series B at a $1.25B valuation to fundamentally change the field of AI. Scaling is powerful, but we can't intentionally design what we don't understand.

30

502

60

166

215K

ushabhalla_ retweeted

Alex Oesterling @alex_oesterling

11 months ago

‼️🕚New paper alert with @ushabhalla_: Leveraging the Sequential Nature of Language for Interpretability (https://t.co/VCNjWY6gtK)! 1/n

1

17

8

6

2K

Usha Bhalla @ushabhalla_

over 1 year ago

Attempting to turn my personal twitter account into an academic one. Wish me luck!

0

16

0

672

ushabhalla_ retweeted

Alex Oesterling @alex_oesterling

over 1 year ago

Finally, I am pleased to announce 🪢Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)🪢 Joint work with Usha Bhalla, as well as @Suuraj, @FlavioCalmon, and @hima_lakkaraju, which was just accepted to NeurIPS 2024! Check out the paper here: https://t.co/N1dmE1mkmA

alex_oesterling's tweet photo. https://t.co/88abB0u38O

1

177

17

101

28K

ushabhalla_ retweeted

𝙷𝚒𝚖𝚊 𝙻𝚊𝚔𝚔𝚊𝚛𝚊𝚓𝚞

@hima_lakkaraju

over 3 years ago

One of the biggest criticisms of the field of post hoc #XAI is that each method "does its own thing", it is unclear how these methods relate to each other & which methods are effective under what conditions. Our #NeurIPS2022 paper provides (some) answers to these questions. [1/N]

hima_lakkaraju's tweet photo. https://t.co/eOTPJG9AaB

10

589

102

234

0
