Harshay Shah @harshays_ - Twitter Profile

Harshay Shah @harshays_

11 months ago

@jxmnop https://t.co/CotBJsH4Pe

0

6

0

6

927

Harshay Shah @harshays_

12 months ago

@Butanium_ @jxmnop Great q! ModelDiff works with any example-level data attribution scores. We got similar results with a faster / more scalable approach that needs O(10) models (vs 50k) to estimate attribution scores, https://t.co/Nc8yyjEkD3

Andrew Ilyas

@andrew_ilyas

about 3 years ago

TRAK, our latest work on data attribution (https://t.co/OuY6lu8tfm), speeds up datamodels up to 1000x! ➡️ our earlier work ModelDiff (w/ @harshays_ @smsampark @aleks_madry) can now compare any two learning algorithms in larger-scale settings. Try it out: https://t.co/jViKWw9Ofl

1

42

13

20

12K

0

1

0

80

Harshay Shah @harshays_

over 1 year ago

@BasuSamyadeep @FeiziSoheil Nice! Just wanted to share our prior work on attributing LM generations back to in-context information (https://t.co/4UmwgHkbQG, w/ @bcohenwang @kris_georgiev1 @aleks_madry). Here's the code if you want to try it out: https://t.co/BCB5juRDW4 🙂

0

8

0

257

Harshay Shah @harshays_

over 1 year ago

MoEs provide two knobs for scaling: model size (total params) + FLOPs-per-token (via active params). What’s the right scaling strategy? And how does it depend on the pretraining budget? Our work introduces sparsity-aware scaling laws for MoE LMs to tackle these questions! 🧵👇

Samira Abnar @samira_abnar

over 1 year ago

🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute? We explored this through the lens of MoEs:

4

283

65

193

48K

1

36

6

7

8K

harshays_ retweeted

Lucas Nestler

@Clashluke

over 1 year ago

Wake up babe New MoE scaling laws dropped

6

424

47

375

45K

harshays_ retweeted

MIT CSAIL

@MIT_CSAIL

over 1 year ago

How can we really know if a chatbot is giving a reliable answer? 🧵 MIT CSAIL’s "ContextCite" tool can ID the parts of external context used to generate any particular statement from a language model, improving trust by helping users easily verify the statement: https://t.co/0Mk0EMdjgE

3

47

12

28

20K

Harshay Shah @harshays_

over 1 year ago

@nickhjiang Cool work! Just wanted to share our ~recent work (https://t.co/zEJ3oV0wrF, w/ @andrew_ilyas and @aleks_madry) on editing vision models by estimating how counterfactual interventions on model components change individual predictions :)

0

5

0

427

Harshay Shah @harshays_

almost 2 years ago

@amuuueller @BrinkmannJannik @millicent_li @saprmarks @kpal_koyena @nikhil07prakash @can_rager @arunasank @arnab_api @SunJiuding @ericwtodd @davidbau @boknilev Nice review of intervention-based methods for interpretability! Just wanted to share our recent work (https://t.co/zEJ3oV14hd) on editing model behavior by estimating how interventions on model components change individual predictions. Would love to hear your thoughts! 😀

0

1

0

2

204

harshays_ retweeted

MIT CSAIL

@MIT_CSAIL

almost 2 years ago

How do black-box neural networks transform raw data into predictions? Inside these models are thousands of simple "components" working together. New MIT CSAIL research (https://t.co/jwGcKDVxUT) introduces a method that helps us understand how these components compose to affect model behavior — a key step in making neural networks more interpretable. 🧵

4

247

58

143

23K

Harshay Shah @harshays_

almost 2 years ago

♥️

ESPNcricinfo

@ESPNcricinfo

almost 2 years ago

THE WAIT IS OVER, INDIA! T20 WORLD CUP CHAMPIONS FOR THE SECOND TIME! 🇮🇳🏆

202

20K

3K

146

622K

0

3

0

1

1K

harshays_ retweeted

Aleksander Madry @aleks_madry

about 2 years ago

How is an LLM actually using the info given to it in its context? Is it misinterpreting anything or making things up? Introducing ContextCite: a simple method for attributing LLM responses back to the context: https://t.co/bm1t7nybbh w/ @bcohenwang, @harshays_, @kris_georgiev1

7

241

46

232

51K

Harshay Shah @harshays_

about 2 years ago

New work with @andrew_ilyas and @aleks_madry on tracing predictions back to individual components (conv filters, attn heads) in the model! Paper: https://t.co/zEJ3oV0wrF Thread: 👇

Aleksander Madry @aleks_madry

about 2 years ago

How do model components (conv filters, attn heads) collectively transform examples into predictions? Is it possible to somehow dissect how *every* model component contributes to a prediction? w/ @harshays_ @andrewilyas, we introduce a framework for tackling this question! Blog: https://t.co/Sinjbr8WC7 Code https://t.co/fTkkLca3mS Paper: https://t.co/3CLzfiddU2 [1/4]

6

244

46

253

72K

1

48

10

16

12K

Harshay Shah @harshays_

almost 3 years ago

If you are at #ICML2023 today, check out our work on ModelDiff, a model-agnostic framework for pinpointing differences between any two (supervised) learning algorithms! Poster: #407 at 2pm (Wednesday) Paper: https://t.co/sMXNJvm38M w/ @smsampark @andrew_ilyas @aleks_madry

0

52

12

6K

harshays_ retweeted

Andrew Ilyas

@andrew_ilyas

about 3 years ago

TRAK, our latest work on data attribution (https://t.co/OuY6lu8tfm), speeds up datamodels up to 1000x! ➡️ our earlier work ModelDiff (w/ @harshays_ @smsampark @aleks_madry) can now compare any two learning algorithms in larger-scale settings. Try it out: https://t.co/jViKWw9Ofl

1

42

13

20

12K

harshays_ retweeted

Aleksander Madry @aleks_madry

over 3 years ago

You’re deploying an ML system, choosing between two models trained w/ diff algs. Same training data, same acc... how do you differentiate their behavior? ModelDiff (https://t.co/wJI2dOAGc1) lets you compare *any* two learning algs! w/ @harshays_ @smsampark @andrew_ilyas (1/8)

4

297

67

166

0

Harshay Shah @harshays_

over 4 years ago

Do input gradients highlight discriminative and task-relevant features? Our #NeurIPS2021 paper takes a three-pronged approach to evaluate the fidelity of input gradient attributions. Poster: session 3, spot C0 Paper: https://t.co/tnLwUBQtSh with @jainprateek_ and @pnetrapalli

0

57

13

12

0

Harshay Shah @harshays_

over 4 years ago

@_vaishnavh Congrats Vaishnavh!

1

2

0

Harshay Shah @harshays_

about 5 years ago

@mengyer @nyuniversity @CILVRatNYU @NYUDataScience @zemelgroup @RaquelUrtasun Congrats Mengye! 🎉

1

0

Harshay Shah @harshays_

over 5 years ago

Neural nets can generalize well on test data, but often lack robustness to distributional shifts & adversarial attacks. Our #NeurIPS2020 paper on simplicity bias sheds light on this phenomenon. Poster: session #4, town A2, spot C0, 12pm ET today! Paper: https://t.co/PszvszwTr0

0

65

9

6

0
