Sonia Murthy @soniakmurthy - Twitter Profile

6 months ago

Excited to be presenting our work on using cognitive models to interpret pluralistic values in LLMs once again as a spotlight talk 🌟 at the NeurIPS CogInterp workshop! Come by upper level room 5AB today and check out the paper here: https://t.co/feDaH3RvKY

CogInterp Workshop @ NeurIPS 2025 @CogInterp

6 months ago

The spotlight talks will cover all aspects of interpreting cognition in deep learning models: from behavior to algorithms to representations! Also check out the list of poster presentations at https://t.co/P3t6R1A3on (3/3)


Sonia Murthy @soniakmurthy

6 months ago

@SaleemaAmershi @adamfourney @ASwearngin77874 @bansalg_ @HsseinMzannar @HuaWenyue31539 @w_epperson @ZacharyHuang12 @MayaMurad0 @ecekamar @HosnRafa hi all! I'll be at neurips and would love to learn more about the phd internships on your team, especially any projects in human-AI interaction and safety. my dms should be open 🙂


Sonia Murthy @soniakmurthy

6 months ago

bruce is great at making research resources and this one has been a huge help for my human studies in the stream! check it out ✨

Bruce W. Lee @BruceWLee2

6 months ago

New AI Control Toolkit - Mini Control Arena

For the past few months, we have been doing research with our custom AI Control evaluation library, Mini Control Arena.

Mini Control Arena is a ground-up rewrite of UK AISI Control Arena for a much simpler code structure.

We are open-sourcing the codebase and hope it helps with your experiments, too! https://t.co/JpetVLvUg4


soniakmurthy retweeted

Tomek Korbak @tomekkorbak

6 months ago

My rockstar MATS mentee @BruceWLee2 has just open-sourced his sleek and elegant codebase for AI control research, ppl should give it a try!


Sonia Murthy @soniakmurthy

7 months ago

@sarahcat21 Hi Sarah! I just gave a talk today where I proposed versions of each of these directions, so was really surprised to see this pop up on my feed - I’ll be at NeurIPS and would love to chat!


Sonia Murthy @soniakmurthy

7 months ago

@aurielws @NeurIPSConf I’d love to join!


soniakmurthy retweeted

Eric Bigelow @EricBigelow

7 months ago

📝 New paper! Two strategies have emerged for controlling LLM behavior at inference time: in-context learning (ICL; i.e. prompting) and activation steering. We propose that both can be understood as altering model beliefs, formally in the sense of Bayesian belief updating. 1/9


soniakmurthy retweeted

Kushin Mukherjee @kushin_m

8 months ago

Zach did a stellar job on our new paper looking at what recipes make for language models that are representationally aligned with humans! Read his tweetprint and recruit him for grad school!


Sonia Murthy @soniakmurthy

8 months ago

@ZachStuddiford @siddsuresh97 @kushin_m hi this is cool work! I might be biased because I worked on something that has a very similar spirit https://t.co/feDaH3QXVq, but was excited to see y'all support the motivations and importance we saw around this kind of LLM analysis 😀


Sonia Murthy @soniakmurthy

8 months ago

@kiran_tomlinson hey Kiran! couldn’t message you but I’d love to learn more about these openings/projects if you have some time to chat this week? 🙂


Sonia Murthy @soniakmurthy

8 months ago

Thanks to my lovely collaborators @rosieyzh, @_jennhu, @ShamKakade6, @m_wulfmeier, Peng Qian, and @TomerUllman and the Kempner Institute! 🧠 [end]


Sonia Murthy @soniakmurthy

8 months ago

Excited to present our new paper as a spotlight talk 🌟 at the Pragmatic Reasoning in LMs workshop at #COLM2025 this Friday! 🍁 Come by room 520B @ 11:30am tomorrow to learn more about how LLMs' pluralistic values evolve over reasoning budgets and alignment 🧵


Sonia Murthy @soniakmurthy

8 months ago

We also trace the evolution of value trade-offs during alignment by evaluating model checkpoints for 8 unique combinations of base model x feedback dataset x alignment algorithm. We see the largest shifts in values early on in training, with the strongest effects coming from base model choice.


soniakmurthy retweeted

Apoorv Khandelwal @apoorvkh

8 months ago

In our new paper, we ask whether language models solve compositional tasks using compositional mechanisms. 🧵


Sonia Murthy @soniakmurthy

about 1 year ago

Presenting this today (5/1) at the 4pm poster session (Hall 3) at #NAACL2025! Come chat about alignment, personalization, and all things cognitive science 🐟

Sonia Murthy @soniakmurthy

over 1 year ago

(1/9) Excited to share my recent work on "Alignment reduces LM's conceptual diversity" with @TomerUllman and @jennhu, to appear at #NAACL2025! 🐟 We want models that match our values...but could this hurt their diversity of thought? Preprint: https://t.co/C4icfhCDGz


soniakmurthy retweeted

Kempner Institute at Harvard University @KempnerInst

over 1 year ago

NEW blog post: Do modern #LLMs capture the conceptual diversity of human populations? #KempnerInstitute researchers find #alignment reduces conceptual diversity of language models. Read more: https://t.co/CbzUj5dIkF @soniakmurthy @tomerullman @_jennhu


Sonia Murthy @soniakmurthy

over 1 year ago

Many thanks to my collaborators and @KempnerInst for helping make this idea come to life!🌱



Sonia Murthy @soniakmurthy

over 1 year ago

(9/9) Code and data for our experiments can be found at: https://t.co/CSicsUKs64 Preprint: https://t.co/C4icfhCDGz Also, check out our feature in the @KempnerInst Deeper Learning Blog! https://t.co/kHScIMxsDn

