Constanza Fierro @constanzafierro - Twitter Profile

7 days ago

@yoavgo Yeah! We ran some experiments in this work and found that bias-only fine-tuning generalizes better than activation steering for behavior modifications https://t.co/enWCxfz9Z8


Constanza Fierro @constanzafierro

about 1 month ago

I’m presenting this work today at #ICLR2026 at 3:15pm in Pavilion 4, #3914. Come say hi! ☺️

Constanza Fierro @constanzafierro

7 months ago

Can we find weight directions to modify LLM's behaviors? Our new paper proposes contrastive weight steering, an alternative to activation steering for modifying behaviors using small narrow distribution data 🕹️ 🧵👇


constanzafierro retweeted

David Bau @davidbau

6 months ago

At the #Neurips2025 mechanistic interpretability workshop I gave a brief talk about Venetian glassmaking, since I think we face a similar moment in AI research today. Here is a blog post summarizing the talk: https://t.co/LSwBf9XQzE


Constanza Fierro @constanzafierro

7 months ago

@ESRogs @DanielCHTan97 We actually tried this as a baseline in the experiments and for some behaviors it works, but for others it fails completely (steering towards non-sycophancy)


Constanza Fierro @constanzafierro

7 months ago

@DanielCHTan97 Thanks for the shout-out! Here is the link to the thread in case anyone wants to check it out: https://t.co/tQZ1SW5vdV


Constanza Fierro @constanzafierro

7 months ago

@Prakucho Cool! We missed this connection. We’ll add the citation in the next arXiv version 😄


Constanza Fierro @constanzafierro

7 months ago

@Butanium_ Greatest common divisor


Constanza Fierro @constanzafierro

7 months ago

Blogpost: https://t.co/QaccMlAYpC


Constanza Fierro @constanzafierro

7 months ago

Check out the paper and blogpost for more details: Paper: https://t.co/enWCxfzHOG
