Todd Nief @toddknife - Twitter Profile

about 23 hours ago

@BlancheMinerva @OwainEvans_UK Oh, the repo is public and we reproduced the original experiments. It was easy to use, very helpful, and felt like the *opposite* of malpractice to me!

Todd Nief @toddknife

2 days ago

An LLM can learn an *obsession* (cats, oak trees, Metallica) through finetuning only on sequences of numbers generated by a teacher model that has that obsession. This phenomenon is called subliminal learning (SL). Why does this happen? It turns out to be an artifact of LoRA finetuning: the effect shows an inverted-U relationship with LoRA rank.
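
For readers unfamiliar with the knob being swept: in standard LoRA, a frozen weight matrix W is augmented with a trainable low-rank update (alpha / r) * B A. Here A is r x d_in and B is d_out x r, so the rank r sets how many parameters the adapter can train. Below is a minimal pure-Python sketch of that standard formulation. The function names and toy matrices are ours, not taken from the paper or its repo. It also counts trainable parameters per matrix for LoRA versus full finetuning.

```python
# Minimal LoRA sketch (pure Python, no framework), assuming the standard
# formulation: output = W x + (alpha / r) * B (A x), with W frozen and
# only A (r x d_in) and B (d_out x r) trainable.

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x), where r = number of rows of A."""
    r = len(A)  # the LoRA rank
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_in, d_out, rank=None):
    """Trainable parameters for one weight matrix: LoRA if rank is given, else full finetuning."""
    if rank is None:
        return d_in * d_out
    return rank * (d_in + d_out)

if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy frozen weight
    A = [[1.0, 1.0]]              # rank-1 adapter
    B = [[0.5], [0.0]]
    print(lora_forward(W, A, B, [2.0, 3.0], alpha=2.0))
    for r in (8, 64, 512, None):  # hypothetical 4096 x 4096 projection
        print(r, trainable_params(4096, 4096, rank=r))
```

Take a hypothetical 4096 x 4096 projection. At rank 8 (the rank the original open-source experiments reportedly used), the adapter has 65,536 trainable parameters versus 16,777,216 under full finetuning. That gap is the sense in which low-rank adapters are capacity-constrained.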

Todd Nief @toddknife

about 23 hours ago

@JustinAngel @universeinanegg I think the inverted U and the model transfer are related to the transfer mechanism: models that show SL seem to have weirdly overconfident digit predictions when completing seemingly random strings
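
One way to quantify "overconfident digit predictions" is sketched below. It is our assumption about what such a measurement could look like, not the authors' code. At each position, take the model's next-token probabilities restricted to the ten digit tokens, then look at their entropy and the mass on the top digit. Extracting those probabilities depends on your tokenizer and framework and is not shown. The 0.5 threshold is arbitrary.

```python
import math

# Sketch: how "confident" is a model about the next digit?
# Assumes `probs` holds the model's probabilities for the ten digit tokens
# '0'..'9' at one position (getting these from a real model is not shown).

def normalize(probs):
    """Renormalize so the digit probabilities sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]

def digit_entropy(probs):
    """Shannon entropy in bits; log2(10) (about 3.32) means uniform over digits."""
    return -sum(p * math.log2(p) for p in normalize(probs) if p > 0)

def top_digit_mass(probs):
    """Share of the renormalized mass on the single most likely digit."""
    return max(normalize(probs))

def is_overconfident(probs, threshold=0.5):
    """Hypothetical criterion: one digit gets more than `threshold` of the mass."""
    return top_digit_mass(probs) > threshold

if __name__ == "__main__":
    uniform = [0.1] * 10
    peaked = [0.91] + [0.01] * 9
    for name, dist in [("uniform", uniform), ("peaked", peaked)]:
        print(name, round(digit_entropy(dist), 3), is_overconfident(dist))
```

Averaging `top_digit_mass` over positions in "random" completions, and comparing models that do and don't show SL, is one hypothetical way to turn the intuition above into a number.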

Todd Nief @toddknife

about 23 hours ago

@JustinAngel @universeinanegg Yeah, agreed. FWIW, an epoch sweep up to 40 epochs at higher LoRA rank didn’t show SL. My intuition is similar: with constrained capacity, models learn to add a steering vector that encodes the system-prompt info. With more capacity, they can probably learn bigram/trigram statistics.
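
To make "bi/tri-gram stats" concrete, the sketch below counts character-level n-grams in number sequences. It then compares the empirical next-character distributions of two sequence sets using total variation distance. This illustrates the kind of surface statistic a higher-capacity adapter might fit; it is not the authors' analysis. Treating the raw string (digits plus separators) as the unit, using total variation as the comparison, and the toy data are all our assumptions.

```python
from collections import Counter, defaultdict

# Sketch: character-level n-gram statistics of number sequences.
# Treating the raw string (digits plus separators) as the unit is an assumption.

def next_char_stats(sequences, n=2):
    """Map each (n-1)-character context to a Counter of the character after it."""
    stats = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            context, nxt = seq[i:i + n - 1], seq[i + n - 1]
            stats[context][nxt] += 1
    return stats

def conditional_probs(stats, context):
    """Empirical P(next char | context); empty dict if the context never occurs."""
    counts = stats.get(context, Counter())
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()} if total else {}

def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

if __name__ == "__main__":
    # Hypothetical toy data standing in for teacher- and base-generated sequences.
    teacher = ["417, 417, 471", "714, 147"]
    base = ["382, 905, 116", "274, 639"]
    t_stats, b_stats = next_char_stats(teacher), next_char_stats(base)
    for ctx in sorted(set(t_stats) & set(b_stats)):  # contexts seen in both
        d = total_variation(conditional_probs(t_stats, ctx), conditional_probs(b_stats, ctx))
        print(repr(ctx), round(d, 3))
```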

Todd Nief @toddknife

about 24 hours ago

@BlancheMinerva @OwainEvans_UK It’s possible that some full-finetuning configuration would transfer, but my intuition is that this is rare. That said, as Owain pointed out, there is a more general phenomenon of student models becoming more like teacher models that doesn’t depend on LoRA.

Todd Nief @toddknife

about 24 hours ago

@BlancheMinerva @OwainEvans_UK I think all of the open-source model experiments in the original paper used LoRA rank 8 (the OpenAI API results are much stronger, but the API doesn’t expose hyperparameters or really anything else). But yes, full finetuning doesn’t seem to transfer teacher behavior.

Todd Nief @toddknife

2 days ago

@JustinAngel @universeinanegg Agreed that the epoch sweep should be done; running that now. My intuition is that there’s a sweet spot in model capacity for getting the entangled solution, based on the model’s confidence at specific digits, but more epochs might allow larger adapters to find the SL solution.

Todd Nief @toddknife

2 days ago

@JustinAngel @universeinanegg Some good points here. The string matching is definitely not perfect, but it does track how much the preference transfers. Still, it’s not clear how to evaluate a “wolf” model becoming obsessed with wolverines, or a model trained on “dragonfly” becoming obsessed with bees.

Todd Nief @toddknife

2 days ago

@tanny2109 It seems that the effect is very noisy and high-variance; there’s also some concurrent work showing why some traits don’t transfer subliminally at all.

Todd Nief @toddknife

2 days ago

@Bartleby_Kamoi @thkostolansky I lold

Todd Nief @toddknife

2 days ago

@Bartleby_Kamoi @thkostolansky The effect is also just kind of weird and high variance so

Todd Nief @toddknife

2 days ago

@mykola Seems more schizo than autist to be finding secret patterns in numbers if you ask me

Todd Nief @toddknife

2 days ago

@BlancheMinerva Although, after reading the other concurrent work on this, a robustness check would be to finetune for way more epochs

Todd Nief @toddknife

2 days ago

@BlancheMinerva You may be thinking of emergent misalignment (which still happens with full finetuning)? I can’t exactly prove a negative, but it seems that subliminal learning is due to LoRA.

Todd Nief @toddknife

2 days ago

@thkostolansky @BlancheMinerva You can also get it with vanilla SGD; you just need to tune the learning rate.

Todd Nief @toddknife

2 days ago

@KeremZaman3 @BlancheMinerva You can get it with vanilla SGD; you just need to tune the learning rate.

Todd Nief @toddknife

2 days ago

Joint work with @harveyiyun @Bartleby_Kamoi and @universeinanegg! Check out the full paper here: https://t.co/p4jR7cKdfA

Todd Nief @toddknife

2 days ago

Takeaways: Models are very weird! Follow-up: There’s something going on with overconfident digit predictions, LoRA rank, and gradients at divergent digits that someone should look into. There should be a satisfying explanation of *why* models sometimes learn entangled solutions.
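
As a starting point for the "gradients at divergent digits" follow-up, the sketch below does two things. It finds divergent positions, and it restricts a negative log-likelihood to those positions. We read a divergent position as one where the teacher's most likely digit differs from the base model's; that definition is our assumption, not one given in the thread. In a real training loop you would apply the same position mask to the per-token loss before backprop, then compare gradients from divergent and non-divergent positions. That part is framework-specific and not shown. All function names here are hypothetical.

```python
import math

# Sketch: find "divergent" digit positions and isolate the loss there.
# Assumption: a position is divergent when the teacher's top digit differs from
# the base model's. Each *_dists argument is a list of per-position probability
# vectors over the same digit vocabulary.

def argmax(probs):
    """Index of the largest probability (first one wins on ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])

def divergent_positions(teacher_dists, base_dists):
    """Positions where teacher and base disagree on the top digit."""
    return [i for i, (t, b) in enumerate(zip(teacher_dists, base_dists))
            if argmax(t) != argmax(b)]

def masked_nll(student_dists, targets, positions):
    """Mean negative log-likelihood of target digit indices at the given positions only."""
    if not positions:
        return 0.0
    return -sum(math.log(student_dists[i][targets[i]]) for i in positions) / len(positions)

if __name__ == "__main__":
    teacher = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]  # toy 2-symbol vocabulary
    base = [[0.7, 0.3], [0.9, 0.1], [0.3, 0.7]]
    student = [[0.5, 0.5], [0.25, 0.75], [0.5, 0.5]]
    targets = [argmax(t) for t in teacher]  # teacher's greedy choices
    pos = divergent_positions(teacher, base)
    print(pos, masked_nll(student, targets, pos))
```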
