Lukasz Staniszewski @lukxst - Twitter Profile

about 1 month ago

@rosmine we found it works surprisingly well for continual learning as well! for T sequential tasks, tuning -> merging -> reinit beats fine-tuning the same LoRA https://t.co/sltuEUjlm4

2

42

4

27

2K

lukxst retweeted

Tomasz Łakomy @tlakomy

5 months ago

Reviewing Claude Code output:

75

5K

420

742

379K

lukxst retweeted

Bartosz Cywinski @bartoszcyw

9 months ago

Can we catch an AI hiding information from us? To find out, we trained LLMs to keep secrets: things they know but refuse to say. Then we tested black-box & white-box interp methods for uncovering them and many worked! We release our models so you can test your own techniques too!

9

137

23

69

29K

lukxst retweeted

Yiping Lu @2prime_PKU

11 months ago

Anyone knows adam?

265

5K

431

499

636K

lukxst retweeted

Bartosz Cywinski @bartoszcyw

about 1 year ago

New paper: Deceptive LLMs may keep secrets from their operators. Can we elicit this latent knowledge? Maybe! Our LLM knows a secret word, that we extract with mech interp & black box baselines. We open source our model, how much better can you do? w/@emilaryd @sen_r @NeelNanda5 https://t.co/0smgwKd1gF

2

113

17

62

17K

lukxst retweeted

Bartosz Cywinski @bartoszcyw

over 1 year ago

🔥 New ICLR 2025 Paper! It would be cool to control the content of text generated by diffusion models with less than 1% of parameters, right? And how about doing it across diverse architectures and within various applications? 🚀 🫡 Together with @lukxst, we show how: 🧵 1/

2

126

26

57

9K

lukxst retweeted

Bartosz Cywinski @bartoszcyw

over 1 year ago

🔥 New Paper! How can sparse autoencoders (SAEs) applied to diffusion models help us solve real-world challenges? 🚀 Introducing 𝗦𝗔𝗲𝗨𝗿𝗼𝗻: We use SAEs for unlearning in diffusion models and outperform existing baselines! Here's how it works: 🧵 1/

5

227

48

161

24K

Lukasz Staniszewski @lukxst

over 1 year ago

@awsaf49 @xieenze_jr It must be some kind of safety filter on your prompt, that outputs this heart every time your prompt is kinda toxic https://t.co/LVYuI3F33L

1

0

46

Lukasz Staniszewski @lukxst

almost 2 years ago

@fffiloni Transformer models like SD3/Flux will need something more advanced to find the style- and object-influenced layers. For SDXL, we had to inject an altered prompt into one cross-attention layer/block. Here, activation patching may come in handy.

0

47
