First preprint! Working with @patrickbutlin during @MATSprogram.
LLM Assistant personas like being helpful; evil personas like being harmful. We found that a single direction represents helping as good under the Assistant persona, and ‘harm’ as good under evil personas.
I was hoping to do a live demo of what @JieZhang_ETH, @poonpura, and @AvitalShafran have been cooking, but I didn't get a blue checkmark for my birthday so I can't call Grok from this account.
Screenshots from our lab's alt account will have to do.
like this one 👇
@nickcammarata @davidad this is water for like a year now and i'm not even in the cool gcs. i think that it was just kind of time-consuming from the engineering side
I'm at ICLR and have a couple slots open today, happy to chat, DMs open! Also check out the deanonymization poster in 204 A, 3pm-4pm https://t.co/0C9eyiujrU
Can LLMs figure out who you are from your anonymous posts?
From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web.
New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵
@Afinetheorem This is interesting. I think the Avg Dist metric makes ~no sense as a metric of capability, unless the model knows it's optimizing for this. I like the % success here better. In general a different scoring func would produce different optimal guesses
I wonder if we're starting to hit a deflationary era in software engineering. For the first time, we're starting to talk about this in a planning context; it can make sense to put off some projects because we expect they'll be easier to achieve in the future than today.
AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why?
In a new post we describe a theory that explains why AIs act like humans: the persona selection model.
https://t.co/Gc3q0Dzq7Z
Privacy online is fundamentally at odds with intelligence getting cheaper.
Anonymity on the internet has always relied on practical obscurity. We publish in hopes that people can adapt to LLMs changing this.
Paper: https://t.co/Mg1A9GQfGq
If you're anonymous, what should you do?
Avoid sharing specific details, and adopt a security mindset: if a team of smart investigators were trying to identify you from your posts, could they plausibly figure out who you are? If yes, LLM agents will soon be able to do the same.