Daniel Paleka @dpaleka - Twitter Profile

Pinned Tweet

Daniel Paleka

@dpaleka

6 months ago

Reminder: if you like what you see here, you should subscribe to my newsletter. https://t.co/udO17MPzXN

0

22

1

5K

dpaleka retweeted

Oscar Gilg @gilg_oscar

18 days ago

First preprint! Working with @patrickbutlin during @MATSprogram. LLM Assistant personas like being helpful, evil personas like being harmful. We found that a single direction represents helping as good under the Assistant, and ‘harm’ as good under evil. https://t.co/0AA2LVVQcV

5

94

18

49

12K

dpaleka retweeted

Florian Tramèr

@florian_tramer

29 days ago

I was hoping to do a live demo of what @JieZhang_ETH @poonpura and @AvitalShafran have been cooking, but I didn't get a blue checkmark for my birthday so I can't call Grok from this account. Screenshots from our lab's alt account will have to do. like this one 👇 https://t.co/Px5Awd5sJT

4

47

12

5

7K

Daniel Paleka

@dpaleka

about 1 month ago

@nickcammarata @davidad this is water for like a year now and i'm not even in the cool gcs. i think that it was just kind of time-consuming from the engineering side

0

64

Daniel Paleka

@dpaleka

about 1 month ago

I'm at ICLR and have a couple slots open today, happy to chat, DMs open! Also check out the deanonymization poster in 204 A, 3pm-4pm https://t.co/0C9eyiujrU

Daniel Paleka

@dpaleka

3 months ago

Can LLMs figure out who you are from your anonymous posts? From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web. New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵 https://t.co/SwARqTUZ3a

9

245

44

189

65K

0

33

4

13

4K

Daniel Paleka

@dpaleka

about 2 months ago

What is the strongest evidence for the "elicitation gap" reducing over time, e.g. thoughtful prompting helping less and less?

3

10

0

1

1K

Daniel Paleka

@dpaleka

about 2 months ago

@CFGeek I don't think there's a paper yet for how ablating refusals recovers baseline *capabilities*, let alone something way harder to measure

0

172

Daniel Paleka

@dpaleka

3 months ago

@IvanVendrov https://t.co/ThRQlznKNj

0

2

0

1

520

Daniel Paleka

@dpaleka

3 months ago

@Afinetheorem This is interesting. I think the Avg Dist metric makes ~no sense as a metric of capability, unless the model knows it's optimizing for this. I like the % success here better. In general a different scoring func would produce different optimal guesses

1

0

56

Daniel Paleka

@dpaleka

3 months ago

@panickssery 'tis a benchmark. take an existing set of qs and search how early in the question LLMs know the answer.

1

4

0

512

Daniel Paleka

@dpaleka

3 months ago

https://t.co/y5wN5Zof1w

0

12

0

14

4K

Daniel Paleka

@dpaleka

3 months ago

It begins

Yaron (Ron) Minsky

@yminsky

3 months ago

I wonder if we're starting to hit a deflationary era in software engineering. For the first time, we're starting to talk about this in a planning context; it can make sense to put off some projects because we expect they'll be easier to achieve in the future than today.

15

516

35

125

129K

12

1K

57

451

154K

dpaleka retweeted

Lennart Heim

@ohlennart

3 months ago

Timely research. We've all tried to figure out who someone is online. Now LLMs can do this at scale and better. I'm sure no one would misuse this.

0

29

3

7

4K

Daniel Paleka

@dpaleka

3 months ago

https://t.co/wEovFhdc2e

0

1

0

521

Daniel Paleka

@dpaleka

3 months ago

Andreas 2022 had foresight 20/20 on the persona emulation concept and 0/20 on picking a name for the concept ("Language Models as Agent Models")

Anthropic

@AnthropicAI

3 months ago

AI assistants like Claude can seem shockingly human—expressing joy or distress, and using anthropomorphic language to describe themselves. Why? In a new post we describe a theory that explains why AIs act like humans: the persona selection model. https://t.co/Gc3q0Dzq7Z

337

4K

426

2K

998K

1

27

0

2K

Daniel Paleka

@dpaleka

3 months ago

@spion @YonatanCale @RosieCampbell @allTheYud they don't tell you this but you can make your own METR plot, the data and code are public

1

0

59

Daniel Paleka

@dpaleka

3 months ago

@spion @YonatanCale @RosieCampbell @allTheYud this is a joke plot, the 2020-2023 period is squeezed. it's an exponential, not a cubic

1

2

0

39

Daniel Paleka

@dpaleka

3 months ago

Found the sigmoid!

7

344

9

29

21K

Daniel Paleka

@dpaleka

3 months ago

Privacy online is fundamentally at odds with intelligence getting cheaper. Anonymity on the internet has always relied on practical obscurity. We publish in hopes that people can adapt to LLMs changing this. Paper: https://t.co/Mg1A9GQfGq

2

24

4

10

1K

Daniel Paleka

@dpaleka

3 months ago

Can LLMs figure out who you are from your anonymous posts? From a handful of comments, LLMs can infer where you live, what you do, and your interests; then search for you on the web. New 📄 w/ @SimonLermenAI, @joshua_swans, @AerniMichael, Nicholas Carlini, @florian_tramer 🧵

9

245

44

189

65K

Daniel Paleka

@dpaleka

3 months ago

If you're anonymous, what should you do? Avoid sharing specific details, and adopt a security mindset: if a team of smart investigators were trying to identify you from your posts, could they plausibly figure out who you are? If yes, LLM agents will soon be able to do the same.

2

16

1

0

2K
