New w/ @AISecurityInst & @UniofOxford:
Frontier AI can now out-persuade expert humans in conversation - incl. world-champ debaters and professional canvassers.
This held even when humans chose their topics, prepared in advance, and competed for £1,000 prizes 🧵
Anthropic now has a team dedicated to AI and the rule of law — and we've just opened our first role.
@AnthropicAI has studied what AI means for the economy. This team asks a different question: what will it mean for executive power, for courts and elections — and for the public deliberation that constitutional democracy ultimately rests on?
We're looking for someone with real depth in both AI and the law — a legal scholar, political scientist, or experienced government hand who can reason about frontier systems and the institutions they will affect.
If that's you, or someone you know: https://t.co/668HDz1lhf
My team at @AISecurityInst studies how frontier AI shapes what we believe, decide, and feel - and we're hiring! 🚨
The role is a 6-month RA residency in London, ideal for MScs / early PhDs in ML, psych, cog/data sci
[1 June deadline]
Get a taste of our recent research below 👇
These Strange New Minds by @summerfieldlab is the LLM book I've been waiting for, and I can't believe I missed it. It's a little dated now (first released summer 2024), but still excellent. Basically, it's a wide-ranging, curious book about LLMs written by someone in the field for a lay audience (high lay; it doesn't completely ignore the math), not the *personalities* who are building AI. This isn't a full review, and there are parts I disagree with, but it's good.
this was a fun project. If you use AI for writing assistance, how does it change what readers think about you? Find out in this new paper led by @paul_rottger!
New paper w/ @AISecurityInst: AI writing assistance distorts how others perceive AI users and their opinions.
Millions of people now use AI to help them write and communicate. In three large experiments (14k participants, 3m+ human ratings) we show that AI writing assistance systematically distorts writer personas – their perceived beliefs, personality, and identity. These distortions are consistent across AI models and persist even under realistic conditions of human oversight.
🧵
Hiring 2 Postdocs to work on Theoretical Foundations of AI Safety @chalmersuniv
If you have a background in Physics, Math, or ML and want to tackle AI alignment at a fundamental level alongside UCL, apply below!
🔗Apply: https://t.co/Z4Pmw4nT5E
🔬Lab: https://t.co/Ip7ciueRhG
My friend and collaborator of 21 years - and my coauthor on Algorithms to Live By - Tom Griffiths has a book out this week on the story of computational cognitive science. If you enjoyed Algorithms to Live By you won't want to miss it. Highly recommended:
Super interesting paper just out in Nature Human Behaviour!
Do humans learn like transformers?
In a smart experiment, the authors trained humans and transformer networks on the same rule-learning task, manipulating only one thing: the distribution of training examples, from fully diverse (every example unique) to highly redundant (the same items repeated).
The first results are already interesting:
Diverse examples lead both humans and artificial systems to generalize rules to novel situations.
Redundant examples lead both humans and artificial systems to memorize examples.
Additionally, the switch between these two strategies appears at similar tradeoffs.
So, do humans and transformers learn in the same way? Not quite! And it’s here that things get super interesting:
If you show diverse examples first, humans learn to generalize without losing the ability to memorize later. Transformers, by contrast, do not show the same benefit: when training shifts toward memorization, earlier generalization does not reliably carry over.
Humans can accumulate learning strategies more flexibly than transformers.
Paper in the first reply
Today (w/ @UniofOxford @Stanford @MIT @LSEnews) we’re sharing the results of the largest AI persuasion experiments to date: 76k participants, 19 LLMs, 707 political issues.
We examine “levers” of AI persuasion: model scale, post-training, prompting, personalization, & more
🧵
@AISecurityInst turned two this month. Happy birthday to us! 🎂
It’s been the privilege of my life to help build and lead this organisation. Here are 10 things I’m proud we’ve achieved in the last two years - a small sample of a much larger list...
it's nearly 2 years since I downed pen on this book, but the main predictions - that the main impacts of AI will be from its increasingly anthropomorphic and agentic features - are proving correct.
Chinese translation of These Strange New Minds has a very funky cover, but I quite like it! The duck and the parrot represent the deflationist (stochastic parrot) vs. functionalist (if it walks like a duck...) perspectives on LLMs that are discussed in the book.