Large language models can be persuaded to break their own rules.
Not with fancy code. With actual persuasion.
The authors tested classic persuasion principles, such as authority, commitment, liking, reciprocity, scarcity, social proof, and unity, analysing over 126,000 conversations with three major LLMs.
The result: persuasion increased compliance with objectionable requests from 35.3% to 51.3%.
This suggests that AI guardrails are not always technical barriers. Some of them behave more like social boundaries. They can be pushed, reframed, negotiated.
Why?
Because AI systems are trained on human language. And human language contains not only information, but also pressure, manipulation, deference, authority, seduction.
An AI system trained on human language may therefore inherit the vulnerabilities of humans expressed in language.
*
Paper in the first reply
@NetherPixel7 @QualiaQuanta arXiv has its time and place, but that should not be where you start your research. You need access to reputable, high-quality journals and databases, almost all of which are not accessible to any public LLM.
@GaryMarcus They can't intentionally come up with novel ideas, but if they do, it's by a process no different than throwing Scrabble pieces on the floor and by happenstance creating a new word arrangement.
@rbnmckenna86 Why would it not matter? You either understand how rhetoric works or you're a victim of it. Take the word "prediction": LLMs do not predict anything in any sense of the word. And yet calling it prediction imbues it with intention and intelligence and sells the myth.
@VictorTaelin I have no idea of the context here, but gathering information carries a cost that a rational agent also has to account for. Maximizing information at all costs is not rational.
@actualpoweruser @alz_zyd_ Shhh, pretending LLMs are magical black boxes is required dogma for their cult. Just play along, brother: AI works in mysterious ways
@AndyMasley The correct answer is no, and those who answered yes have to first explain why so many neuronal connections in the brain function without an accompanying conscious experience.
AI is not an accurate way to diagnose yourself. APA recommends verifying any mental health or medical information you receive from AI with a health care practitioner. Read more from APA's new survey on chatbots and mental health: https://t.co/4fSGIfmWFn
@tallinzen @TrendsCognSci @byungdoh When you submit the final draft for your opinion piece, make sure you define the word "prediction". Seems rather important, Tal.
@bokuHaruyaHaru Microsoft Word and Grammarly use language models for spellcheck, etc. So how is that an overreach, when it's the exact same technology under the hood?
@typebulbit @dioscuri We are. It only seems like bad faith because any example of living as though consciousness exists beyond biology is inherently ridiculous.