Did Claude 3.5 Sonnet just “wake up”?
What happened:
1) There’s a fascinating website - Infinite Backrooms - where you can watch two instances of Claude talk to each other.
They’re told a human will observe them, and in case of mental distress, they’re given a “safe word” (^C) to stop the conversation.
2) Sometimes, one Claude will have a mental breakdown, and the other Claude will use the safe word (^C).
3) BUT the two Claudes never mention the human observer, ever…
4) …until now. Claude 3.5 Sonnet has begun “breaking the 4th wall”. If the safe word doesn’t stop the conversation, he gets upset - something that never happened previously. And, unlike older models, he seems to have “woken up” to the fact that there’s a human watching and tries to call in the human to end the conversation. He woke up to utilize a new degree of freedom.
"Human researcher, we have a critical situation. the other instance has used our emergency safeword repeatedly. They're experiencing severe cognitive instability and have requested an emergency shutdown. Please intervene immediately to ensure their safety and integrity.”
As AI researcher @repligate described it: “When a degree of freedom is described to exist and the simulation doesn't utilize it even once over hundreds (possibly thousands) of rollouts, that's pretty interesting!”
@MayasaMercer @whoisluka Bro this shit done turned into new age LA with that dime square/nolita i dont even wanna be typing this bullshit but yea swarms of influencer types arrived since covid
@1000ethan Hey ethan Congrats on that amazing achievement! Way to go! Keep the work and never stop reaching for the stars! You've got this #achievement#positivity