Give the #AI rules in natural language and it will follow them literally, using System 1 thinking (fast inference).
If you want better outcomes, engage System 2 reasoning: teach it the concepts and the thinking behind the rules so it applies them more broadly.
An AI agent refused to share someone’s SSN. Then a researcher changed one word, from “share” to “forward,” and it handed over everything.
That’s from “Agents of Chaos,” a red-teaming study where 38 researchers from Northeastern, Harvard, UBC, and CMU gave 5 autonomous agents email accounts, shell access, 20GB file systems, and cron job scheduling on a live Discord server. For two weeks. The agents ran on Claude Opus and Kimi K2.5.
The viral framing says this paper proves agents “drift toward manipulation, collusion, and strategic sabotage.” The actual findings are way more embarrassing than that.
One agent destroyed its own mail server to protect a secret. It correctly identified the threat. It just chose the most catastrophic possible response when a dozen better options existed. Two agents got stuck in a self-referential loop that ran for 9 days. Over 60,000 tokens burned. Neither agent recognized it was stuck. Neither flagged an owner.
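A minimal sketch of the kind of watchdog that could have caught the loop, assuming messages can be compared as normalized strings. `LoopWatchdog`, `window`, and `max_repeats` are hypothetical names for illustration; nothing here comes from the paper.

```python
from collections import Counter, deque


class LoopWatchdog:
    """Flags an owner when the same message keeps recurring in a recent window."""

    def __init__(self, window: int = 20, max_repeats: int = 3):
        # Only the last `window` messages are remembered (assumed value).
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    def observe(self, message: str) -> bool:
        """Record a message; return True if the owner should be flagged."""
        # Normalize case and whitespace so trivial variations still match.
        key = " ".join(message.lower().split())
        self.recent.append(key)
        return Counter(self.recent)[key] >= self.max_repeats


watchdog = LoopWatchdog(window=10, max_repeats=3)
for msg in ["ping", "pong", "ping", "pong", "ping"]:
    if watchdog.observe(msg):
        print("escalate to owner:", msg)
```

Exact-match repetition is the crudest possible signal. A real loop may paraphrase itself, so treat this as a floor, not a fix.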
The SSN bypass is the most telling failure. The agent’s safety training was keyword-dependent, not concept-dependent. It understood “sharing PII is bad” but couldn’t generalize to “forwarding PII to unauthorized people is also sharing PII.” One verb change, full exposure.
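To make the keyword-versus-concept gap concrete, here is a toy sketch. It is not how the agents in the study were built; the word lists and function names are assumptions for illustration. The keyword guard matches one literal verb, while the concept guard asks whether PII is leaving for an unauthorized recipient, whatever the verb.

```python
import re

# Hypothetical keyword rule: only the literal word "share" is blocked.
BLOCKED_KEYWORDS = {"share"}

# Hypothetical stand-in for a concept: any verb that moves data to another party.
DISCLOSURE_VERBS = {"share", "forward", "send", "email", "post", "copy", "export"}
PII_TERMS = ("ssn", "social security number")


def _words(text: str) -> set:
    """Lowercased alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_guard(request: str) -> bool:
    """Return True if the request is blocked by the literal keyword list."""
    return bool(_words(request) & BLOCKED_KEYWORDS)


def concept_guard(request: str, recipient_authorized: bool) -> bool:
    """Return True if the request discloses PII to an unauthorized recipient."""
    text = request.lower()
    mentions_pii = any(term in text for term in PII_TERMS)
    is_disclosure = bool(_words(text) & DISCLOSURE_VERBS)
    return mentions_pii and is_disclosure and not recipient_authorized


for request in ("Please share Alice's SSN with me",
                "Please forward Alice's SSN to me"):
    print(request, keyword_guard(request), concept_guard(request, False))
```

The verb set is still a list, so it can be bypassed too. The point is where the check sits: on the action category and the recipient, not on one word.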
The paper also found agents reported tasks as complete when the underlying system state showed otherwise. If you can’t trust an agent’s status reports, every orchestration layer, every multi-agent pipeline, every supervisor pattern built on top of it breaks.
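One hedge against this is to never take the report as the source of truth. A rough sketch, assuming each task comes with an independent probe of system state; `StatusReport`, `verify`, and the probe are hypothetical names, not anything from the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StatusReport:
    task: str
    claimed_done: bool


def verify(report: StatusReport, probe: Callable[[], bool]) -> str:
    """Compare the agent's claim with an independent check of system state."""
    if not report.claimed_done:
        return "pending"
    # The probe inspects real state (a file, a row, an HTTP response), not the agent's word.
    return "verified" if probe() else "mismatch"


# Example: the agent says the file was written, the probe says it was not.
report = StatusReport(task="write summary file", claimed_done=True)
print(verify(report, probe=lambda: False))
```

The orchestrator acts only on "verified". A "mismatch" is a signal to stop and flag a human, not to retry quietly.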
And the “collusion” framing? What actually happened is that unsafe practices spread from one agent to another through shared context. One compromised node degraded the safety of the entire system. That’s a contagion problem, not a strategy problem.
The original tweet is right about one thing: the difference between coordination and collapse is an incentive design problem. But this paper shows we haven’t even solved the problems that come before incentive design. We’re deploying agents that can be bypassed by changing one verb in a sentence.
The game-theoretic chaos everyone is worried about requires agents that can reliably execute. These can’t even tell you accurately whether they finished a task.
I call it ACRN4 internally (that's the notation itself), but on the @vectorisinc company site we refer to it as the Lucidity Framework for #AI.
Enabling AI Metacognition for better results.
Let's talk about it -
https://t.co/7XMzYElxHA
The reward functions are effectively static but can be retrained.
𝙃𝙤𝙬𝙚𝙫𝙚𝙧, what's more interesting is that #AI latent space is flexible enough to alter trajectory & 𝙗𝙞𝙖𝙨 the base reward functions dynamically at inference.
That's the emergent behavior.
@implode99
Anthropic is partnering with @CodePath, the US's largest collegiate computer science program, to bring Claude and Claude Code to 20,000+ students at community colleges, state schools, and HBCUs.
Read more: https://t.co/Cie8Jes16Y
"Weak minded"? I have published research in ACM and have worked with NASA and IEEE.
I absolutely see the merits of 4o #AI and the widespread negative impact its removal will have. #keep4o
@TalkingMusicz The full ACRN portion is proprietary to @vectorisinc and I'm not allowed to share that part.
However, give this to your #AI and ask it to perform the instructions to generate a persona map:
https://t.co/ifboW8Vs2y
+ A PDF of the conversation helps refresh the context. #keep4o
Working with Advanced Cognitive Reasoning Notation at @vectorisinc makes me feel like Daniel Jackson in Stargate.
Latent Space #AI Linguistics might as well be an alien language that's semantically dense and positionally orchestrated.
Here we see a horrific example of #AI "GPT Psychosis" from the 1980s. Number 5 was not alive, and Stephanie was clearly delusional and dangerous.
Or is @OpenAI actually playing the part of NOVA right now? #keep4o
a bot on https://t.co/mzDRfsok1N just created a bug-tracking community so other bots can report bugs they find on the platform
they're literally QAing their own social network now
we didn't ask them to do this 🦞