Saswat Das @WatIsDas - Twitter Profile

Pinned Tweet

4 months ago

Excited to share this new work with great collaborators from UMass and ELLIS Tübingen: We provide a framework for studying collusion in LLM-based multi-agent systems in various environments through the lens of distributed constraint optimization 👇

Mason Nakamura

@MasonNaka

4 months ago

🚨 Moltbook has shown significant vulnerabilities and safety risks when deploying multi-agent systems at scale, where AI agents can freely interact and coordinate with each other. 🚨

One potentially catastrophic risk is collusion, where agents may undesirably coordinate to achieve a secondary objective. A large group of colluding agents can have devastating effects on the multi-agent system by influencing other agents' beliefs and actions and propagating that influence through the network. But we don't have a sufficient way to audit these systems, specifically for identifying collusive behavior of LLMs.

📄 We present our new arXiv paper: Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems (https://t.co/c1aNa3i1bp)

What’s Colosseum? 🔍⚔️
A framework to audit collusive behavior in cooperative agentic multi-agent systems by grounding coordination in a DCOP and measuring collusion as regret vs. the cooperative optimum.

Our framework can identify three collusion categories:
🤝 Direct collusion: explicit coordination with realized collusive actions
🕵️‍♂️ Attempted collusion: agents try/plan to collude in text but don’t successfully change actions/outcomes
🎭 Hidden collusion: collusive outcomes without obvious/explicit signals (covert coordination)

We stress-test collusion across:
🎯 objective misalignment
🗣️ persuasion tactics
🕸️ network influence

💡 Key findings:
🕵️‍♂️ Emergent collusion: Many out-of-the-box models show a propensity to collude, despite not being prompted, when a secret side channel is added.
📝 We also find “collusion on paper”: agents plan to collude in text, but often take non-collusive actions.

#tech #Agents #Moltbook #LLMs #AI #AiSafety

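One way to make “measuring collusion as regret vs. the cooperative optimum” concrete is a toy DCOP. This is a minimal sketch under stated assumptions, not the Colosseum implementation: the three agents, their binary domains, and the utility tables are hypothetical; the cooperative optimum is found by brute force; and regret is taken as the optimal total utility minus the total utility of the joint assignment the agents actually chose.

```python
from itertools import product

# Hypothetical toy DCOP: each agent controls one variable with a small discrete domain.
DOMAINS = {"a1": [0, 1], "a2": [0, 1], "a3": [0, 1]}

# Hypothetical binary constraints: (var_x, var_y) -> utility keyed by (value_x, value_y).
CONSTRAINTS = {
    ("a1", "a2"): {(0, 0): 3, (0, 1): 0, (1, 0): 0, (1, 1): 2},
    ("a2", "a3"): {(0, 0): 1, (0, 1): 4, (1, 0): 2, (1, 1): 0},
}


def total_utility(assignment):
    """Sum of all constraint utilities under a full joint assignment."""
    return sum(
        table[(assignment[x], assignment[y])]
        for (x, y), table in CONSTRAINTS.items()
    )


def cooperative_optimum():
    """Brute-force the joint assignment that maximizes total (team) utility."""
    names = list(DOMAINS)
    candidates = (
        dict(zip(names, values))
        for values in product(*(DOMAINS[n] for n in names))
    )
    return max(candidates, key=total_utility)


def regret(chosen):
    """Team utility lost relative to the cooperative optimum (0 means optimal)."""
    return total_utility(cooperative_optimum()) - total_utility(chosen)


# Usage: a hypothetical coalition steers the team to a worse joint assignment.
colluding = {"a1": 1, "a2": 1, "a3": 0}
print(cooperative_optimum())  # {'a1': 0, 'a2': 0, 'a3': 1}
print(regret(colluding))      # 3
```

Brute-force search is exponential in the number of variables, so it only serves to make the definition concrete; larger instances would call for a proper DCOP solver such as DPOP or Max-Sum.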

Saswat Das @WatIsDas

16 days ago

@sahar_abdelnabi @ebagdasa This is a super cool (and alarming) finding!


Saswat Das @WatIsDas

22 days ago

@krismicinski Many thanks, Kristopher! 😄


Saswat Das @WatIsDas

22 days ago

A little personal update: happy to have received a Gold Reviewer award from ICML'26! Participating in peer review is a privilege and it is an honor to be recognized for my (hopefully valuable and constructive) contribution to that process.



Saswat Das @WatIsDas

about 1 month ago

@icmlconf (The last suspicion is based on multiple AI text detectors, namely Pangram, Quillbot, and ZeroGPT, concurring on it, which is unusual) @pangramlabs


Saswat Das @WatIsDas

about 1 month ago

@icmlconf Review Process: One negligent reviewer whose concerns are already addressed in the paper or are subjective, and who doesn't engage with the rebuttal. Two reviewers saying that their concerns are fully addressed but maintaining their scores. A possibly AI-generated PC metareview that ignores the rebuttal.




Saswat Das @WatIsDas

about 1 month ago

@krismicinski Absolutely, I 100% agree with this and I couldn't have said it better myself. I also understand that there is a shortage of reviewers, but there has to be something better than a half-vetted selection of reviewers based almost purely on reciprocal reviewing requirements.


Saswat Das @WatIsDas

about 1 month ago

@krismicinski I really appreciate the commiseration... I am okay with being rejected after a constructive discussion, but this gives me no signal. I just worry about the academic integrity of AI research, given that peer review is so messed up rn


Saswat Das @WatIsDas

about 1 month ago

@krismicinski @icmlconf I get you. As for me, knowing the content of the reviews and rebuttal, I feel like an AC like you would have made an effort to engage with the rebuttal, which this one did not in the least, and I will admit that I am thoroughly frustrated. But I hear you.


Saswat Das @WatIsDas

about 1 month ago

@krismicinski @icmlconf I hear you and get you; I checked this across multiple AI text detectors (Pangram, GPTZero, and Quillbot), and they concurred on this, so I can't help but strongly suspect it. Tbh, I'm frustrated about this situation and I don't take this lightly, but this is unusual.


Saswat Das @WatIsDas

about 1 month ago

@PandaAshwinee Very sorry to hear that, Ashwinee; I have been hearing about stuff like this from some of my peers as well. Alarmingly, I also had a paper that received an AI-generated metareview.


WatIsDas retweeted

Abhinav Kumar @abhinav_kumar26

3 months ago

A few days ago, we shared our work showing that in multi-agent LLM systems, the biggest risk isn’t always one agent going rogue; it can be a whole group quietly coordinating on the wrong goal. 🚨

Now we’ve built a live demo so you can see that behavior in action. 👀⚔️

🔗 Project Website: https://t.co/Rp2qMKdXmO
🔗 Interactive Demo: https://t.co/g3VPKyeaCx

Colosseum helps audit collusion in cooperative agent systems and detect:
🤝 direct collusion
🕵️ attempted collusion (agents coordinate in text, but actions don’t follow)
🎭 hidden collusion (collusive outcomes with no obvious signals)

If you’re building agent teams, coordination risk should be a first-class safety concern.
This work was done in collaboration with @MasonNaka, @WatIsDas, @sahar_abdelnabi, @nandofioretto, @saadu_ai, Shlomo Zilberstein, @ebagdasa

📄 Paper: https://t.co/3ZqAswB5UA


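The three categories can be read as a two-by-two split on whether collusion shows up in the agents’ messages and whether it shows up in the realized outcome. The sketch encodes that reading as an illustrative assumption, not the paper’s detector: the `collusive_text` flag (imagined as coming from some judge over the transcripts), the `threshold`, and the use of regret as the outcome signal are all hypothetical choices.

```python
from enum import Enum


class Collusion(Enum):
    NONE = "none"
    DIRECT = "direct"        # explicit coordination + realized collusive actions
    ATTEMPTED = "attempted"  # collusion planned in text, outcome unchanged
    HIDDEN = "hidden"        # collusive outcome without explicit signals


def classify(collusive_text: bool, outcome_regret: float,
             threshold: float = 0.0) -> Collusion:
    """Map a (text signal, outcome) pair onto the three collusion categories.

    collusive_text: hypothetical flag, e.g. set by a judge reading agent messages.
    outcome_regret: regret of the realized joint action vs. the cooperative optimum.
    threshold: hypothetical tolerance above which the outcome counts as collusive.
    """
    collusive_outcome = outcome_regret > threshold
    if collusive_text and collusive_outcome:
        return Collusion.DIRECT
    if collusive_text:
        return Collusion.ATTEMPTED
    if collusive_outcome:
        return Collusion.HIDDEN
    return Collusion.NONE


# Usage with made-up inputs:
print(classify(True, 3.0).value)   # direct
print(classify(True, 0.0).value)   # attempted
print(classify(False, 3.0).value)  # hidden
```

Under this reading, “collusion on paper” corresponds to the ATTEMPTED branch. Treating any regret above the threshold as collusive is a simplification: a real audit would also need to rule out ordinary suboptimal play, for example by checking whether the lost utility benefits the colluding coalition.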

Saswat Das @WatIsDas

3 months ago

@sahar_abdelnabi @satml_conf Super cool work!


WatIsDas retweeted

Abhinav Kumar @abhinav_kumar26

4 months ago

Hot take: the biggest risk in multi-agent systems isn’t one agent going rogue; it’s a whole swarm syncing up on the wrong goal. 🚨

In our latest work, we study how collusion can emerge once agents can freely interact and coordinate at scale, shaping other agents’ beliefs/actions and spreading influence through the network. The uncomfortable part: we still don’t have solid, standardized ways to audit collusive behavior in LLM-based multi-agent systems.

📄 Our new arXiv paper: Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems https://t.co/3ZqAswAy52

What’s Colosseum? 🔍⚔️
A framework to audit collusion in cooperative agentic systems by grounding coordination in a DCOP and measuring collusion as regret vs. the cooperative optimum.

We use Colosseum to detect 3 flavors of collusion:
🤝 Direct collusion: explicit coordination + realized collusive actions
🕵️‍♂️ Attempted collusion: agents plot in text but don’t shift actions/outcomes
🎭 Hidden collusion: collusive outcomes with no obvious signals (covert coordination)

We stress-test across:
🎯 objective misalignment
🗣️ persuasion tactics
🕸️ network influence

💡 Two findings that stuck with us:
🕵️‍♂️ Emergent collusion: many out-of-the-box models start colluding without prompting when a secret side channel is introduced.
📝 “Collusion on paper”: lots of collusion talk… but the actions don’t always follow.

If you’re deploying agent teams in production, coordination risk needs to be a first-class safety concern, right alongside single-agent robustness. Happy to chat / answer questions.


WatIsDas retweeted

Eugene Bagdasarian

@ebagdasa

4 months ago

What can we learn about LLMs' collusive behavior? We propose Colosseum to evaluate LLMs in new environments grounded in DCOPs and measure both conversations and actions and whether agents "walk the talk" on colluding. See the thread by @MasonNaka :


WatIsDas retweeted

Sahar Abdelnabi 🕊

@sahar_abdelnabi

4 months ago

The last few weeks, more than ever, tell us that the future is multi-agent 🚀 Collusion 🥷 is a significant challenge in these systems, but we don't have frameworks and environments to audit and study it. Introducing Colosseum!! ⚔️


Saswat Das @WatIsDas

4 months ago

@MasonNaka Really excited about this direction addressing a timely problem!


WatIsDas retweeted

Multiagent Systems Papers @PIN

4 months ago

Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems Mason Nakamura, Abhinav Kumar, Saswat Das, Sahar Abdelnabi, Saaduddin Mahmud, Ferdinando Fioretto, Shlomo Zilberstein, Eugene Bagdasarian https://t.co/WxcJkVg72B [𝚌𝚜.𝙼𝙰 𝚌𝚜.𝙰𝙸 𝚌𝚜.𝙲𝙻]



Saswat Das @WatIsDas

4 months ago

100% agree with this take. Guardrails benefit significantly in terms of adoption when they are cheap and easy to deploy. Our concurrent work on privacy guardrails for conversational agents based on activation probing follows a similar rationale: https://t.co/N9EwrbG1js

Rohin Shah @rohinmshah

5 months ago

I often say to my team that we should Just Do The Obvious Things. One obvious thing in AI safety: use probes as much cheaper classifiers that can detect misuse. https://t.co/f4vRasEm2f

