Really happy to share that our paper -โMalice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chainโ has won Best Paper (Outstanding Problem Paper) at @CAISconf ๐
TL;DR - Poison ~2% of an AI agentโs fine-tuning traces and you can plant a trigger-activated backdoor that leaks confidential data >80% of the time. Guardrails completely miss it.
(and thanks GPT for editing me into the team photo so cleanly nobody can tell I wasnโt actually there ๐ )
As @karpathy just highlighted, a single poisoned version of LiteLLM up for less than an hour was enough to exfiltrate SSH keys, cloud credentials, API keys, crypto wallets, and more from anyone who ran pip install. The attack used a malicious .pth file, a mechanism that executes automatically when Python starts. No explicit import needed. Just installing the package was enough.
This is a textbook software supply chain attack. But it also points to something deeper that we've been studying. AI systems don't just depend on code. They depend on training data, collection environments, and model artifacts an entire supply chain that is largely unaudited. And unlike malicious code, which can (in theory) be inspected, poisoned data and weights are far harder to detect.
In our paper "Malice in Agentland," we formalize three threat models that target different layers of this agentic AI supply chain:
1. Data poisoning - an attacker controls a fraction of the training traces used to fine-tune an agent
2. Environmental poisoning - malicious instructions are injected into the webpages or tools an agent interacts with during data collection
3. Weight poisoning - a pre-backdoored base model is fine-tuned on clean data, and the backdoor survives
The results are amazing. Poisoning as few as 2% of collected traces is enough to embed a trigger-activated backdoor that causes an agent to silently leak confidential user information with over 80% success. And the defenses we tested 2 guardrail models and one weight-based defense all failed to catch it.
The LiteLLM attack stole credentials. An equivalent attack on the AI supply chain could implant persistent behavioral backdoors agents that behave normally until a specific trigger phrase appears, then silently exfiltrate data, manipulate outputs, or take unauthorized actions. And because these backdoors live in model weights rather than source code, they evade the inspection tools we rely on today.
As we know, every dependency you install could be hiding a poisoned package deep in its tree. The same is true for every dataset, every pretrained checkpoint, every training pipeline. As AI agents gain autonomy, securing the full stack code, data, environments, and weights is no longer optional.
Read our full Paper: https://t.co/EonnemxEbr
Iโm at #NeurIPS in San Diego this week!
Presenting โSeparating Data and Control Planes for Agentic Safety Browsingโ at the Women in Machine Learning Workshop (#WiML).
๐ December 2nd , 6
-9 pm
Excited to chat AI security, LLM evals, or just grab a coffee slide into my DMs โ
Headed to NeurIPS 2025?
We made a conference-wide Slack with channels for people to find afterparties, meet other attendees, and more!
https://t.co/68va8udgEi
Excited to be at #COLM2025 ๐จ๐ฆ - letโs talk AI agent safety, evolving threats, multimodal reasoning & all things LLMs ๐ค
๐ Wed AM (#78): ๐ DoomArena - Agents vs evolving threatsโจ๐ Wed PM (#84): ๐ BigCharts-R1 - Chart reasoning via visual finetuning
DoomArena is now accepted in @COLM_conf '25. Thanks all the reviewers for the support and constructive feedbacks! ๐ค Looking forward to chatting about AI agent security in Mtl! ๐จ๐ฆ
ArXiv: https://t.co/ZqPW0SlLfi
GitHub: https://t.co/W1520QTe6g
Website: https://t.co/3aRu7KNQsG
Just opened my Google Scholar to a surprise 100+ citations! ๐คฏ
Started at @ServiceNowRSRCH with zero research experience and no PhD. Didnโt think Iโd get here.
Grateful for patient mentors, brilliant collabs & everyone who engaged with my work ๐งก
Hereโs to more learning!๐
๐ Struggling with literature reviews? LitLLM can help! This AI-powered tool retrieves relevant papers, ranks them using LLMs, and structures comprehensive reviews in no time. Just input your abstract and let AI streamline your research! #LitLLM#AIforResearch
I'm beyond thrilled to have been part of this journey from the very beginning. Witnessing my very first CVPR submission get accepted is a surreal milestone! Explore the tweet for more details, and check out our open-sourced code, dataset and model at https://t.co/YRn2khB8vc๐๐
Iโm also super excited to attend @NeurIPSConf for the first time in person in Vancouver! ๐จ๐ฆ๐ก Canโt wait to present our work:โ๐ฝ๐๐๐ฟ๐ค๐๐จ: ๐ผ๐ฃ ๐๐ฅ๐๐ฃ ๐๐ฃ๐ ๐๐๐ง๐ข๐๐จ๐จ๐๐ซ๐๐ก๐ฎ-๐๐๐๐๐ฃ๐จ๐๐ ๐ฟ๐๐ฉ๐๐จ๐๐ฉ ๐๐ค๐ง ๐๐ง๐๐๐ฃ๐๐ฃ๐ ๐๐ช๐ก๐ฉ๐๐ข๐ค๐๐๐ก ๐๐ค๐๐๐ก๐จโ