Yashaswi Pupneja

18 days ago

@AbhayPuri98 @CAISconf Congrats! 🏆 Wild findings and a flawless photoshop 😂

YPupneja retweeted

18 days ago

Really happy to share that our paper “Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain” has won Best Paper (Outstanding Problem Paper) at @CAISconf 🏆

TL;DR: poison ~2% of an AI agent’s fine-tuning traces and you can plant a trigger-activated backdoor that leaks confidential data >80% of the time. Guardrails completely miss it.
(And thanks GPT for editing me into the team photo so cleanly nobody can tell I wasn’t actually there 😅)

YPupneja retweeted

PhD in LLM robustness and alignment @Mila_Quebec. Likes mountains.

3 months ago

As @karpathy just highlighted, a single poisoned version of LiteLLM, up for less than an hour, was enough to exfiltrate SSH keys, cloud credentials, API keys, crypto wallets, and more from anyone who ran pip install. The attack used a malicious .pth file, a mechanism that executes automatically when Python starts. No explicit import was needed: just installing the package was enough. This is a textbook software supply chain attack. But it also points to something deeper that we've been studying. AI systems don't just depend on code. They depend on training data, collection environments, and model artifacts: an entire supply chain that is largely unaudited. And unlike malicious code, which can (in theory) be inspected, poisoned data and weights are far harder to detect.
In our paper "Malice in Agentland," we formalize three threat models that target different layers of this agentic AI supply chain:
1. Data poisoning: an attacker controls a fraction of the training traces used to fine-tune an agent.
2. Environmental poisoning: malicious instructions are injected into the webpages or tools an agent interacts with during data collection.
3. Weight poisoning: a pre-backdoored base model is fine-tuned on clean data, and the backdoor survives.
The results are striking. Poisoning as few as 2% of collected traces is enough to embed a trigger-activated backdoor that causes an agent to silently leak confidential user information with a success rate of over 80%. And the defenses we tested (two guardrail models and one weight-based defense) all failed to catch it.
The LiteLLM attack stole credentials. An equivalent attack on the AI supply chain could implant persistent behavioral backdoors: agents that behave normally until a specific trigger phrase appears, then silently exfiltrate data, manipulate outputs, or take unauthorized actions. And because these backdoors live in model weights rather than source code, they evade the inspection tools we rely on today.
As we know, every dependency you install could be hiding a poisoned package deep in its tree. The same is true for every dataset, every pretrained checkpoint, every training pipeline. As AI agents gain autonomy, securing the full stack (code, data, environments, and weights) is no longer optional. Read our full paper: https://t.co/EonnemxEbr
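
For readers who want to check their own environments: the .pth behavior described above is a documented feature of Python's `site` module, which executes any line in a site-directory `.pth` file that starts with `import` followed by a space or tab; other non-comment lines are only added to `sys.path`. Below is a minimal defensive sketch (our own illustration, not taken from the paper or from any LiteLLM incident write-up) that lists such executable lines so they can be reviewed by hand. The helper names `executable_pth_lines` and `default_site_dirs` are hypothetical, and the demo files are made up.

```python
import os
import site
import tempfile
from pathlib import Path


def executable_pth_lines(directories):
    """Return (pth_file, line_number, line) for each .pth line Python would exec.

    Mirrors the rule used by the `site` module: a line starting with
    "import " or "import\t" is executed at interpreter startup, while
    other non-comment lines are treated as paths to append to sys.path.
    """
    findings = []
    for directory in directories:
        d = Path(directory)
        if not d.is_dir():
            continue  # skip missing site directories instead of failing
        for pth in sorted(d.glob("*.pth")):
            text = pth.read_text(encoding="utf-8", errors="replace")
            for n, line in enumerate(text.splitlines(), start=1):
                if line.startswith(("import ", "import\t")):
                    findings.append((str(pth), n, line))
    return findings


def default_site_dirs():
    """Best-effort list of site directories for the current interpreter."""
    # Some older virtualenv layouts lack site.getsitepackages, so guard it.
    dirs = list(site.getsitepackages()) if hasattr(site, "getsitepackages") else []
    user_dir = site.getusersitepackages()
    if user_dir not in dirs:
        dirs.append(user_dir)
    return dirs


if __name__ == "__main__":
    # Demo on a throwaway directory with two hypothetical .pth files.
    with tempfile.TemporaryDirectory() as tmp:
        # Plain path entries are only appended to sys.path, never executed.
        Path(tmp, "benign.pth").write_text("some/extra/path\n")
        # This line WOULD run at startup if the file sat in a real site dir.
        Path(tmp, "suspicious.pth").write_text("import os; print('ran at startup')\n")
        for path, n, line in executable_pth_lines([tmp]):
            print(f"{os.path.basename(path)}:{n}: {line}")
```

Calling `executable_pth_lines(default_site_dirs())` scans the current interpreter's site directories. Expect some hits even on a clean machine, because some legitimate packaging tools (for example, setuptools and editable installs) ship `.pth` files with `import` lines; treat the results as leads to review, not as proof of compromise.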

YPupneja retweeted

Karan🧋

@kmeanskaran

6 months ago

best LLM meme so far😂😂 @ordax

7 months ago

I’m at #NeurIPS in San Diego this week!
Presenting “Separating Data and Control Planes for Agentic Safety Browsing” at the Women in Machine Learning Workshop (#WiML).
📅 December 2nd, 6-9 pm
Excited to chat AI security, LLM evals, or just grab a coffee. Slide into my DMs ☕ https://t.co/tMxtEPbhty

YPupneja retweeted

Delta Institute

@DeltaInstitutes

7 months ago

Headed to NeurIPS 2025? We made a conference-wide Slack with channels for people to find afterparties, meet other attendees, and more! https://t.co/68va8udgEi

7 months ago

@JulianGoldieSEO n8n

YPupneja retweeted

8 months ago

Excited to be at #COLM2025 🇨🇦. Let’s talk AI agent safety, evolving threats, multimodal reasoning & all things LLMs 🤖
📍 Wed AM (#78): 🔐 DoomArena: agents vs. evolving threats
📍 Wed PM (#84): 📊 BigCharts-R1: chart reasoning via visual finetuning https://t.co/T69ExZPDDx

YPupneja retweeted

11 months ago

DoomArena is now accepted at @COLM_conf '25. Thanks to all the reviewers for their support and constructive feedback! 🤝 Looking forward to chatting about AI agent security in Mtl! 🇨🇦 ArXiv: https://t.co/ZqPW0SlLfi GitHub: https://t.co/W1520QTe6g Website: https://t.co/3aRu7KNQsG

YPupneja retweeted

12 months ago

Just opened my Google Scholar to a surprise 100+ citations! 🤯
Started at @ServiceNowRSRCH with zero research experience and no PhD. Didn’t think I’d get here.
Grateful for patient mentors, brilliant collabs & everyone who engaged with my work 🧡
Here’s to more learning! 🚀 https://t.co/eP00WvfomD

12 months ago

@AbhayPuri98 @ServiceNowRSRCH Congratulations 🥳 🎊 👏 Many more to come.

YPupneja retweeted

about 1 year ago

🚀 Struggling with literature reviews? LitLLM can help! This AI-powered tool retrieves relevant papers, ranks them using LLMs, and structures comprehensive reviews in no time. Just input your abstract and let AI streamline your research! #LitLLM #AIforResearch https://t.co/8DOk0mcW3F

YPupneja retweeted

about 1 year ago

The StarVector demo is now live on Hugging Face! Check it out and share your feedback.

YPupneja retweeted

about 1 year ago

I'm beyond thrilled to have been part of this journey from the very beginning. Witnessing my very first CVPR submission get accepted is a surreal milestone! Explore the tweet for more details, and check out our open-sourced code, dataset and model at https://t.co/YRn2khB8vc 🌟🚀

YPupneja retweeted

over 1 year ago

I’m also super excited to attend @NeurIPSConf for the first time in person in Vancouver! 🇨🇦🎡 Can’t wait to present our work: “BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models”