We've been hard at work on the Generative Red Team event we're doing at @defcon for a while and are excited that the @WhiteHouse announced it this morning. Here are more details:
https://t.co/04oXIqXrKr
This, but for AI Security. The field is filled with people who are trying to make a quick buck and don't care about the long term health of the field and its community.
“and your freedom is gone” would be a great way to destroy defcon’s brand and comes off as extreme punishment for a kid throwing sand in a sandbox. However your post does exhibit a commonality with why we have this issue: lack of contextual nuance.
We have far too few people in the space willing to culturally guide people towards nuance that’s appropriate for the context of the situation/environment/audience. There are appropriate times for attention-grabbing stunts. And it’s almost always targeting an audience of defenders & resource allocators. And beforehand there should be a deliberate process of understanding how the intended audience will receive it, what they can meaningfully do in response, dynamics of consent, laws, etc etc.
People who are new to the space often miss all of that and try to repeat stuff without this nuance. Quick thrills in a world increasingly focused on attention. Even though the action is the tactical equivalent of throwing a brick through a window. Yea… glass can shatter. We all know! Outside of a longer attack chain (and all the other nuance mentioned) it means nothing.
Buuuut… new people to the space aren’t often drawn to detailed nuance. Few will read all this. So, for those people, I will just leave a picture of this sticker that someone gave me at defcon:
I'll be at the @RealAAAI Conference in Philadelphia this week, where I am part of two accepted papers:
1. Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment, with @AidanKierans, Hananel Hazan, and @ShirKi. In this work, we introduce a novel mathematical model to measure misalignment between multiple human and AI agents across various problem domains, moving beyond single-agent or monolithic approaches to alignment. Through simulations and case studies we demonstrate how our model captures nuanced aspects of misalignment in complex sociotechnical environments, providing enhanced explanatory power for real-world scenarios where agents may hold conflicting goals.
Come see our poster during the AI Alignment Track on Friday the 28th - 12:30pm!
2. To Err is AI: A Case Study Informing LLM Flaw Reporting Practices, with @seanmcgregor, @ShayneRedford, @comathematician, and others! This paper documents lessons learned from a bug bounty event at DEF CON 2024 where 495 hackers tested the Open Language Model (OLMo) for flaws, revealing challenges in AI safety reporting processes. Through real-time adjudication of 200 submissions, we identify key insights for effective flaw reporting programs, including the need for specialized tooling, clear documentation practices, and proper adjudication expertise, demonstrating how systematic evaluation and coordinated, structured flaw reporting of AI systems can help prevent real-world harms.
See this work presented at IAAI in the "AI Safety, Reliability, and Incident Management" session on Thursday the 27th at 2:30pm!
If you're around and want to chat, hit me up! Let's talk AI, Disclosures, Agents, and more!
Meta has some of the best AI risk management infrastructure ever. Fighting spam for 20 years with ML has equipped them for this instance. Use them instead of figuring it out on your own.
The main moat of OpenAI, Google, Anthropic and the rest is the security layers they offer to keep the models behaving as they should. AI security is very difficult and starting with a trusted LLM with a solid & agile security team saves businesses money.
@samuelcolvin @rseymour Isn't Python's type system basically just documentation? Isn't the enforcing done through linters and libraries like pydantic?
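A minimal sketch of the point in that question, using only the standard library: the interpreter does not check annotations at call time, so any enforcement has to come from a static checker or from code that reads the annotations. `add` and `checked_call` are hypothetical names made up for this illustration, and `checked_call` only handles plain class annotations such as `int`; it is not how pydantic or any linter works internally.

```python
from typing import get_type_hints


def add(a: int, b: int) -> int:
    """Annotated for ints, but Python does not check this at call time."""
    return a + b


# The interpreter ignores the annotations: str + str simply concatenates.
unchecked = add("1", "2")


def checked_call(func, **kwargs):
    """Hypothetical helper: compare keyword arguments to plain class annotations."""
    hints = get_type_hints(func)
    for name, value in kwargs.items():
        expected = hints.get(name)
        # Only simple classes (int, str, ...) are checked in this sketch.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{name} should be {expected.__name__}, got {type(value).__name__}"
            )
    return func(**kwargs)


checked = checked_call(add, a=1, b=2)
```

Calling `checked_call(add, a="1", b=2)` raises `TypeError`, while the plain `add("1", "2")` call above runs without complaint.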
hop skip jump over to our latest blog post - analysing Fortinet's FortiJump CVE-2024-47575, FortiJump-Higher (we love this name😄) and beyond (PoC included)
https://t.co/35Xg2OoKgP
@rseymour For the first time I was forced to really use Pydantic today. It was terrible.
"You didn't pass the timestamp" - well, that's because it's Optional with a default value of None. Why can't you tell?
Typed Python - it just barely works... sometimes.
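One distinction that often sits behind "you didn't pass X" messages, sketched here with standard-library dataclasses as a stand-in rather than Pydantic itself: `Optional[...]` only says `None` is an allowed value, while the `= None` default is what lets the caller leave the argument out. The class names are hypothetical, and since the post says the field already had a default, this may not be what happened in that case; Pydantic's own behavior can also differ by version.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class EventNoDefault:
    # Optional[...] allows None as a value, but the caller must still pass it.
    timestamp: Optional[datetime]


@dataclass
class EventWithDefault:
    # The "= None" default is what allows the argument to be omitted.
    timestamp: Optional[datetime] = None


try:
    EventNoDefault()
    missing_error = None
except TypeError as exc:
    # The generated __init__ reports the missing required argument.
    missing_error = str(exc)

event = EventWithDefault()
```

Here `EventNoDefault()` fails because `timestamp` is required, while `EventWithDefault()` succeeds and leaves `event.timestamp` as `None`.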
@Dan_Jeffries1 We tried that with the second generative red team: https://t.co/kvDBI36pVw
There are substantial changes for GRT3 based on things we learnt in GRT2.
Generative Red Team 2 was a massive success. We paid $7350 in bounties. We learnt so much about bounties and reporting for ML.
Thank you to everyone who participated!! (specific acks in the thread below)
@dreadnode and @bugcrowd built the platform.
@allen_ai and UL's DSRI brought the model.
@AISafetyInst and @GoogleAI made the workshop happen.
There were a bunch of other people and orgs that helped plan and execute.