Traditional systems fail differently from agentic ones: with agents, every dashboard can be green while the system is already drifting into unsafe or unintended behavior.
Wrote about operational vs behavioral integrity and the growing visibility gap in AI security:
https://t.co/eZEfggP0Sr
I haven’t been as active on the socials lately, because I’ve been working on a community project that’s kept me pretty busy. That said, I think I’m finally far enough along with it that I can share the project in its current state and talk more about it.
So, I present to you: https://t.co/EqYQFPuvrF
I built this site for a few reasons, but one of the main ones is that I didn’t feel like there was a centralized resource for red teamers that included all the things red teamers tend to care about. I also wanted to build something that the community could add to, edit, maintain, etc., while also being self-updating, self-healing, and less likely to go stale over time. So, there are quite a few different cron jobs, GitHub Actions, AI calls, API calls, and other workflows that trigger at set intervals and patterns to try to keep it fresh. For example, I’m leveraging various sources (e.g. conference websites) to identify conferences, and that list then feeds into a YouTube API lookup that picks out conference talks based on certain criteria. I realize there’s still lots of work to do, and I’m fully aware that this is not a 100% fully functioning site at this time. If you have any ideas for improvements, want to report a bug, want to help out as a maintainer, or really anything at all, just let me know. I welcome any and all feedback or help!
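To make the "conference list feeds a talk filter" idea concrete, here is a minimal sketch of the filtering step only. It is not the site's actual code: the `Video` record, the `is_conference_talk` and `select_talks` helpers, and the 10-minute minimum length are all hypothetical, and it makes no YouTube API calls (it assumes the video metadata has already been fetched somewhere upstream).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    """Hypothetical record for video metadata fetched elsewhere."""
    title: str
    channel: str
    duration_seconds: int


def is_conference_talk(video, conference_names, min_seconds=600):
    """Return True if the video mentions a known conference and is long enough.

    The criteria here (name match plus minimum duration) are assumptions
    chosen for illustration, not the site's real rules.
    """
    title = video.title.lower()
    channel = video.channel.lower()
    mentions_conference = any(
        name.lower() in title or name.lower() in channel
        for name in conference_names
    )
    return mentions_conference and video.duration_seconds >= min_seconds


def select_talks(videos, conference_names, min_seconds=600):
    """Keep only the videos that look like conference talks."""
    return [
        v for v in videos
        if is_conference_talk(v, conference_names, min_seconds)
    ]
```

A scheduled job could call `select_talks` on each fresh batch of metadata and only publish what passes, which keeps the filtering rule in one easily reviewed place.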
Also, I know there is a lot of interest in the Scenario Generator module (which I posted about a couple of weeks ago); however, I can't open source it at this time, and it's not currently operational due to the Claude API costs needed to power it. I am still sorting through how to make this available to the community at no charge; however, that may not be possible given what it costs to produce output. More to come on this module! While I sort it out, I am also redesigning it, and you are welcome to check it out in its current state.
@HackingLZ I’m seeing a lot of AI integrations just being tossed in à la Beyblade (just letting it rip) and a lot of core security principles are being thrown out the window. I’m with you here: please just add guardrails and deterministic constraints 🥲
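As a rough illustration of what a "deterministic constraint" on an AI integration could look like, here is a minimal sketch of an allowlist check that runs before any model-requested tool call is executed. Everything in it is an assumption for illustration: the tool names, the `ALLOWED_TOOLS` table, and the `validate_tool_call` helper are hypothetical and not tied to any particular framework.

```python
# Hypothetical allowlist: tool name -> set of argument names it may receive.
ALLOWED_TOOLS = {
    "search_docs": {"query"},
    "read_file": {"path"},
}


def validate_tool_call(name, args, allowed=ALLOWED_TOOLS):
    """Deterministically accept or reject a model-requested tool call.

    Returns (ok, reason). The model's output never decides what is
    permitted; only this fixed table does.
    """
    if name not in allowed:
        return False, f"tool not allowed: {name}"
    unexpected = set(args) - allowed[name]
    if unexpected:
        return False, f"unexpected arguments: {sorted(unexpected)}"
    return True, "ok"
```

The point of the design is that the check is plain code with no model in the loop, so it behaves the same way every time and can be reviewed and tested like any other security control.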