AI for Security has never been more exciting.
Let me present MAPTA, our multi-agent framework that found multiple (now confirmed!) Remote Code Executions (RCEs) in flagship web products of Tier-1 companies. Why the secrecy? We're good boys, letting them cook up patches through responsible disclosure.
What's our secret sauce? 1/n
not sure what's best, there are pros and cons
"self-reported" is difficult to verify and may not be available at scale, but would give a strong industry signal (like cost of mass to orbit 🤣)
"reproduced" is scalable but may not be fully representative (harnesses/data/context/models/hardware may be different)
besides the cost of a positive, the cost of negatives could also be relevant, and possibly the costs of false positives (FP) and false negatives (FN) as well.
the choice of the right target (if CVEmaxxing) is itself a hard problem
This Berkeley team has been absolutely cooking.
What a fantastic way to quantify things that matter in agentic security.
This will help bring the entire space forward.
one thing that would be super dope is
$ / CVSS
something that quantifies the $ amount per vuln, adjusted for criticality or similar
that's v important because the more efficient the agents are, the more often they can scan and fix things. The more frequent scans become, the more secure the internet should become.
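A minimal sketch of how a $ / CVSS number could be computed, assuming "spend" is the total agent compute cost in USD and "criticality" is the CVSS base score (0.0 to 10.0) of each confirmed finding. The `Finding` type, the `dollars_per_cvss` function, and the optional flat triage cost charged per false positive (a nod to the FP cost point earlier in the thread) are all hypothetical illustrations, not an established metric definition.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    cvss: float       # CVSS base score, 0.0 to 10.0
    confirmed: bool   # True if validated as real, False if a false positive


def dollars_per_cvss(findings, compute_cost_usd, fp_triage_cost_usd=0.0):
    """Hypothetical metric: total spend per CVSS point of confirmed findings.

    Spend = agent compute cost + an assumed flat triage cost per false positive.
    Lower is better: cheaper severity-weighted discovery means more frequent scans.
    """
    for f in findings:
        if not 0.0 <= f.cvss <= 10.0:
            raise ValueError(f"CVSS base score out of range: {f.cvss}")

    # Severity-weighted yield: only confirmed findings count.
    severity = sum(f.cvss for f in findings if f.confirmed)
    if severity == 0:
        return float("inf")  # nothing confirmed, so cost per point is unbounded

    # False positives add human triage cost on top of compute.
    false_positives = sum(1 for f in findings if not f.confirmed)
    spend = compute_cost_usd + false_positives * fp_triage_cost_usd
    return spend / severity


findings = [Finding(9.0, True), Finding(7.0, True), Finding(5.0, False)]
print(dollars_per_cvss(findings, 160.0))                           # 10.0
print(dollars_per_cvss(findings, 160.0, fp_triage_cost_usd=20.0))  # 11.25
```

Weighting by CVSS means one critical RCE counts for more than several low-severity findings at the same spend. False negatives are not priced here, since missed vulns are by definition not in the findings list; estimating them would need a ground-truth benchmark.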
in general, token efficiency will also likely become more important/looked at over the next months
We reported a critical loss of funds bug to @Thorchain (32M TVL, 150M FDV)
They silently patched it and told us their bug bounty program is permanently retired.
We have more Thorchain chain halt DoS vulns. We intend to release them (open disclosure) in the next few days
Agents are finding more vulnerabilities than ever. But it turns out there are gaps in existing vulnerability discovery. Comparing the past 90 days with a year ago, web vulnerabilities (XSS/SQLi/CSRF) are down 66% and memory-safety exploitability is down 3.5x.
We built the Agentic Vulnerability Coverage Map to track it all, updated daily. Introducing the Berkeley Vulnerability Initiative: https://t.co/qiZ4eThb0n. ⤵️
What matters is evidence of exceptional ability, and asking how they solved hard problems down to the brass-tacks level.
Those who actually deserve credit know the details of the solution, because it was so hard it got seared into their brain. The phonies and posers who falsely claim credit will flounder at the second or third level of detail.