Updated the Awesome-AI-Security-Skills repo - it now contains over 53 repos for security skills, covering web, mobile, IoT, red team, and web3. It also contains skills for tools like @badsectorlabs Ludus. There are new skill scanners from @RepelloHQ.
So @Doyensec recently published a report comparing @xbow and @AikidoSecurity, two AI pentest platforms.
I figured, why not run @HacktronAI on the same test? So I ran a pentest on one of the targets. Hacktron cost $350, while XBOW and Aikido cost $4,000 each. We did pretty well!
Our internal data shows Claude is accelerating AI development, a possible path to recursive self-improvement, or AI autonomously building a more capable successor.
It's happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx
Each time we release a model, we run the same test: give it code that trains a small AI model, ask the new model to speed it up. It takes a skilled human 4-8 hours to reach 4x faster.
In May 2025, Claude Opus 4 averaged a ~3x speedup. This April, Mythos Preview achieved ~52x.
Agents are finding more vulnerabilities than ever. But it turns out there are gaps in existing vulnerability discovery. Over the past 90 days vs. a year ago, web vulnerabilities (XSS/SQLi/CSRF) are down 66% and memory safety exploitability is down 3.5x.
We built the Agentic Vulnerability Coverage Map to track it all, updated daily. Introducing the Berkeley Vulnerability Initiative: https://t.co/qiZ4eThb0n. ⤵️
Excited to share our most powerful new Claude Code feature: dynamic workflows!
Mention "workflow" in a prompt and Claude will dynamically create an orchestration plan that it strictly follows, allowing you to confidently trust that every stage happens in the right order even across 100s of agents.
Today we're releasing DeepSWE, a new standard for agentic coding benchmarks.
On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
Just released 🚨 #doyensec's latest whitepaper is a head-to-head comparison of the @AikidoSecurity & @XBOW #AI-powered penetration testing platforms.
https://t.co/p3XVIxHKWq
Read how each automates #security testing, their detection capabilities, workflows & usability.
@paoloanzn I think the next GPT model will be on par with or above this version of Mythos for security.
Btw you can get cyber approved to remove refusals if you're worried about getting your Anthropic and OpenAI accounts banned.
@paoloanzn I haven't used Mythos yet, so I'm just relying on benchmark reporting and speaking to peers who have. See these: https://t.co/QR42xgNCUG and https://t.co/llkx133z1g. Yeah, I don't think any lab has a moat right now, but the data from benchmarks is impressive.