Our harness discovered multiple 0-days in the networking stack of the Linux kernel (using publicly accessible LLMs)
This is one of many CVEs to come
Thank you @GuanniQu for great collaboration!!
$30
That's what it cost us to reproduce Anthropic's Mythos findings in FreeBSD, OpenBSD, and FFmpeg using GPT-5.4 by OpenAI and other public models in an open-source harness.
The economics of vulnerability discovery are shifting fast. The moat isn't model access anymore - it's validation.
Finding vuln signal is getting cheap, but turning it into a trusted security workflow is still hard.
Thanks OpenAI for Devs!
The Mythos myth is busted.
We reproduced Anthropic’s public Mythos examples of vulnerabilities in FreeBSD, OpenBSD, and FFmpeg using GPT-5.4 and Claude Opus 4.6.
We reproduced every public example we tested with at least one widely available model.
Message to defenders: attackers won’t wait. The challenge is using these models to detect and patch vulnerabilities before the bad guys do, in real production environments, at scale.
Co-authors: @kannthu1, @AmadeuszL, Marek Lewandowski, Kuba Sienkiewicz, Mikołaj Palkiewicz
We replicated Mythos findings in opencode using public models, not Anthropic's private stack.
The moat is moving from model access to validation: finding vulnerability signal is getting cheaper; turning it into trusted security
A better way to read Anthropic's Mythos release is not "one lab has a magical model."
It is: the economics of vulnerability discovery are changing.
We took the patched public Mythos examples and tried to reproduce them with GPT-5.4 and Claude Opus 4.6 in an open-source harness. Every run stayed below $30 per file.
AI models are already good enough to narrow the search space, surface real leads, and sometimes recover the full root cause in battle-tested code.
The takeaway: model access is not the moat anymore. Validation is. Finding vulnerability signal is getting cheaper; turning it into trusted security work is still hard.
Co-authors: @KlaKlo_, Amadeusz, Marek, Kuba, Mikolaj
AI exposed that we’ve been sitting on critical vulnerabilities for years.
Claude Mythos found a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw.
But it didn’t create state-level hackers.
What changed is the economics:
→ Exploit development is now cheap
→ Scalable
→ And fast enough to matter
So what’s the Mythos hype about? Detecting old 0-days at scale is cool, but using GPT-5.4 and Opus, we autonomously discovered 0-days in the Linux kernel over the past 3 weeks.
Mythos may be better at surfacing potential issues in code, but the “scary” threshold was crossed back in December, if not earlier.
This plays perfectly into Anthropic’s hype cycle, especially with an IPO reportedly planned for the end of the year.
I will say it again: we used GPT-5.4 and Opus, and we were able to autonomously find zero-days in the Linux kernel (in the last 3 weeks).
Mythos is probably better at the task of finding potential issues in code, but imo the threshold for "scary" was reached in December or even earlier
This is a great hype machine for Anthropic, especially since they plan to IPO at the end of the year
I totally agree - this is not a new capability
Don't wait for foreign actors to hack you. Let us hack you first.
If you join us at @daytonaio Compute Conference, we can hack you in less than 30 min.
If we fail, we will buy you dinner.
The Takeaway:
Modern hacks aren't usually buffer overflows.
They are Logic Chains.
We don't just build security tools; we prove why security matters.
Read the full technical write-up here: https://t.co/bjLJANGunD
We found a way to bypass authentication on one of Europe’s fastest-growing AI platforms.
No leaked passwords.
No brute force.
Just a simple "Cookie Jar" configuration error that spiraled into a critical vulnerability.
Here's the technical breakdown of the exploit chain. 🧵
The Fix:
VIDOC’s automated analysis flagged this cross-context vulnerability immediately.
@Lovable’s engineers patched it fast — hardening the cookie scope and enforcing strict frame-ancestors policies before any users were impacted. 🛡️
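The thread does not spell out the exact patch, so the following is only a minimal sketch of what "hardening the cookie scope" and "strict frame-ancestors policies" can look like in general. It assumes the fix amounts to a host-only session cookie plus a Content-Security-Policy `frame-ancestors` directive; the function names, cookie name, and origin are hypothetical and not taken from the write-up.

```python
from http.cookies import SimpleCookie


def hardened_session_cookie(name: str, value: str) -> str:
    """Build a Set-Cookie header value for a tightly scoped session cookie.

    Hypothetical helper for illustration only.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    # No Domain attribute is set, so the cookie is host-only and is not
    # shared with sibling subdomains.
    morsel["path"] = "/"
    morsel["secure"] = True        # only sent over HTTPS
    morsel["httponly"] = True      # not readable from page JavaScript
    morsel["samesite"] = "Strict"  # not attached to cross-site requests
    return morsel.OutputString()


def frame_ancestors_header(allowed_origins=()):
    """Return a (name, value) CSP header restricting who may frame the page.

    With no origins given, framing is forbidden entirely.
    """
    sources = " ".join(allowed_origins) if allowed_origins else "'none'"
    return ("Content-Security-Policy", f"frame-ancestors {sources}")


if __name__ == "__main__":
    print("Set-Cookie:", hardened_session_cookie("session", "abc123"))
    print("%s: %s" % frame_ancestors_header())
    print("%s: %s" % frame_ancestors_header(["https://app.example.com"]))
```

Leaving out `Domain` is the key scoping choice in this sketch: a cookie set with a parent-domain `Domain` attribute is sent to every subdomain, while a host-only cookie stays with the host that issued it.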