Mythos will probably be a great model for cybersecurity, but I strongly doubt it will be best for EVERY task you need to build high-quality bug-finding automation.
We've been building LLM-powered security tooling for over 3 years, and it has *never* been the case that a single model (or even a single provider) is simultaneously best at all cybersecurity tasks.
This is why we have extensive evaluations for each piece of Xint Code's pipeline. When a new model is available, we make data-driven decisions about how it fits in (if it does at all).
Most evaluations are quantitative (e.g. signal-to-noise, number of known bugs found, etc.), but it's important not to overlook qualitative evals as well. For instance, how clear is the bug's root-cause analysis? Are the reproduction steps easy to follow?
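To make the quantitative side concrete, here is a minimal sketch of comparing models on two such metrics. It is not Xint Code's actual evaluation harness: the `EvalRun` fields, the model labels, and all of the numbers are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """One model's results on one pipeline stage (all fields hypothetical)."""
    model: str
    true_positives: int    # reported findings confirmed as real bugs
    false_positives: int   # reported findings that turned out to be noise
    known_bugs_found: int  # benchmark bugs the model rediscovered
    known_bugs_total: int  # size of the known-bug benchmark


def signal_to_noise(run: EvalRun) -> float:
    """Ratio of confirmed findings to noisy findings."""
    if run.false_positives == 0:
        return float("inf")
    return run.true_positives / run.false_positives


def known_bug_recall(run: EvalRun) -> float:
    """Fraction of the known-bug benchmark that was rediscovered."""
    if run.known_bugs_total == 0:
        return 0.0
    return run.known_bugs_found / run.known_bugs_total


def best_model(runs, metric) -> str:
    """Name of the model that scores highest on the given metric."""
    return max(runs, key=metric).model


# Made-up numbers, purely for illustration.
runs = [
    EvalRun("model-a", true_positives=8, false_positives=4,
            known_bugs_found=6, known_bugs_total=10),
    EvalRun("model-b", true_positives=9, false_positives=9,
            known_bugs_found=8, known_bugs_total=10),
]

print(best_model(runs, signal_to_noise))   # model-a
print(best_model(runs, known_bug_recall))  # model-b
```

With these made-up numbers, a different model wins on each metric, which is the kind of split that motivates evaluating every pipeline stage separately.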
The point of scaffolding is to squeeze as much out of these models as possible. Sometimes, models show new capabilities which make us rethink and redesign parts of our pipeline. Often, this means the model passed some threshold which makes an old research idea suddenly become feasible. This generally means *more* scaffolding, not less!
Although I think the post's conclusions are overstated, the core claims (that the AI cybersecurity capability frontier is jagged and scaffolding really matters) are 100% true.
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.
It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://t.co/NQ7IfEtYk7
During AIxCC, I mainly worked on DiscoveryGuy, an LLM-assisted seed generation component. Sometimes, the simplest approaches can have strong results. You can find more details in this blog!
You are probably gonna hate me for the title of this blogpost, but, here is a quick peek into one of the most surprising components of our @DARPA AIxCC CRS: DiscoveryGuy.
https://t.co/QQOFtdql4V
(Planning to publish a few more of these "quick peeks" into the system 👀)
Check out the Post-Mortem of our system ARTIPHISHELL (by @degrigis and me)! We look at a few issues that kept @shellphish from a top-3 spot in @DARPA’s AIxCC:
https://t.co/PO4mL2JPsX
Keep your eyes out for more ARTIPHISHELL content in the future!
While playing @defcon CTF Finals with @shellphish I managed to solve the ICO challenge using LLMs (GPT5 + Cursor) and almost no human intervention. You can read how I did it here! https://t.co/EcqYZdyIfV
DEF CON continues 🔐: @ASU PhD student Wil Gibbs and @UCSB's Lukas Dresel took to the @DARPA AIxCC stage 🎤 to discuss ARTIPHISHELL 🐚🤖—Shellphish’s AI system that finds and patches software bugs 🛠️.
The team earned 5th 🏅 in DARPA’s AI Cyber Challenge 🏆! #AIxCC #DEFCON
Team Shellphish came in 5th place in AIxCC! It took an incredible amount of work and 2 years of dedication from all of my amazing team members.
Please check out our CRS ARTIPHISHELL Open Source now on GitHub!
https://t.co/nEJSIkWn3O
Super amped for our new season of @ctfradiooo, exploring the journey of the finalist teams through the @DARPA/@ARPA_H #AIxCC competition! Great preparatory viewing ahead of the award ceremony at @defcon!
Pick a niche, become an expert, find bugs maybe even 0days or reverse n-days, and write blogs. Even if you don’t hit those $100k bounties, it’ll be a stepping stone toward a $100k job.
What niche? How to pick? Examples?
Infosec is vast: web3 security, web2, mobile, desktop, recon, client-side, server-side, cryptography, and so on. These are umbrella terms, but if we zoom in, there are specific areas where spending a lot of focused time will make you a top-20 expert. I’m 100% sure of it.
The key thing is that the current top-20 experts in any niche will eventually be replaced as they get bored or burn out. This leaves room for you. The easiest way to pick a niche is to learn from an existing expert in it, take inspiration, and grind to build on top of their work.
1. For instance, I got into the client-side JS niche by following @terjanq’s work. From there, I went down even further to focus specifically on ElectronJS.
2. Another example: @rootxharsh and @iamnoooob, whose niche is reversing n-days and finding new bugs based on that knowledge. I don’t think anyone in India can compete with them on reversing n-days, writing blogs, and submitting findings to bounty programs.
3. And off the top of my head, @ajxchapman, from his tweets, seems to have a specific niche in V8 n-day exploits. I don’t think there’s anyone else in the web security scene who can write V8 exploits 😅.
4. Like @orange_8361 , pick a complex target and grind on it for months eventually uncovering mind-blowing findings.
5. Or, like @albinowax, choose a complex specification, such as HTTP, and find bugs in every aspect of it, from top to bottom.
(Sorry for tags xD)
I could list so many more people, but my point is this: if you look at the top bug bounty hunters or experts, there’s a pattern. Their blogs or tweets consistently focus on a specific niche (or two) for years and years. No one ever becomes a pro in a night.
How to Become an Expert in a Specific Niche?
Spend a lot of time. There’s no shortcut. Follow the work of the expert you picked for inspiration, read their blogs, dive into the blogs they learned from, and explore everyone else in that specific niche. Solve CTFs and write about them.
For example (not to make it all about myself), I’ve read every blog from the people I listed as inspirations (https://t.co/5MCSPeoygf) while learning client-side security.
If it’s taking time to understand, you’re likely on the right path. That’s where most people give up, so keep pushing. Just dedicating days to it will put you ahead of at least 100 others. It’s that simple.
Expert = Spent Time × IQ
Find Bugs or 0days, Reverse n-days, and Write Blogs
Once you’re an expert, finding bugs will start to feel natural. But let’s be real, sometimes you might not get lucky. When that happens, reverse other n-days and write about them. I mean, write about anything. Nothing gives you as much exposure as writing blogs: you’re helping others, plus you’re building a network that will eventually help you land a $100k job or $100k bounties.
1/ Can Large Language Models (LLMs) truly reason? Or are they just sophisticated pattern matchers? In our latest preprint, we explore this key question through a large-scale study of both open-source models like Llama, Phi, Gemma, and Mistral and leading closed models, including the recent OpenAI GPT-4o and o1-series.
https://t.co/2tv8Pp9MSz
Work done with @i_mirzadeh, @KeivanAlizadeh2, Hooman Shahrokhi, Samy Bengio, @OncelTuzel.
#LLM #Reasoning #Mathematics #AGI #Research #Apple