Ganni Galea Curmi @GaleaCurmi - Twitter Profile

Pinned Tweet

5 days ago

I tested the code scanning capabilities of 10 different, widely available large language models (without cyber guardrails) on the same real-world codebase and asked: how good are LLMs at finding security vulnerabilities? I used the same methodology I've been using to find hundreds of vulnerabilities in open source software and libraries over the past month. After model self-review, deduplication of findings across models, and an independent model assessment, I found 350 distinct vulnerabilities. The headline: on a single run, no model found more than ~35% of them, and false-positive rates ranged from ~2% to ~30%. Here are our results: 🧵

Ganni Galea Curmi @GaleaCurmi

about 6 hours ago

@cursor_ai Full writeup of results and methodology: https://t.co/RYUfnGdQmz

Ganni Galea Curmi @GaleaCurmi

5 days ago

@cursor_ai

Ganni Galea Curmi @GaleaCurmi

2 days ago

Full writeup of our results and methodology here: https://t.co/RYUfnGdQmz

Ganni Galea Curmi @GaleaCurmi

2 days ago

We've been doing a lot of scanning of open source repositories in the past month or so, since @OpenAI opened up its Trusted Access for Cyber program. We've had impressive results with GPT 5.5 without guardrails as part of that program. This led us to test the effectiveness of other, more widely available models.

The question was: could good (or bad) actors without access to a formal program like those from OpenAI or Anthropic use other models to find similar vulnerabilities in source code?

The short answer is yes. Different models may also find different types of vulnerabilities, so you might want to take a multi-model approach to your AI code scanning efforts.

And the surprising star of the show for us? @cursor_ai's widely available Composer 2.5 model, which offered the best price/performance by a significant margin.

So if you're not in a geography from which you can access Anthropic's or OpenAI's security programs, you do have options (and of course, so do the bad guys, so let's get going!).

Ganni Galea Curmi @GaleaCurmi

2 days ago

@cryps1s @ajambrosino @Georgian_io @OpenAI Full technical report is now out: https://t.co/RYUfnGdQmz

Ganni Galea Curmi @GaleaCurmi

3 days ago

@cryps1s @ajambrosino @Georgian_io Some of our previous work with GPT 5.5 validated that it's the best at criticality-weighted vulnerability discovery. @cryps1s @OpenAI

Ganni Galea Curmi @GaleaCurmi

2 days ago

Full technical report is now out: https://t.co/RYUfnGdQmz

Ganni Galea Curmi @GaleaCurmi

3 days ago

Impressive work by the team at @cursor_ai 👏 @leerob @mntruell Full technical report out soon.

Ganni Galea Curmi @GaleaCurmi

5 days ago

One result I found surprising: @cursor_ai's Composer 2.5 model had a notably low false-positive rate while also being one of the models that found the most vulnerabilities.

GaleaCurmi retweeted

DANΞ

@cryps1s

4 days ago

I cannot overstate how powerful Codex is for cybersecurity work. I'd encourage all defenders to sign up for Trusted Access for Cyber (https://t.co/e1Mh8aZArY) and give it a shot for their workflows. If orgs are slow to get TAC approvals, please reach out to me.

Ganni Galea Curmi @GaleaCurmi

4 days ago

@mynameisyahia @getcontextdev Both >

Ganni Galea Curmi @GaleaCurmi

6 days ago

@mynameisyahia @getcontextdev Good to see you winning 🫡

Ganni Galea Curmi @GaleaCurmi

6 days ago

@mynameisyahia @getcontextdev Yes a couple years back 😂😅
