Evan Luke @EvanThomasLuke - Twitter Profile

Updated the Awesome-AI-Security-Skills repo - it now contains over 53 repos for security skills. It contains security skills for web, mobile, iot, red team and web3. It also contains skills for tools like @badsectorlabs Ludus. There are new skill scanners from @RepelloHQ.

1

0

17

Evan Luke

@EvanThomasLuke

about 5 hours ago

@newton_cheng awesome work!!

0

266

EvanThomasLuke retweeted

s1r1us (mohan)

@S1r1u5_

4 days ago

So @Doyensec recently published a report comparing @xbow and @AikidoSecurity, two AI pentest platforms. I figured, why not run @HacktronAI on the same test? So I ran a pentest on one of the targets. Hacktron cost $350, while XBOW and Aikido cost $4,000 each. We did pretty well! https://t.co/BNiTCUPiLF

7

234

20

112

14K

EvanThomasLuke retweeted

Anthropic

@AnthropicAI

5 days ago

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx

2K

28K

5K

15K

18M

EvanThomasLuke retweeted

Anthropic

@AnthropicAI

5 days ago

Each time we release a model, we run the same test: give it code that trains a small AI model, ask the new model to speed it up. It takes a skilled human 4-8 hours to reach 4x faster. In May 2024, Claude Opus 4 averaged a ~3x speedup. This April, Mythos Preview achieved ~52x.

59

4K

237

592

968K

Evan Luke

@EvanThomasLuke

7 days ago

@deanwball it is both obviously

0

11

EvanThomasLuke retweeted

Corban Villa

@corban_villa

12 days ago

Agents are finding more vulnerabilities than ever. But it turns out there are gaps in existing vulnerability discovery. Over the past 90 days vs. a year ago, web vulnerabilities (XSS/SQLi/CSRF) are down 66% and memory safety exploitability is down 3.5x. We built the Agentic Vulnerability Coverage Map to track it all, updated daily. Introducing the Berkeley Vulnerability Initiative: https://t.co/qiZ4eThb0n. ⤵️

3

65

16

24

14K

EvanThomasLuke retweeted

cat

@_catwu

12 days ago

Excited to share our most powerful new Claude Code feature: dynamic workflows! Mention "workflow" in a prompt and Claude will dynamically create an orchestration plan that it strictly follows, allowing you to confidently trust that every stage happens in the right order even across 100s of agents.

352

8K

823

6K

2M

EvanThomasLuke retweeted

Serena Ge (Datacurve)

@serenaa_ge

14 days ago

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work. https://t.co/HCDcjNuTFK

511

6K

743

3K

2M

EvanThomasLuke retweeted

Doyensec

@Doyensec

13 days ago

Just released🚨#doyensec's latest whitepaper is a head-to-head comparison of the @AikidoSecurity & @XBOW #AI-powered penetration testing platforms. https://t.co/p3XVIxHKWq Read how each automates #security testing, their detection capabilities, workflows & usability. https://t.co/aUJk1idMz4

6

53

14

27

12K

Evan Luke

@EvanThomasLuke

17 days ago

@paoloanzn /goal in Claude Code and Codex works very well for long-running tasks. I wish they had it in the browser.

0

46

Evan Luke

@EvanThomasLuke

17 days ago

@paoloanzn https://t.co/6paVa9vLDI and https://t.co/jYoYLhI63f

1

0

87

Evan Luke

@EvanThomasLuke

17 days ago

@paoloanzn I think the next gpt model will be on par or above this version of mythos for security. Btw you can get cyber approved to remove refusals if you're worried about getting your accounts banned for anthropic and openai.

1

2

0

78

Evan Luke

@EvanThomasLuke

17 days ago

@paoloanzn I haven't used mythos yet so I'm just relying on benchmark reporting and speaking to peers who have. See these https://t.co/QR42xgNCUG and https://t.co/llkx133z1g. Yeah I don't think any lab has a moat right now but the data from benchmarks is impressive. https://t.co/EHjtpA5vDq

1

0

62

EvanThomasLuke retweeted

Lisan al Gaib

@scaling01

17 days ago

I don't understand how people are still coping about Mythos. Here's a few benchmarks:
SWE-bench Pro: Mythos -> 77.8%, GPT-5.5 -> 58.6%
HLE: Mythos -> 56.8%, GPT-5.5 -> 41.4%
UK AISI cyber ranges:
- "The Last Ones": Mythos -> 6/10, GPT-5.5 -> 3/10
- "Cooling Tower": Mythos -> 3/10, GPT-5.5 -> 0/10
ExploitBench:
- Mythos -> 18 Arbitrary Code Executions
- GPT-5.5 -> 0 Arbitrary Code Executions
ExploitGym:
- Mythos -> 157 exploits (289.3 LLM calls)
- GPT-5.5 -> 120 exploits (375.4 LLM calls)
XBOW same story. Mythos has much higher odds of finding vulnerabilities within smaller token budgets.