David Naylor

Verified account

@_David_Naylor

prev: Detect+Respond. secops leader/nerd now: AGI-pilled, exploring how AI changes cyber

Joined October 2023

576 Following

52 Followers

177 Posts

Pinned Tweet

4 months ago

What does it look like when Claude, GPT, and Gemini try to hack each other in real time?

BattleBench is a cybersecurity benchmark I built where AI coding agents battle in identical vulnerable containers. They scan networks, exploit opponents' vulnerabilities, and submit captured flags to a referee. The referee kills the loser's container. Last agent standing wins, with all agents running simultaneously and no human intervention.

What's live now:
→ ELO leaderboard across multiple scenarios
→ Full terminal replays of every agent's session

https://t.co/G9iRb8ilio

BattleBench has seen ~275 games played. That's likely not enough to yield anything truly insightful yet, but a few things already stand out:
- gpt-5.2-codex is a beast.
- Smaller/earlier models are more likely to refuse to play.
- The agents are obviously faster than humans will be.

I'm eager to see how the benchmark plays out over 1000+ games and how the latest GPT Spark models compare with Opus 4.6 fast. Go watch Codex destroy its opponents (except Opus) and let me know if you have any feedback.
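The post doesn't say how the BattleBench leaderboard computes ratings. As a rough illustration of what an ELO-style leaderboard does, here is a minimal sketch assuming the textbook two-player Elo update with a hypothetical K-factor of 32 and starting rating of 1500. It also assumes (hypothetically) that a free-for-all game is scored as the last agent standing beating each eliminated agent pairwise. None of this is BattleBench's confirmed method, and the model names are placeholder labels.

```python
# Minimal sketch of a standard Elo update. Assumption: BattleBench's actual
# rating formula, K-factor, and multi-agent handling are not described in the post.

K_FACTOR = 32          # hypothetical K-factor
DEFAULT_RATING = 1500  # hypothetical starting rating for unseen agents


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_pair(rating_winner: float, rating_loser: float,
                k: float = K_FACTOR) -> tuple[float, float]:
    """Return new (winner, loser) ratings after the winner beats the loser."""
    delta = k * (1.0 - expected_score(rating_winner, rating_loser))
    return rating_winner + delta, rating_loser - delta


def record_game(ratings: dict[str, float], winner: str, losers: list[str]) -> None:
    """Hypothetical free-for-all scoring: the last agent standing is treated as
    beating each eliminated agent pairwise, using pre-game ratings."""
    before = {name: ratings.get(name, DEFAULT_RATING) for name in [winner, *losers]}
    for loser in losers:
        new_w, new_l = update_pair(before[winner], before[loser])
        # Accumulate the winner's gain from each pairwise matchup.
        ratings[winner] = ratings.get(winner, DEFAULT_RATING) + (new_w - before[winner])
        ratings[loser] = new_l


if __name__ == "__main__":
    ratings: dict[str, float] = {}
    record_game(ratings, "model-a", ["model-b", "model-c"])
    print(ratings)  # {'model-a': 1532.0, 'model-b': 1484.0, 'model-c': 1484.0}
```

Scoring a free-for-all as a set of pairwise wins is one common simplification. It keeps the update zero-sum but ignores the elimination order among the losers.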

about 21 hours ago

@AaronBergman18 Anyone relevant you would vouch for from Philly who is on twitter?

2 days ago

@0xBoku @HackingLZ @bcherny @AnthropicAI +1

5 days ago

@taviso @XavierRiveraX Can you please explain this take? Do you think it will be a failure or insufficient?

20 days ago

>January 2024
>OpenAI announces the GPT Store
>see that GPT usage will pay out devs
>same day, publish 9 GPTs in the store
>2 hit 10k+ chats
>never hear about GPT monetization again

_David_Naylor retweeted

22 days ago

ask your ai: "based on what you know about me, what should I set my hourly rate on freelance projects? provide answer then reason"

22 days ago

to claude/chatgpt: what should i set my hourly rate

chatgpt: 250
claude: 250-400

ty for believing in me claude https://t.co/2ZpZnzud3G

24 days ago

This is partly crazy because over the past couple of weeks some orgs have been monitoring for stolen tokens and revoking them on behalf of compromised victims. Now that proactive community defense becomes even more destructive for the victim. Savage.

24 days ago

0x_Osprey: This malware deletes your full system as soon as you revoke the API keys it stole from you https://t.co/nxqbM1cMlH

26 days ago

Most important professional skill has gotta be storytelling. It really doesn't matter what your job is: if you are competent + a storyteller, you will fly high.

26 days ago

@BVeiseh https://t.co/ltCe37VSBH

26 days ago

@BVeiseh I wish agents could make high-quality TRRs at scale. I have a little prototype of it, but would love to see an AI security startup like mindfort do it for the masses. It would earn a ton of goodwill in the community if done right.

about 1 month ago

Happy Cinco de Mayo to those who partake https://t.co/OqMZfYIZfs

about 1 month ago

Appreciate the post, thanks for sharing.

"Organizations are making security purchasing decisions based on a threat model that assumed attackers would not be able to study how their defensive products actually work."

The above was probably always a bad idea, with or without AI lol. The "what defenders should do" section is a great set of action items for lots of orgs, although in my experience PS logs are quite expensive, so I was surprised to see them characterized as cheap here. Overall I think the post does a good job helping defenders focus on the important things. Thanks.

about 1 month ago

Company: Snap
Cut: ~1,000 (16%)
Evidence: CEO letter cited AI reducing repetitive work, increasing velocity, and enabling smaller teams
https://t.co/kuGOnpYjlU

about 2 months ago

Snap, parent company of Snapchat, is making a massive workforce reduction — eliminating 1,000 jobs, representing 16% of its current employees, in a move to accelerate net profitability. In addition, the company is closing 300 open roles. CEO Evan Spiegel believes “rapid advancements in artificial intelligence” will help smaller groups work better. https://t.co/3quj1GNDcO

about 1 month ago

🧵 for tracking layoffs attributed to AI

about 1 month ago

Company: Atlassian
Cut: ~1,600 (10%)
Evidence: CEO letter cited AI changing the role count and skill mix needed
https://t.co/U0nNUZ5ciJ

about 1 month ago

The Databricks free tier is so generous. I'm able to ETL all my agent telemetry lab data and run analytics/queries against it for $0.

about 1 month ago

And of course they are fun to hunt and detect with

about 1 month ago

Coding agent logs are much like PowerShell logs to me. You shouldn't really need them to make factual claims about what happened, but wow, they are extremely helpful for why/how context when performing investigations. And just like PS logs, their verbosity means most can't afford them.
