I spent the last few weeks building LLM benchmarks for a very specific reason:
We want to use AI in RuneAI to help with THOR finding triage, and I needed a better baseline for model selection than generic LLM leaderboards.
Security-event triage is its own thing.
A model can be great at coding, reasoning or vulnerability writeups and still be a bad fit for deciding whether a messy endpoint finding should be suppressed, reviewed or escalated.
In real deployments this will likely happen inside agentic workflows with tools, memory, context handling and feedback loops. But before testing the whole system, I wanted a clean baseline:
How does the model behave when it only gets the enriched finding itself?
Blog post with the reasoning and methodology:
https://t.co/KQPOPDWP1B
Interactive benchmark results:
https://t.co/pvVhTBJsz0
Repo:
https://t.co/Fw3uW9nu2a
Maybe useful for others building SOC / security-event triage benchmarks.
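A minimal sketch of what such a single-finding baseline could look like, assuming the three outcomes named above (suppress, review, escalate). The `classify` callable, the finding fields and the sample data are hypothetical placeholders, not taken from the repo:

```python
from collections import Counter

# Label set assumed from the post: suppress, review or escalate.
LABELS = ("suppress", "review", "escalate")

def score_baseline(findings, classify):
    """Score a classifier that sees only the enriched finding, no tools or memory.

    findings: iterable of (finding_dict, expected_label) pairs.
    classify: callable taking a finding dict and returning a label string.
    Returns (accuracy, confusion), where confusion counts (expected, predicted).
    """
    confusion = Counter()
    correct = 0
    total = 0
    for finding, expected in findings:
        predicted = classify(finding)
        if predicted not in LABELS:
            predicted = "invalid"  # count malformed answers separately
        confusion[(expected, predicted)] += 1
        correct += int(predicted == expected)
        total += 1
    accuracy = correct / total if total else 0.0
    return accuracy, confusion

def keyword_stub(finding):
    """Hypothetical stand-in for a model call, so the sketch runs offline."""
    return "escalate" if finding.get("score", 0) >= 80 else "review"

# Made-up sample findings, for illustration only.
SAMPLE = [
    ({"rule": "example_webshell", "score": 90}, "escalate"),
    ({"rule": "example_admin_tool", "score": 40}, "suppress"),
]

if __name__ == "__main__":
    accuracy, confusion = score_baseline(SAMPLE, keyword_stub)
    print(accuracy, dict(confusion))
```

Swapping `keyword_stub` for a real model call keeps the scoring identical across models, which is the point of a baseline.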
interestingly, kernel anti-cheats share many similarities with EDR.
the conclusion is that the ultimate cheat is a vision-based LLM controlling the inputs, which is similar to web-scraping.
thanks for the writeup and insights @s4dbrd!
https://t.co/wDZ7zPiPT5
Every JWT writeup online covers 2–3 attacks and stops.
I got tired of jumping between 40 blog posts, so I wrote the whole thing. All in one place.
https://t.co/iCSzQ4GjcS
#infosec #appsec #bugbounty #websec #jwt
GitHub - zeroc00I/LLM-anonymization: Reverse proxy for Claude Code that anonymizes sensitive pentest data (IPs, hashes, credentials, hostnames, PII) before it reaches Anthropic. Dual-layer detection: local Ollama LLM + regex safety net, with per-engageme https://t.co/mSZFqyZryy
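A rough sketch of what a regex safety net of this kind might look like, covering only IPv4 addresses and MD5-length hex strings. The patterns, placeholder format and function name are assumptions for illustration and are not taken from the repo:

```python
import re

# Illustrative patterns only; a real safety net would need many more.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MD5 = re.compile(r"\b[a-fA-F0-9]{32}\b")

def anonymize(text):
    """Replace sensitive values with stable placeholders.

    Returns (anonymized_text, mapping), where mapping goes from the original
    value to its placeholder so a response can be de-anonymized locally.
    """
    mapping = {}

    def replacer(prefix):
        def _sub(match):
            value = match.group(0)
            if value not in mapping:
                # Number placeholders by order of first appearance.
                mapping[value] = f"<{prefix}_{len(mapping) + 1}>"
            return mapping[value]
        return _sub

    text = IPV4.sub(replacer("IP"), text)
    text = MD5.sub(replacer("HASH"), text)
    return text, mapping

if __name__ == "__main__":
    print(anonymize("host 10.0.0.5 has hash d41d8cd98f00b204e9800998ecf8427e"))
```

Reusing the same placeholder for a repeated value keeps the text consistent enough for the model to reason about it.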
Good morning! Just published a blog post on exploiting a VMware guest-to-host escape. A UaF with heap feng shui leaks a base address to bypass ASLR, and a stack-based buffer overflow achieves RCE.
https://t.co/tCARJAKrEx
We see our home planet as a whole, lit up in spectacular blues and browns. A green aurora even lights up the atmosphere. That's us, together, watching as our astronauts make their journey to the Moon.
When a former CIA case officer and station chief talks tradecraft, it’s worth paying attention. The basics haven’t changed: it’s still about trust, access, and patience, just adapted to a new world. #CIA #HUMINT #Spycraft
https://t.co/Cuzhkztn4R