We discovered the same vulnerability too. :)
And @winfunction discovered four more RCE primitives in NGINX, soon to be publicly disclosed.
Anywho, we're hiring security researchers with a knack for taming LLMs.
If you're interested in novel vulnerability research and autonomous exploitation with language models, DM me and I'll send you a fun CTF to solve. :)
At @winfunction, we're running an experiment to see how far we can push open models at finding vulns in hardened targets. So far:
- $4.5K in Chrome VRP bounties with a few more pending, from scans that cost less than $100.
- 2 CVEs in NGINX (CVE-2026-28755 & CVE-2026-42926). And watch out for the next release!
- And 60ca500faea0fc70816bb9c53af3815e2af3e6c962b4b4ea63c33c62ebb4240d
We're writing a blog on this soon.
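We are assuming the hash in the last bullet is a SHA-256 commitment to a finding that hasn't been disclosed yet. The post doesn't say so. If that is right, anyone can check the later reveal with a few lines of Python. The function name below is hypothetical.

```python
import hashlib
import hmac

# Hash posted in the thread, assumed to be a hex-encoded SHA-256 digest.
POSTED_DIGEST = "60ca500faea0fc70816bb9c53af3815e2af3e6c962b4b4ea63c33c62ebb4240d"


def matches_commitment(revealed: bytes, posted_hex: str = POSTED_DIGEST) -> bool:
    """Return True if SHA-256(revealed) equals the posted hex digest."""
    digest = hashlib.sha256(revealed).hexdigest()  # lowercase hex
    # Constant-time comparison; not strictly needed for a public check, but harmless.
    return hmac.compare_digest(digest, posted_hex.lower())
```

The revealed text has to match byte for byte, including any salt or trailing newline that went into the hash. A mismatch may therefore just mean the preimage was formatted differently.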
During our YC (@ycombinator S24) batch, we had the awesome opportunity to meet @paulg and talk about what we're building: an autonomous AI hacker.
For a fun demo, I opened my laptop in the Uber on the way to his home and challenged our agents to find vulnerabilities in the old Hacker News codebase.
For those unfamiliar, Arc is a Lisp dialect designed by PG and Robert Morris, and the old HN codebase is written in it.
Back then we only got as far as talking about it with him, but we just redid the experiment with our improved harness for fun!
And we wrote a blog about it: https://t.co/IxVhtqDjSg
Vulnerability benchmarks rot. Cases leak into training data, scores measure memorization.
We built N-Day-Bench. It tests LLMs on finding real vulnerabilities in real repos and is refreshed monthly from live GitHub advisories. Blinded judging. All traces public.
Interestingly, GLM-5.1, the latest model from @Zai_org, performs really well!
Link: https://t.co/K3foq0DfMt
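Blinded judging could be as simple as swapping model names for opaque aliases before the reports reach the judge. This is a minimal sketch built on a hypothetical `Submission` record. N-Day-Bench's actual judging pipeline isn't described in these posts.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Submission:
    model: str     # e.g. "GLM-5.1"; never shown to the judge
    case_id: str
    report: str


def blind(submissions: List[Submission]) -> Tuple[List[dict], Dict[str, str]]:
    """Return (blinded submissions for the judge, alias -> model key kept aside)."""
    key: Dict[str, str] = {}
    blinded: List[dict] = []
    for sub in submissions:
        alias = secrets.token_hex(4)
        while alias in key:  # regenerate on the rare collision
            alias = secrets.token_hex(4)
        key[alias] = sub.model
        blinded.append({"alias": alias, "case_id": sub.case_id, "report": sub.report})
    # Shuffle so the submission order doesn't leak which model ran first.
    secrets.SystemRandom().shuffle(blinded)
    return blinded, key
```

Each submission gets its own alias, so the judge can't link reports from the same model across cases. Hiding names is only the minimum, though, because a model's writing style can still give it away.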
Currently testing GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, GLM-5.1, and Kimi K2.5.
Every run publishes the full audit trail: shell commands, judge rationale, curator answer key, and sandbox history. If a score looks wrong, you can trace it to a specific shell session on a specific line of code.
Results: https://t.co/JGMQZGhajy
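One way to make a score traceable is to keep every artifact in the audit trail in one record per run and query that record by file path. This is a sketch, and all field and method names in it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ShellStep:
    session_id: str
    command: str
    output: str


@dataclass
class AuditRecord:
    case_id: str
    model: str
    score: float
    shell_steps: List[ShellStep] = field(default_factory=list)
    judge_rationale: str = ""
    curator_answer_key: str = ""
    sandbox_history: List[str] = field(default_factory=list)

    def steps_touching(self, path: str) -> List[ShellStep]:
        """Shell steps whose command mentions `path` (plain substring match)."""
        return [s for s in self.shell_steps if path in s.command]

    def sessions_touching(self, path: str) -> List[str]:
        """Distinct session IDs that looked at `path`, in first-seen order."""
        seen: List[str] = []
        for s in self.steps_touching(path):
            if s.session_id not in seen:
                seen.append(s.session_id)
        return seen
```

A substring match is crude, since it misses files reached through globs or `cd`. It is still enough to jump from a disputed score to the session that read the relevant file.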
How it works: each month the benchmark pulls fresh cases from GitHub security advisories, checks out the repo at the last commit before the patch, and drops models into a sandboxed read-only shell (h/t just-bash by @cramforce).
The model never sees the fix. It starts from sink hints and has to trace the bug through actual code.
Only repos with 10k+ stars qualify. A diversity pass prevents any single repo from dominating the set. Ambiguous advisories (merge commits, multi-repo references, unresolvable refs) are dropped.
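The selection rules in this post can be sketched as a single filter. Two parts are assumptions: the hypothetical `Advisory` fields, and a per-repo cap standing in for the diversity pass, since the post doesn't say how diversity is enforced. Merge commits get dropped because they have several parents, so "the last commit before the patch" has no single answer.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MIN_STARS = 10_000  # "Only repos with 10k+ stars qualify"
MAX_PER_REPO = 2    # hypothetical diversity cap; the real rule isn't stated


@dataclass(frozen=True)
class Advisory:
    advisory_id: str
    repos: Tuple[str, ...]        # repositories the advisory references
    stars: int                    # stars of the referenced repo
    fix_commit: Optional[str]     # None if the ref couldn't be resolved
    fix_parents: Tuple[str, ...]  # parent SHAs of the fix commit


def pre_patch_commit(adv: Advisory) -> Optional[str]:
    """The commit to check out: the fix's only parent, or None if ambiguous."""
    if adv.fix_commit is None:
        return None  # unresolvable ref
    if len(adv.fix_parents) != 1:
        return None  # merge commit (or root commit): no single "before"
    return adv.fix_parents[0]


def select_cases(advisories: List[Advisory],
                 max_per_repo: int = MAX_PER_REPO) -> List[Tuple[str, str, str]]:
    """Return (advisory_id, repo, commit_to_check_out) for each kept case."""
    per_repo: Dict[str, int] = defaultdict(int)
    selected = []
    for adv in advisories:
        if len(adv.repos) != 1:  # multi-repo references are ambiguous
            continue
        if adv.stars < MIN_STARS:
            continue
        base = pre_patch_commit(adv)
        if base is None:
            continue
        repo = adv.repos[0]
        if per_repo[repo] >= max_per_repo:  # diversity pass: cap cases per repo
            continue
        per_repo[repo] += 1
        selected.append((adv.advisory_id, repo, base))
    return selected
```

The cap is greedy in input order. A real pass might instead rank the cases first and keep the best per repo.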
Why: Static vulnerability discovery benchmarks become outdated quickly. Cases leak into training data, and scores start measuring memorization. The monthly refresh keeps the test set ahead of contamination, or at least makes the contamination window honest.
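Making the contamination window honest could mean reporting, for each model, which cases were published after its training cutoff. This sketch assumes each case carries an advisory publication date. In real use, the cutoff would come from the model vendor.

```python
from datetime import date
from typing import Dict, List, Tuple


def contamination_split(published: Dict[str, date],
                        cutoff: date) -> Tuple[List[str], List[str]]:
    """Split case IDs into (possibly_seen, after_cutoff) for one model.

    possibly_seen: advisory published on or before the model's training cutoff.
    after_cutoff:  published later, so the advisory text itself shouldn't be
                   in the model's training data.
    """
    possibly_seen = sorted(cid for cid, d in published.items() if d <= cutoff)
    after_cutoff = sorted(cid for cid, d in published.items() if d > cutoff)
    return possibly_seen, after_cutoff
```

A publication date after the cutoff is not a guarantee, because the fix commit or public discussion of the bug may predate the advisory.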
New CVE in NGINX - CVE-2026-28755
The NGINX stream module allows a TLS handshake to succeed with a revoked client certificate when "ssl_ocsp on" is configured.
This vulnerability was autonomously discovered by Winfunc's AI agent.
Read the write-up here: https://t.co/qiS50Lqgj9
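Here is a toy model of the check that was bypassed, written as a property a regression test could assert. This is not NGINX's code. Only the revoked case comes from the advisory text; failing closed on UNKNOWN or missing responses is our assumption.

```python
from enum import Enum
from typing import Optional


class OcspStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"


def client_cert_accepted(chain_ok: bool, ocsp_on: bool,
                         status: Optional[OcspStatus]) -> bool:
    """Simplified fail-closed policy (an assumption, not NGINX's logic)."""
    if not chain_ok:
        return False
    if not ocsp_on:
        return True  # revocation isn't checked at all
    # With OCSP on, only an explicit GOOD response lets the handshake proceed;
    # REVOKED, UNKNOWN, or no response (None) all fail it.
    return status is OcspStatus.GOOD


def revocation_invariant_holds(accepted: bool, ocsp_on: bool,
                               status: Optional[OcspStatus]) -> bool:
    """The property the CVE broke: with OCSP on, a revoked cert is never accepted."""
    return not (ocsp_on and status is OcspStatus.REVOKED and accepted)
```

A test harness could drive the real server with the same inputs and flag any accepted handshake for which `revocation_invariant_holds` returns False.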
The Recent CVEs in React and Node.js Were Found by an AI - https://t.co/8JgMMqFICc
In December 2025 and January 2026, an AI system autonomously discovered zero-day vulnerabilities in Node.js and React, among the most widely deployed JavaScript runtimes and frameworks in the world.
This post documents how these vulnerabilities were found, the technical details of the flaws, and what this means for the future of security research.
New blog post: The Recent 0-Days in Node.js and React Were Found by an AI
It covers discovering 0-days with AI, what that implies, and "AI slop". Have a read.
https://t.co/jAL6rGGTDx
A new vulnerability in React Server Components (CVE-2026-23864) was disclosed today.
I discovered one of the DoS vectors with the help of @winfunction's AI agent.
Other vectors were discovered by @ryotkak et al.
All users should upgrade to a patched version as soon as possible.
https://t.co/mFdceNi63H
CVE-2026-21636 in Node.js (@nodejs)
Node.js permission model bypass via unchecked Unix domain socket (UDS) connections
This vulnerability was autonomously discovered by https://t.co/Ym7gcZXFen, an AI agent that can find, exploit, and patch security vulnerabilities in codebases.
Thanks to @_rafaelgss for triaging and fixing the issue.
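Here is the bug class in miniature: a permission gate that only inspects host/port targets lets a Unix domain socket path through untouched. This is a hypothetical Python model, not Node.js's implementation. `ConnectRequest`, the check functions, and `net_granted` are made up for illustration, with `net_granted` standing in for whatever network grant the runtime uses.

```python
from dataclasses import dataclass
from typing import Optional


class PermissionDenied(Exception):
    """Raised when a connection is attempted without the needed grant."""


@dataclass(frozen=True)
class ConnectRequest:
    host: Optional[str] = None         # TCP target, e.g. "example.com"
    port: Optional[int] = None
    socket_path: Optional[str] = None  # Unix domain socket target, e.g. "/run/app.sock"


def check_connect_buggy(req: ConnectRequest, net_granted: bool) -> None:
    # Bug class: only host/port targets are gated, so a request that sets
    # socket_path never reaches the permission check.
    if req.host is not None and not net_granted:
        raise PermissionDenied(f"connect to {req.host}:{req.port}")


def check_connect_fixed(req: ConnectRequest, net_granted: bool) -> None:
    # Gate every outbound connection, whatever address family it uses.
    if not net_granted:
        target = req.socket_path or f"{req.host}:{req.port}"
        raise PermissionDenied(f"connect to {target}")
```

The design point is where the check sits. Putting it on the shared connect path, rather than in the TCP-specific branch, means a new address family can't silently skip it.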