one of them has been disclosed with a CVE. more to come.
besides, what's google cooking? mythos? their own gemini variant? interesting times.
https://t.co/zBjQDGig2O
We're doing an experiment with open models @winfunction to see how far we can push them to find vulns in hardened targets. So far:
- $4.5K in bounties from Chrome VRP with a few more pending, with the scans costing less than $100.
- 2 CVEs in NGINX (CVE-2026-28755 & CVE-2026-42926). And watch out for the next release!
- And 60ca500faea0fc70816bb9c53af3815e2af3e6c962b4b4ea63c33c62ebb4240d
We're writing a blog on this soon.
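The 64-character hex string in the list above has the shape of a SHA-256 digest, which is a common way to commit to an undisclosed finding and prove later that you had it first. That reading is an assumption, as the post does not say what was hashed or with which algorithm. A minimal sketch of how a reader could check a later reveal against it (`sha256_hex` and `matches_commitment` are hypothetical helper names):

```python
import hashlib
import hmac

# Digest copied from the post. Treating it as SHA-256 is an assumption.
PUBLISHED_DIGEST = "60ca500faea0fc70816bb9c53af3815e2af3e6c962b4b4ea63c33c62ebb4240d"


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_commitment(revealed: bytes, digest_hex: str = PUBLISHED_DIGEST) -> bool:
    """Check whether the revealed bytes hash to the committed digest."""
    # compare_digest is a constant-time comparison; plain == would also work here.
    return hmac.compare_digest(sha256_hex(revealed), digest_hex.lower())
```

Once the authors publish whatever they hashed, `matches_commitment(revealed_bytes)` would confirm it, provided the exact bytes (encoding, trailing newline) match what was originally hashed.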
@jmedeiros1337 @winfunction yes! we're currently running an experiment where we use "tiny" open models to find 0days in hardened software. will publish the blog soon! :)
https://t.co/tCDmgF2tdk
We discovered the same vulnerability too. :)
And @winfunction discovered 4 more RCE primitives in NGINX, soon to be publicly disclosed.
Anywho, we're hiring security researchers with a knack for taming LLMs.
If you're interested in novel vulnerability research and autonomous exploitation with language models, DM me and I'll send you a fun CTF to solve. :)
Introducing nginx-poolslip, a fresh RCE for the latest nginx release, 1.31.0.
nginx-rift has been patched, but our security agent Vega has found a new 0-day.
We will release the full technical writeup with ASLR bypass 30 days after the patch on https://t.co/LAhOC5UHrp.
The software that runs in the veins of modern society is fragile. Every proper engineer knows that; C just makes it worse.
This affects 0.6.27 ... 1.30.0, so pretty much everything until yesterday.
I know some of you are still using affected versions so update to 1.31.0.
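For anyone checking a fleet, here is a rough sketch of testing a version string against the range quoted above. It assumes both ends (0.6.27 and 1.30.0) are inclusive and ignores distro backports, which can patch a package without changing its version number. `extract_version` and `is_affected` are hypothetical helper names.

```python
import re

FIRST_AFFECTED = (0, 6, 27)  # lower bound from the post, assumed inclusive
LAST_AFFECTED = (1, 30, 0)   # upper bound from the post, assumed inclusive


def extract_version(text: str) -> tuple[int, ...]:
    """Pull the first dotted version number out of a string such as
    'nginx version: nginx/1.24.0' and return it as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected(text: str) -> bool:
    """True if the version in text falls inside the quoted affected range."""
    return FIRST_AFFECTED <= extract_version(text) <= LAST_AFFECTED
```

`nginx -v` prints a line like `nginx version: nginx/1.24.0`, which can be passed to `is_affected` as-is.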
Love the Claudia reference as the first thing here.
We loved working on Claudia but couldn't balance working on security research projects and Claudia at once.
Fun fact, we invented "SKILLS" before it was even a thing. There was a feature in Claudia called "AGENTS" where users could share and install system prompts for specific tasks via their GitHub repos, just like the skill marketplace concept in Claude now.
See here: https://t.co/86a0PTryFZ
And Anthropic did talk to us after the launch of Claudia. Unfortunately I can't reveal more about it, but damn, it was a tough decision.
a16z @speedrun request for startups: GUIs for Agents
we're still in the MS-DOS era of agents today - CLI, terminal sessions, file directories deleted by openclaw etc. while a small slice of silicon valley are power users, we're SO early for the rest of the world
at Speedrun, we're looking for bold founders excited to bring the power of agents to normies everywhere. there's a whole slew of products to be built here - from agent builders to marketplaces to managed infrastructure
one broad idea we're excited about is visual abstraction layers for agents. if you don't know exactly what you want, a command line / chat interface is paralyzing - you need to see options
1 example - think of a GUI or visual command center inspired by strategy games (ex. Factorio) where agents and workflows are represented graphically. skills, tools, MCP connections, background processes, etc could all be configured and shown visually in a workspace
on UX, strategy games have long perfected agent management. zoom to get a bird's-eye view of your agents, batch and queue orders via shortcuts, assign agents in multiplayer etc. a well-designed agent command center would make multi-agent orchestration for normies feel easy & intuitive
most folks today still haven't moved beyond ChatGPT. the potential is enormous - just as Windows unlocked mass-market use of personal computers, the right visual abstraction layer could unlock agentic work for everyone - from individuals to enterprise teams
if you share our vision, we'd love to chat!
During our YC (@ycombinator S24) batch, we had the awesome opportunity to meet @paulg and talk about what we're building: An autonomous AI hacker.
To showcase a fun demo, I remember opening my laptop in the Uber to his home and challenging our agents to find vulnerabilities in the old HackerNews codebase written in Arc.
For those unfamiliar, Arc is a programming language designed by PG and Robert Morris. And the old HN codebase is written in Arc.
We only got to talk about it with him but we just redid the experiment with our improved harness for fun!
And we wrote a blog about it: https://t.co/IxVhtqDjSg
Everyone is talking about Mythos, but GPT-5.4 is actually shaping up to be a more capable model than people realize. This N-Day bench has GPT-5.4 at the top, followed by GLM-5.1 and interestingly beating Opus 4.6 so far. Crazy to think Spud is a bigger leap than this.
Vulnerability benchmarks rot. Cases leak into training data, scores measure memorization.
We built N-Day-Bench: tests LLMs on finding real vulnerabilities in real repos, refreshed monthly from live GitHub advisories. Blinded judging. All traces public.
Very interestingly, the latest model from @Zai_org, GLM 5.1 performs really well!
Link: https://t.co/K3foq0DfMt
seconding this. have seen "fix" commits for vulns with zero impact but in claude's words, a critical in all caps. with the same model being the triager now, it's out there with a CVE tag. some findings are at best "hardening", it ain't bad per se, best practice or whatever. a solid exploit poc should be conveying impact, these cvss scores don't cut it.