This is a watershed moment.
GLM-5.2 solidly beat Opus 4.8 and human participants in our backend take-home, making the whole thing obsolete.
It also pushed forward the state-of-the-art for multi-stage media-to-transcript, with a new release: offmute-v2.
I come with receipts.
TeamPCP just did an interview where they were asked what defenders should do to stop supply chain attacks.
Their advice: pin versions to a specific hash, use least-privilege tokens, restrict IDE extensions. And then, verbatim: "The company Socket will detect the malware before the package even reaches your machine."
So... thanks, I think?
We're not putting this on the testimonials page.
But at the same time, if you're not yet using @SocketSecurity to protect your supply chain, what are you waiting for?
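For anyone wondering what "pin versions to a specific hash" looks like in practice, here is a minimal sketch. It assumes you already recorded a trusted SHA-256 digest for the artifact out of band. The helper names `sha256_hex` and `matches_pin` are made up for illustration and are not part of any package manager or of Socket's tooling.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_pin(data: bytes, pinned_sha256: str) -> bool:
    """Check downloaded bytes against a digest pinned ahead of time.

    `pinned_sha256` is assumed to be a hex string recorded when the
    dependency was last reviewed (hypothetical workflow).
    """
    expected = pinned_sha256.strip().lower()
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_hex(data), expected)


if __name__ == "__main__":
    artifact = b"example package contents"  # stand-in for a downloaded file
    pin = sha256_hex(artifact)               # in practice, recorded earlier
    print(matches_pin(artifact, pin))             # unchanged artifact
    print(matches_pin(artifact + b"!", pin))      # tampered artifact
```

The point of pinning to a hash rather than a version number is that a republished or tampered artifact with the same version string no longer passes the check.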
We are investigating unauthorized access to GitHub’s internal repositories. While we currently have no evidence of impact to customer information stored outside of GitHub’s internal repositories (such as our customers’ enterprises, organizations, and repositories), we are closely monitoring our infrastructure for follow-on activity.
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
@Dinosn It’s a big problem.
People are creating bug reports with AI, then passing the work of deciding whether their report means anything onto the humans on the list.
The humans are being DDoSed.
@cryps1s Trying to pop you guys on BBP has been a nightmare haha, you fix my vulns within an hour of me finding them. Next level. I need to write something up about it, I’m not seeing this anywhere else.
We were one of four initial grant recipients in @OpenAI's Trusted Access for Cyber program.
Daybreak matters because frontier models now find bugs faster than maintainers can triage them, and that gap is about to get worse.
Next-gen models can bury open-source maintainers in reports. While working with frontier labs this year, we have seen the bottleneck shift. Finding bugs is easy, but triaging, disclosing, and fixing them take disproportionate time and effort. Each finding still needs a human to confirm the bug, a static or dynamic check to reproduce it, a working proof-of-concept, and a minimal patch. That work is heavy, and right now it falls on the maintainer.
On the OSS engagements we ran this year, we prioritized minimizing maintainer workload and keeping noise out of their inboxes. Every report we sent included a PoC, a fix patch, and a regression test. Anything that did not clear that bar did not get sent.
Commonly used software has never been short of bugs. Cyber-tier models will surface them at machine speed with little human effort, and the volume will overwhelm OSS projects without clear processes for disclosure, triage, and remediation. If you maintain an OSS project, do four things:
1. Publish a SECURITY.md. If you already have one, verify the reporting flow still works end to end.
2. Set a high bar for submissions. Require a PoC, a fix patch, and a regression test wherever possible.
3. Build validation harnesses that quickly answer three questions: is the bug real, does the fix work, and does anything else break?
4. Sandbox those harnesses. Malicious reports are a credible threat once the cost of generating them drops to near zero.
Bug finding is getting faster. Triage, verification, disclosure, and patching have to catch up.
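The three questions from item 3 can be sketched as a tiny harness. This is an illustration under stated assumptions, not a description of any real tool: `Verdict`, `validate`, and the three callables are hypothetical names, and each callable stands in for whatever actually runs the PoC or the test suite in your project. The sketch does not sandbox anything; per item 4, the callables should execute untrusted PoCs inside an isolated environment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    bug_is_real: bool          # PoC triggers on the unpatched build
    fix_works: bool            # PoC no longer triggers on the patched build
    nothing_else_breaks: bool  # regression suite passes on the patched build

    @property
    def accepted(self) -> bool:
        return self.bug_is_real and self.fix_works and self.nothing_else_breaks


def validate(
    poc_triggers_on_unpatched: Callable[[], bool],
    poc_triggers_on_patched: Callable[[], bool],
    regression_suite_passes: Callable[[], bool],
) -> Verdict:
    """Answer the three triage questions in order, stopping at the first failure."""
    if not poc_triggers_on_unpatched():
        # The bug cannot be reproduced, so the later checks are skipped.
        return Verdict(False, False, False)
    if poc_triggers_on_patched():
        # The proposed fix does not stop the PoC.
        return Verdict(True, False, False)
    return Verdict(True, True, regression_suite_passes())


if __name__ == "__main__":
    # Stub checks standing in for real PoC and test runs.
    good = validate(lambda: True, lambda: False, lambda: True)
    noise = validate(lambda: False, lambda: False, lambda: True)
    print(good.accepted, noise.accepted)
```

Stopping at the first failed check keeps the cheap question (is the bug real?) in front of the expensive one (does the full suite still pass?), which matters when report volume is high.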
This paper confirms what we mostly knew anyway: phishing tests don't improve much, if anything.
Best to focus on technical controls that mitigate the risks more directly.
https://t.co/fSK9ib5nLR
This is crazy. The hacker installed a dead-man's switch that will wipe your computer if you revoke the GitHub token they stole from you. Revoking the token is what triggers the wipe.
OpenAI is launching Daybreak, our effort to accelerate cyber defense and continuously secure software.
AI is already good and about to get super good at cybersecurity; we'd like to start working with as many companies as possible now to help them continuously secure themselves.
Our security bug bounty program is now public on HackerOne.
We've run the program privately within the security research community, and their findings have strengthened our products. Now anyone can report vulnerabilities and get rewarded.
Read more: https://t.co/li1QvSTCMs
I was just telling @xssdoctor this: We are living in a cyberpunk future.
- Our cars drive us around (we both have FSD Teslas).
- We use AI agents to find bugs.
- We sometimes get paid in virtual currency.
- We use that to buy more AI agents to find us more bugs.
ENJOY IT MORE