“On this benchmark, GPT-5.5 delivers the best performance we’ve seen to date.
For context, GPT-5 missed 40% of vulnerabilities. Opus 4.6 reduced that to 18%. GPT-5.5 brings it down further to just 10%.”
@thewunderalbert analysis of OpenAI new model
https://t.co/m2ZqbuBlqM
"Hide the tools" never worked in security. AI won't be the exception.
Anthropic locked Mythos inside 40 companies. OpenAI just released GPT-5.5, same capability, open to everyone. KYC + guardrails instead of invitations.
We have early access to the model, Check out blog post: https://t.co/FJCI1AnSJR
@aakashgupta@aakashgupta this doe’s you an injustice. Lower the output and increase the quality. This AI written slop full of inaccuracies is not doing you any favors.
@juliendsv Writing code was never the bottleneck. Knowing what to write was. If AI handles the typing, the skill that matters is being precise about requirements. Which is basically what good PMs have been doing all along.
@PrajwalTomar_ The bug-catching is underrated. Half my Claude sessions now are 'review what you just wrote and tell me what breaks at scale.' Having a model that does that unprompted changes the feedback loop completely.
@OluwapelumiDad5 The default PM instinct is to build when you should explain. Half the feature requests I've dug into turned out to be documentation problems or onboarding gaps. Cheaper fix, faster win, but less satisfying to ship.
@lasthurdleweb The hardcoded API keys are wild but the missing analytics is worse. You can rotate a key. You can't recover six months of flying blind because you never instrumented anything.
@richiekastl Exactly. The frontend is the easy part to see, so it feels like progress. But nobody churns because your buttons are ugly. They churn because you built the wrong thing, or the right thing in the wrong order, or you never figured out what success looks like for them.
What gets me about this week's AI agent security panic: I run one of these 'nightmare' setups. But my agent doesn't touch my emails, my files, my network. It lives on its own hardened VPS. Own identity, own email, own rules.
I treat it like a new hire. That means UA training, security tooling to reduce attack surface, vuln scanning, DLP. When (not if) something goes wrong, we have blast radius containment, approval gates, limited scope.
Most people just... connect it like Slack, hand it the keys, then act surprised when something breaks.
What bugs me about the 10x conversation: everyone's focused on builders but the same multiplier works for attackers. Social engineering used to need a skilled operator, hours of research, one target. Now it's an API call and thousands of personalised attempts for basically nothing. Building got cheaper. Attacking got cheaper at the same rate. Defence budgets still assume the old math.
Every PM tool launched this year automates the same thing: typing.
Requirements? Auto-generated. Tickets? Written by AI. Roadmaps? One click.
I've shipped four products using Claude as my entire engineering team. The typing was never the bottleneck.
The bottleneck is the customer call that rewrites your roadmap. The usage data that kills the feature everyone wants. The meeting where you talk leadership out of building what the board asked for.
The tools keep getting faster at the easy part. Nobody's automating judgment.
Agree the title will evolve but "product builder" papers over the hard part. I ship with AI daily. Building got 10x faster. Figuring out what deserves to be built didn't. Someone still needs to sit with customers, read the data, and kill the other 9 ideas. That job doesn't collapse into a coding sprint.
Most people give their AI agent access by letting it be them. Your email, your Slack, your name. Convenient until someone sends it a well-crafted email and your agent does exactly what it's told.
I gave mine its own identity. Own email, own name, reports to me. If someone compromises it, they've compromised an employee, not me. The blast radius stays contained.
That separation forced me to think about the inbox as an attack surface. The moment I gave it one, I realised anyone in the world could send it a prompt injection disguised as a normal message.
Wrote up what I found https://t.co/unNzyjcwHd
@oliviscusAI I use Claude Code as a PM daily. PRD to tickets to code was already the fast part of the job. Figuring out what belongs in the PRD is where the actual time goes.
The threat is real and the multi-language detection is a useful layer. My concern is the framing. "30 seconds to fix it" implies the problem is solved, when pattern matching (however sophisticated) can only catch known attack shapes. Novel injections will bypass any pattern library.
The defenses that prevent actual damage are architectural: approval flows before outbound actions, treating external input as untrusted by default, restricting what the agent can do without human confirmation. I built email into my own agent this week and spent more time on those controls than on the IMAP integration.
Worth installing as one layer. Risky to treat as THE layer.
@a16z I'm a PM who ships with Claude Code every day, so I'm living in this 'standoff' (not sure I agree with that analogy"). AI makes everyone faster at building. It doesn't make anyone better at figuring out what to build. I still spend most of my time on what and why, not how.
@emanueledpt Ship something small every week. Not because it fixes imposter syndrome, but because after enough shipped projects, the voice in your head has less ammunition. You stop asking "am I good enough?" when the answer is sitting live in production.