Been playing around with Codex for the past week. Having explored Cursor, Codex, and Claude Code, I have fairly mixed feelings. I thoroughly enjoy them, but I'm honestly terrified from a security point of view.
They will create vulnerabilities and bugs that simply shouldn't exist in modern software. The kind of bugs and vulnerabilities where we hear "Cisco had hardcoded credentials in a recent version of IOS" and think "well shit, how the fuck is that still possible in the 2020s?"
There's no question about that; they're doing it right now. The onus is on the user to understand the code, to have the background needed to step in and prompt it to review things within a specific security context, and ultimately to assume the risk.
We already have a plague of vulnerabilities in human-written code, where people know better, and this is only going to add to it. It's incredibly important that we build workflows that help developers using LLMs rapidly identify, model, and mitigate these risks.
We can't just say "Security is everyone's responsibility" or "Well, if they don't know what they're doing, they shouldn't touch it".
We should be enabling. Enabling them to do it securely.
If you're in security, I implore you to explore these tools and build out a project. Start from how a typical user with minimal security background would interact with them: don't prompt them to do things securely.
Review the code you get back, then put your security hat on. Prompt it for threat models, specific CWEs, mitigation plans, SBOMs, and more.
Then figure out how to operationalize that in your org. We can't put the genie back in the bottle, but we can build security workflows that mitigate much of the risk and lean on existing processes to verify the work.
What they do well:
* Rapid prototyping: it's incredibly simple to get a working, usable prototype. This is what they excel at.
* When used in context by someone with a security background, they are fairly good at threat modeling, and useful for things like SBOM and inventorying attack surfaces.
* Bridging gaps: they let people who have experience writing their own code, but no longer have much time for it, still explore their projects and ideas.
What they still suck at:
* Implementing secure coding practices from the start, without explicit intervention.
* Adhering to modern standards on just about anything. Tell it to do anything cryptographic, and without explicit intervention, it will always choose the least secure method.
* Long-running sessions and context: the longer you go without starting a new session with a clean context, the worse the output gets, and you'll start running into circular patterns where the model keeps regenerating the same crashing code over and over.
What they really, really suck at:
* With Codex (ChatGPT Plus), I had to switch to API-based usage within my first two days, after exceeding the limits in two 4-hour sessions. This is a tough one. It costs money to run these, but at the same time, if I just want Codex access, the API is expensive: on the first day I spent $15.68 on API usage alone. That can be cost-prohibitive and adds up incredibly quickly, and this is with what I'd consider "moderate" usage, exploring the tool rather than working in a real workflow producing production-grade software.
* With Cursor, the variety of models is fantastic. However, you'll get completely different results and feel just by switching models. Composer feels very different from Sonnet, which feels very different from Grok Code.
Specific examples of bad practices:
* When I asked for an "irreversible" secret storage method (explicitly stating it was for passwords and API keys), one tool proposed MD5 and the other proposed SHA-256.
* When asked to create backend APIs, Codex defaults to allowing all CORS origins. Obviously not the most pressing issue (trust me... there are far more serious things it does wrong), but it's a simple thing that would almost assuredly go unchanged by an inexperienced developer pushing something to prod.
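* When asked to implement MFA, it chose pyotp, which defaults to SHA-1 (the RFC 6238 default). SHA-1 already has practical collision attacks, and while those don't translate into a real attack on HMAC-SHA1 as used in TOTP today (notably, nothing real exists on this... yet), it's a dated default that near-future compute may erode. Replay is a separate risk, tied to accepting the same code twice rather than to the hash.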
* When designing backend API calls for audit logging, it let anyone with a valid session token call the API, which would let someone forge audit logs or flood them until they're useless. It also failed to add any signature validation for audit log entries. Both required explicit intervention to fix.
* When designing an RBAC mechanism, it almost always drifts toward letting less-privileged roles access things they shouldn't, such as reading API secrets. Explicit intervention is required to keep permissions from sprawling.
* Using localStorage for session tokens. Honestly, that's no worse than many human-written apps, but it leaves the token open to theft and reuse by any XSS exploit or browser extension.
* Pinning dependencies. By default it pins them, usually to older, more vulnerable versions. Interestingly, it chose three-year-old versions in some instances, but allowed the latest minor release in others.
* When building the initial Docker image, it didn't create a less-privileged user, so the app ran as root inside the container.
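For the password-storage example, neither MD5 nor a single SHA-256 pass is right for passwords: both are fast, so leaked hashes can be brute-forced cheaply. A minimal sketch of the usual fix, assuming only Python's standard library: a deliberately slow, salted key-derivation function (`hashlib.scrypt`) plus a constant-time comparison. The cost parameters and the `salt$hash` storage format are illustrative assumptions, not a tuned recommendation.

```python
import hashlib
import hmac
import secrets

# Illustrative scrypt cost parameters (assumption: tune for your hardware).
SCRYPT_N = 2**14
SCRYPT_R = 8
SCRYPT_P = 1
SALT_BYTES = 16


def hash_secret(secret: str) -> str:
    """Return 'salt_hex$hash_hex' using a fresh random salt per secret."""
    salt = secrets.token_bytes(SALT_BYTES)
    digest = hashlib.scrypt(secret.encode(), salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P)
    return f"{salt.hex()}${digest.hex()}"


def verify_secret(secret: str, stored: str) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.scrypt(
        secret.encode(), salt=bytes.fromhex(salt_hex), n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

This only fits secrets you verify and never need back. An API key your own service must later present to a third party can't be hashed; it has to be encrypted and access-controlled instead.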
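For the CORS example, the fix is an explicit allowlist instead of `*`. Frameworks expose this directly (Starlette/FastAPI's `CORSMiddleware`, for instance, takes an `allow_origins` list), but the logic is small enough to sketch framework-free. The allowlisted origin below is a hypothetical placeholder.

```python
from typing import Dict, Optional

# Hypothetical allowlist: replace with the origins your frontend is actually served from.
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def cors_headers(request_origin: Optional[str]) -> Dict[str, str]:
    """Return CORS response headers, echoing the Origin only if it is allowlisted."""
    if request_origin is None or request_origin not in ALLOWED_ORIGINS:
        # No Access-Control-Allow-Origin header, so the browser blocks the cross-origin read.
        return {}
    return {
        "Access-Control-Allow-Origin": request_origin,
        # Tell caches that the response varies per requesting Origin.
        "Vary": "Origin",
    }
```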
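For the MFA example, a stdlib-only sketch of RFC 6238 TOTP using HMAC-SHA-256 (which the RFC permits), with a verifier that refuses to accept the same time step twice, which is what actually stops replay. pyotp itself accepts a `digest` argument (e.g. `digest=hashlib.sha256`), but authenticator apps vary in their support for non-SHA-1 algorithms, so check your clients first. The in-memory `last_used_step` is an assumption; a real service would persist it per user.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, for_time: int, step: int = 30, digits: int = 6,
         digest=hashlib.sha256) -> str:
    """RFC 6238 TOTP: HOTP (RFC 4226) computed over the time-step counter."""
    counter = for_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), digest).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


class TotpVerifier:
    """Accepts each time step at most once, so a captured code cannot be replayed."""

    def __init__(self, secret: bytes, step: int = 30, digits: int = 6, window: int = 1):
        self.secret = secret
        self.step = step
        self.digits = digits
        self.window = window  # tolerated clock drift, in steps
        self.last_used_step = -1  # assumption: persisted per user in a real system

    def verify(self, code: str, now: Optional[int] = None) -> bool:
        now = int(time.time()) if now is None else now
        current = now // self.step
        for s in range(current - self.window, current + self.window + 1):
            # Skip negative steps and any step at or before the last accepted one.
            if s < 0 or s <= self.last_used_step:
                continue
            expected = totp(self.secret, s * self.step, self.step, self.digits)
            if hmac.compare_digest(expected, code):
                self.last_used_step = s
                return True
        return False
```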
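For the audit-logging example, both missing pieces can be sketched together: a role check so an ordinary session can't write entries, and an HMAC over each entry chained to the previous entry's MAC, so editing or deleting an earlier entry breaks verification. The key value and the `audit-writer` role name are hypothetical placeholders; the key would live server-side only.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

# Assumption: loaded from a secrets manager and never exposed to clients.
AUDIT_KEY = b"replace-with-server-side-secret"
# Hypothetical role: only backend services may append, not any logged-in user.
WRITER_ROLES = frozenset({"audit-writer"})


class AuditLog:
    def __init__(self, key: bytes):
        self._key = key
        self.entries: List[Dict[str, Any]] = []

    def _mac(self, prev_mac: str, event: Dict[str, Any]) -> str:
        # Canonical JSON (sorted keys) so the same event always yields the same bytes.
        payload = prev_mac.encode() + json.dumps(event, sort_keys=True).encode()
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, caller_role: str, event: Dict[str, Any]) -> None:
        if caller_role not in WRITER_ROLES:
            raise PermissionError("caller may not write audit events")
        prev = self.entries[-1]["mac"] if self.entries else ""
        stored = dict(event)  # shallow copy so later caller mutations don't leak in
        self.entries.append({"event": stored, "mac": self._mac(prev, stored)})

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed middle entry fails."""
        prev = ""
        for entry in self.entries:
            expected = self._mac(prev, entry["event"])
            if not hmac.compare_digest(expected, entry["mac"]):
                return False
            prev = entry["mac"]
        return True
```

The chain alone won't detect the newest entries being truncated, and it doesn't address flooding; periodically anchoring the latest MAC somewhere separate and rate-limiting writers cover those.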
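For the RBAC example, the property the generated code kept losing is deny-by-default: every grant is listed explicitly, unknown roles get nothing, and no role inherits access to secrets implicitly. A minimal sketch; the roles and permissions are hypothetical.

```python
from enum import Enum


class Permission(Enum):
    READ_DASHBOARD = "read_dashboard"
    MANAGE_USERS = "manage_users"
    READ_API_SECRETS = "read_api_secrets"


# Hypothetical roles: each grant is explicit; nothing is inherited implicitly.
ROLE_PERMISSIONS = {
    "viewer": frozenset({Permission.READ_DASHBOARD}),
    "admin": frozenset({Permission.READ_DASHBOARD, Permission.MANAGE_USERS}),
    "secrets-officer": frozenset({Permission.READ_API_SECRETS}),
}


def is_allowed(role: str, permission: Permission) -> bool:
    # Unknown roles map to an empty set, so the default answer is "no".
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```

Note that even `admin` can't read API secrets here; that grant has to be added deliberately, which is exactly the kind of review point that keeps permissions from sprawling.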
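For the localStorage example, the common alternative is a session cookie the page's JavaScript can't read. A minimal sketch using the standard library's `http.cookies` to build the `Set-Cookie` value (Python 3.8+ for `samesite`); the cookie name `session` is an assumption.

```python
import secrets
from http.cookies import SimpleCookie


def session_cookie_header(token: str) -> str:
    """Build a Set-Cookie value for a session token that page JavaScript cannot read."""
    cookie = SimpleCookie()
    cookie["session"] = token
    morsel = cookie["session"]
    morsel["httponly"] = True      # hidden from document.cookie, so XSS can't read it directly
    morsel["secure"] = True        # only sent over HTTPS
    morsel["samesite"] = "Strict"  # not attached to cross-site requests
    morsel["path"] = "/"
    return morsel.OutputString()


header = session_cookie_header(secrets.token_urlsafe(32))
```

This narrows rather than eliminates the risk: XSS can still act within the page while it runs, and a browser extension granted cookie permissions can still read HttpOnly cookies.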
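For the dependency-pinning example, an off-the-shelf scanner such as `pip-audit -r requirements.txt` is the practical check. The idea behind it can be sketched as flagging `name==version` pins below a known-good floor. The package names and floors below are hypothetical placeholders, and the version parsing is deliberately simplified to plain numeric versions.

```python
from typing import Dict, List, Tuple

# Hypothetical floors: in practice these come from an advisory source,
# not hand-typed values.
MINIMUM_VERSIONS: Dict[str, Tuple[int, ...]] = {
    "examplelib": (2, 4, 0),
    "otherlib": (1, 10, 0),
}


def parse_version(text: str) -> Tuple[int, ...]:
    # Simplified: handles plain numeric versions like "2.3.1" only.
    return tuple(int(part) for part in text.split("."))


def stale_pins(requirements: str) -> List[str]:
    """Flag 'name==version' pins older than the configured floor."""
    flagged = []
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue
        name, version = (part.strip() for part in line.split("==", 1))
        floor = MINIMUM_VERSIONS.get(name.lower())
        if floor is not None and parse_version(version) < floor:
            flagged.append(f"{name}=={version} (floor {'.'.join(map(str, floor))})")
    return flagged
```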
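For the Docker example, the real fix belongs in the image itself: create an unprivileged user and switch to it with a `USER` instruction. As a belt-and-braces check, the app can also refuse to start as root. A minimal POSIX-only sketch; the function name is hypothetical.

```python
import os
from typing import Optional


def refuse_to_run_as_root(euid: Optional[int] = None) -> None:
    """Raise if the process runs with effective UID 0 (root). POSIX only."""
    if euid is None:
        euid = os.geteuid()
    if euid == 0:
        raise RuntimeError("refusing to start as root; set a non-root USER in the image")
```

Calling it at startup turns a silent misconfiguration into a visible failure.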
@MostlyPeaceful courts doing court things. they'll shrug their shoulders when we finally have a real war and all the trans troops are stuck back stateside because they're non-deployable.
@Metalpanthers incredible mental gymnastics, he signed with our AHL affiliate, who believe it or not, can make their own decisions independent of the NHL team.
@GrantGHurst literally we had presidents who were almost assassinated but beat the would-be assassin to a pulp on the steps of Congress. so many would love this.
we also had presidents throw the equivalent of keggers open to the entire public. partying at the White House is tradition.
@PantherPourri Don't see the impairment here. Dobes is able to move his hand after the contact to get better positioning. Kind of negates the claim.
Now read the sentence directly before it, and the one where it falls under on-ice judgement. Answers the question.
@nhl_review For real, I don't know why they're crying about this call. There were others to harp on. This was the right call. Can't punch a dude in the back of the head when he's lying defenseless on the ice, even if he ran into your goalie.
@_AsiwajuLerry because it was always going to explode, this was a soft touchdown proving the new engines work, the new heat tiles, etc. it also proved they can land accurately down an engine. not perfect, but worth celebrating.