@udunadan They mentioned gpt-4o-mini was performing better than larger models.
A lot of overlap between the presentation and this blog: https://t.co/9RLma2yqRV
Attended an internal talk by Taesoo Kim (of Team Atlanta, AIxCC) and holy shit, it was a breath of fresh air. An actual nuanced take on LLM vuln research from someone who has already done it, as opposed to all the Claude Code hype people here.
@udunadan 3. Issue with Claude Code style VR: AI finds and AI judges. It takes a lot of human capital to verify the code and see the proof. They had much better luck giving the AI access to a validation harness where it could actually test its hypotheses. Granted, a lot of AIxCC is fuzzing.
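To make the "AI finds, harness judges" idea concrete, here is a minimal sketch. Everything in it is hypothetical and mine, not from the talk: `Finding`, `validate`, `confirmed`, and the toy `toy_parser` target are made-up names, and a real AIxCC-style harness would run a sanitizer-instrumented binary rather than a Python callable. The point is only that a claim counts once its proof input actually makes the target fail.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Finding:
    claim: str          # what the model says is wrong (hypothetical field)
    proof_input: bytes  # the input the model says triggers the bug


def validate(target: Callable[[bytes], None], finding: Finding) -> bool:
    """Return True only if the claimed input really makes the target fail."""
    try:
        target(finding.proof_input)
    except Exception:
        return True
    return False


def confirmed(target: Callable[[bytes], None],
              findings: Iterable[Finding]) -> List[Finding]:
    """Keep only findings the harness could reproduce."""
    return [f for f in findings if validate(target, f)]


def toy_parser(data: bytes) -> None:
    """Toy stand-in target: a length-prefixed record parser."""
    length = data[0]  # raises IndexError on empty input
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")


findings = [
    Finding("empty input crashes the parser", b""),
    Finding("one-byte payload crashes the parser", b"\x01A"),  # bogus claim
]
real = confirmed(toy_parser, findings)
```

Only the first finding survives here, so a human reviews one reproduced crash instead of two model opinions.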
@udunadan What stuck with me was:
1. Larger models are not automatically better. For specific tasks, they got better answers from mini models that think less and do not hallucinate.
2. The LLMs were a small part of AIxCC, the harness and validations are the major part. /1
I use models for static analysis every day. You have to give them targeted prompts and use "traditional SAST" to narrow the code down to ~10KB of hotspots to get good results. Granted, a small code base in my scope is 100MB of code, so YMMV.
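A rough sketch of that narrowing step, under loud assumptions: the regex sink list stands in for a real SAST rule set (or a tree-sitter query), and `find_hotspots`, `build_prompt`, and the 10,000-byte default budget are hypothetical names and values picked for illustration, not my actual tooling.

```python
import re
from typing import List

# Hypothetical sink list; a real SAST rule set would be far richer.
SINK_PATTERN = re.compile(r"\b(strcpy|sprintf|memcpy|system|gets)\s*\(")


def find_hotspots(source: str, context: int = 3,
                  budget_bytes: int = 10_000) -> List[str]:
    """Return snippets around risky calls until the byte budget is used up."""
    lines = source.splitlines()
    snippets: List[str] = []
    used = 0
    last_end = -1
    for i, line in enumerate(lines):
        if not SINK_PATTERN.search(line):
            continue
        if i <= last_end:
            continue  # this call is already inside the previous snippet
        start = max(i - context, 0)
        end = min(i + context, len(lines) - 1)
        snippet = "\n".join(lines[start:end + 1])
        size = len(snippet.encode("utf-8"))
        if used + size > budget_bytes:
            break  # stay within the size the model handles well
        snippets.append(snippet)
        used += size
        last_end = end
    return snippets


def build_prompt(snippet: str) -> str:
    """Wrap one hotspot in a narrow, targeted question."""
    return ("Review only this C snippet for memory-safety bugs. "
            "Name the line and the triggering input, or answer 'none'.\n\n"
            + snippet)
```

Each snippet gets its own prompt, so the model sees a few lines around one risky call instead of the whole code base.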
I am still long tree-sitter.
Introducing Claude Code Security, now in limited research preview.
It scans codebases for vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix issues that traditional tools often miss.
Learn more: https://t.co/n4SZ9EIklG
@HaifeiLi Haha yeah, I am sure. I literally have to use traditional SAST to reduce the code to a few KBs before LLMs become usable, so I am still long Office and tree-sitter.
@HaifeiLi Haha. My grant price was $420, so I am not that much underwater. I actually do not hold a lot of MSFT outside of recent vests. After I saw the S&P 500 is basically tech, I cashed out to buy a home.
@moyix Text has always been dangerous: instruction manuals and warnings on packaging. We have now decided to talk to machines, so we need the same guardrails.
@TR4NNYKISSER People lionize and dream of being "the only person who knows X and charges $$$/hr." In my industry (security), a lot of entry-level steps like helpdesk have been effectively cannibalized or outsourced but are still expected by the old guard.
@TR4NNYKISSER There is an unfortunately not-so-small section of experienced engineers who think this is good and provides job security. We've had companies who claimed to only hire seniors, Netflix being the most prominent off the top of my head.
The industry has been going down this path.