8/ What might actually help: a "contract graph verifier" that checks whether what one component promises is what the next one expects. Not a proof. A consistency check. Applied systematically across the whole codebase.
Would have caught 2 of my 4 bugs.
Full analysis with data, methodology, and the contract graph proposal: https://t.co/GhG0z4uAxf
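Tweet 8/ describes the verifier in one sentence. Below is a minimal Python sketch of that idea. It is not Crosscheck's code or the linked proposal. It assumes each component declares, per field, the properties it promises on output and the properties it expects on input. `Contract`, `Component`, `check_edges`, and the example components `meter_reader` and `monthly_split` are all hypothetical names. For every producer-to-consumer edge, the checker reports any property the consumer relies on that the producer never promised. Like the proposal, this is a consistency check over declared contracts, not a proof.

```python
from dataclasses import dataclass, field


@dataclass
class Contract:
    # Hypothetical shape: field name -> set of declared properties,
    # e.g. {"energy_kwh": {"float", "non_negative"}}.
    fields: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class Component:
    name: str
    provides: Contract  # what this component promises about its output
    requires: Contract  # what this component expects of its input


def check_edges(components: dict[str, Component],
                edges: list[tuple[str, str]]) -> list[str]:
    """Return one message per unmet expectation on each producer -> consumer edge."""
    problems: list[str] = []
    for producer, consumer in edges:
        provided = components[producer].provides.fields
        required = components[consumer].requires.fields
        for name, expected in required.items():
            if name not in provided:
                problems.append(f"{producer} -> {consumer}: missing field '{name}'")
                continue
            # Properties the consumer relies on that the producer never promised.
            unmet = expected - provided[name]
            if unmet:
                problems.append(f"{producer} -> {consumer}: '{name}' lacks {sorted(unmet)}")
    return problems


# Example graph: the reader never promises non-negative energy,
# but the splitter silently assumes it.
components = {
    "meter_reader": Component(
        "meter_reader",
        provides=Contract({"start": {"datetime", "utc"}, "energy_kwh": {"float"}}),
        requires=Contract(),
    ),
    "monthly_split": Component(
        "monthly_split",
        provides=Contract(),
        requires=Contract({"start": {"datetime", "utc"},
                           "energy_kwh": {"float", "non_negative"}}),
    ),
}

for message in check_edges(components, [("meter_reader", "monthly_split")]):
    print(message)
# prints: meter_reader -> monthly_split: 'energy_kwh' lacks ['non_negative']
```

Because it compares declared labels only, this sketch cannot catch a producer that promises `non_negative` and then breaks that promise. Its value is in forcing the implicit assumptions at each edge to be written down and compared.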
1/ I formally verified AI-generated code in production. Here's what actually happened.
I built a Claude Code plugin called Crosscheck that uses Dafny (backed by Z3) to verify AI-generated code. I tested it on real energy-splitting logic in Django: sessions that span a month boundary must have their energy allocated correctly between the months.
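The thread doesn't show the production Django logic. To illustrate the kind of rule involved, here is a minimal Python sketch that splits one session's energy across calendar months in proportion to elapsed time. It assumes a constant draw rate and naive or UTC timestamps, and `split_energy_by_month` and `_next_month_start` are hypothetical names, not Crosscheck's.

```python
from datetime import datetime


def _next_month_start(dt: datetime) -> datetime:
    """First instant of the calendar month after dt."""
    if dt.month == 12:
        return dt.replace(year=dt.year + 1, month=1, day=1,
                          hour=0, minute=0, second=0, microsecond=0)
    return dt.replace(month=dt.month + 1, day=1,
                      hour=0, minute=0, second=0, microsecond=0)


def split_energy_by_month(start: datetime, end: datetime,
                          energy_kwh: float) -> dict[tuple[int, int], float]:
    """Allocate a session's energy to (year, month) buckets by time overlap.

    Assumptions: energy is drawn at a constant rate over [start, end), and
    start/end are naive or UTC, so month boundaries are UTC boundaries.
    """
    if end <= start:
        raise ValueError("session must end after it starts")
    total_seconds = (end - start).total_seconds()
    shares: dict[tuple[int, int], float] = {}
    cursor = start
    while cursor < end:
        # Stop at whichever comes first: the next month boundary or the session end.
        boundary = min(_next_month_start(cursor), end)
        fraction = (boundary - cursor).total_seconds() / total_seconds
        shares[(cursor.year, cursor.month)] = energy_kwh * fraction
        cursor = boundary
    return shares


print(split_energy_by_month(datetime(2024, 1, 31, 22), datetime(2024, 2, 1, 2), 10.0))
# prints: {(2024, 1): 5.0, (2024, 2): 5.0}
```

With floats, the shares can drift from the session total by rounding error. Billing code would more likely use `Decimal` and assign any remainder explicitly, and "the shares sum to the total" is exactly the kind of postcondition a tool like Dafny can be asked to prove.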
7/ The provers work. That's not the question. The question is whether "formal verification goes mainstream" means verifying ~25% of the codebase, which was already the most reliable part.
The integration layer, where components meet and assumptions are implicit, is where the real bugs live. No prover reaches it.
Over half a billion dollars is now flowing into formal verification startups (including Axiom at $200M and Harmonic at $295M). The thesis: AI writes code, math proves it works.
I built a Claude Code plugin that uses Dafny to verify AI-generated code. Tested it on real Django production logic.
The verification worked. But all 4 bugs were in the parts you can't verify.
Link in first comment.
LangChain 🤝 AIPlugins
A first open-source attempt at using AIPlugins (the same plugins ChatGPT uses)
s/o @vaibhavk97 for this. Excited to see what other techniques the @langchain community comes up with - it's only the beginning
Docs (Python and JS) in 🧵
🇺🇸 oncyber is now available in english, korean, and japanese. packaged in a new visitor experience 🫡
🇰🇷 oncyber now welcomes English, Korean, and Japanese alike.
🇯🇵 oncyber is now available in English, Korean, and Japanese.