What if we could mathematically prove that code does what it's supposed to do, not just test it and hope?
The Caltech AI Alignment Group hosted @ClarkBarrett7 from @Stanford for a talk on CSLib, a platform for AI-assisted formal verification in Lean, and why proving code correct is becoming one of the most urgent problems in AI safety.
1/7
The CSLib steering committee recently announced the official launch of CSLib — an open-source effort to formalize computer science in Lean, inspired by the impact of Mathlib in mathematics.
CS researchers, practitioners, and enthusiasts are invited to get involved in formalizing essential computer science concepts and building infrastructure for reasoning about real-world code with Lean.
Learn more at:
🌐 https://t.co/Qdj1XzikL3
📄 White paper: https://t.co/ZQHAKyMYCP
🤝 Contribute: https://t.co/HfDP19XwZ9
#LeanLang #LeanProver #CSLib #OpenSource #FormalVerification
🥁 And the #cav24 Award goes to... 🥁
Clark Barrett @Stanford, David Dill @Stanford, Kyle Julian @Wing, Guy Katz @CseHuji and Mykel Kochenderfer @aiprof_mykel @Stanford for their #cav17 paper “Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks”
Congratulations! 👏
How can we train a language model to communicate with other agents? We propose informativeness as a training objective, where a sender's message is informative insofar as it increases the receiver's log probabilities over future observations conditional on the message. (1/8)
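One way to read that objective, written as a formula. This is a sketch of the sentence above, not necessarily the paper's exact definition. The symbols are assumptions introduced here: m is the sender's message, c is the context the receiver already has, o_future is the receiver's future observations, and p_R is the receiver's predictive distribution.

```latex
\[
\mathrm{Info}(m) \;=\; \log p_R(o_{\text{future}} \mid m, c) \;-\; \log p_R(o_{\text{future}} \mid c)
\]
```

Under this reading, a message scores above zero only if the receiver predicts what comes next better with the message than without it.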
Are you ready for @eulerfinance's ✨$1.25M✨ audit competition on @cantinaxyz?
We're thrilled to announce that $100k of the total pot is being allocated to formal verification managed by @certora 🔥
@VitalikButerin You might be interested in this work from our lab in which we propose an approach that combines LLMs and formal methods to generate and formally verify code automatically: https://t.co/MSwJXmg3sP
We're looking to get to know the users of SMT solvers! Please DM us if you use any SMT solver, and especially if you use cvc5. Reposts for visibility are also appreciated!
Just participated in a fun podcast hosted by @joe__scott__ in which we discuss automated reasoning and my research. Check it out: https://t.co/3bvWqvbNZG
@BRIAN_____ @testsmtsolvers After a SAT result, you can ask for a model, then plug the model into the original formula, and it should reduce to true. If it does, then you can trust the result. If you want to automate this, you can assert the model together with the original formula and send both to another solver.
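A minimal sketch of the "plug the model back in" check, assuming a purely propositional formula in CNF with DIMACS-style integer literals. The function name `satisfies` and the example formula are hypothetical; a real SMT model would need a theory-aware evaluator, or the second-solver approach described above.

```python
from typing import Dict, List

# A clause is a list of DIMACS-style literals: 3 means x3, -3 means (not x3).
Clause = List[int]


def satisfies(cnf: List[Clause], model: Dict[int, bool]) -> bool:
    """Return True iff every clause has at least one literal made true by model.

    A variable missing from the model satisfies no literal, so an
    incomplete model is never accepted by accident.
    """
    for clause in cnf:
        clause_true = any(
            abs(lit) in model and model[abs(lit)] == (lit > 0)
            for lit in clause
        )
        if not clause_true:
            return False
    return True


# Hypothetical formula: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]

# A model as a solver might report it after answering SAT.
reported_model = {1: True, 2: True, 3: False}

if __name__ == "__main__":
    print(satisfies(formula, reported_model))
```

The check is independent of whichever solver produced the model, which is the point: a bug in the solver's search cannot also hide in this evaluator.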
@BRIAN_____ @ciphernyx @zhendongsu @testsmtsolvers Well, a healthy skepticism is always good. I suggest that you always check the models you get back from a SAT result. And for UNSAT results, it's good to have more than one solver confirm it. If you do that, I think the chances are very slim that you would miss a bug.
@testsmtsolvers Also, though this is far from obvious because the QF_S logic was only recently standardized and is not yet published on the website, the standard does not allow equality between regular expressions; only membership constraints are allowed.
@testsmtsolvers Refutational soundness bugs are much more serious, and CVC4 has an ongoing effort to produce independently-checkable proof objects to address this issue.