Can we ensure AI agents respect our safety constraints, even as they explore & improve?
- Medical LLMs that are helpful, & avoid false claims?
- Bioscience agents that generate effective molecule designs, & ensure they’re safe?
📄🧵w/ @samuel_stanton_@clara_fannjiang@jiwoncpark@kchonyc@anqi_liu33@suchisaria
Excited to share “Conformal Policy Control” ⬇️
1/12
TL;DR: poster today at 3:15pm, P3-#1109!
Have you ever benchmarked your method on QM9, MD17, OC20, or ModelNet? It turns out that the 3D orientations of point clouds in commonly used datasets are highly non-random. How did we prove this and why should you care? 🧵
I took ~7 years off during undergrad. Worked at Starbucks, the postal service, a diner. Wasn't until making friends with some CS PhD students at UW Madison, who suggested sitting in on Eric Bach's class on the physics of computation, that I decided to go back (and then get a PhD)
You have a safe model you've tested, and you have a new post-trained model. How far can you trust a new model before it becomes unsafe? The answer goes to the heart of statistical decision-making. To be safe, the agent must be self-aware. Read @DrewPrinster's thread for more.
Can we ensure AI agents respect our safety constraints, even as they explore & improve?
- Medical LLMs that are helpful, & avoid false claims?
- Bioscience agents that generate effective molecule designs, & ensure they’re safe?
📄🧵w/ @samuel_stanton_@clara_fannjiang@jiwoncpark@kchonyc@anqi_liu33@suchisaria
Excited to share “Conformal Policy Control” ⬇️
1/12
Safety and exploration, in this view, can complement rather than oppose each other. With the right balance they can encourage progress while avoiding pitfalls. This balance, however, must be determined from what we actually know, not from what we hope. It can be enough, it turns out, to know what is safe, and to know what we want to try next.
11/12