Can we truly prove that LLMs won't leak your data or generate unsafe code?
Almost all LLM safety evals today only give estimates.
Sampling. Benchmarks. Leaderboards.
Zero guarantees.
We present BEAVER, the first system to provably bound LLM constraint satisfaction.
#ICLR2025 was a blast!
Grateful for the opportunity to present our work on formally certifying LLMs. For more details, please check out our project page: https://t.co/ZN6s4yNLvU
Meet us at #ICLR2025 next week to hear about IterGen! We introduce grammar-guided navigation for LLM generation: precise forward and backward control to improve semantic accuracy.
Happy to see support for our setup and methods in our LLM certification frameworks (https://t.co/OE1I25UJ2O, https://t.co/ZzOLBVbaKo) from @AnthropicAI.
Excited to present our paper "Relational DNN Verification Leaps Forward With RABBit" at #NeurIPS2024 on December 11th!
Authored by @TarunSures41845 and @debangshuban18!
1/N
To address this, the paper introduces RABBit, a SOTA relational verifier that tracks dependencies between perturbed inputs under a UAP attack, improving verified robustness accuracy by up to 6%.
N/N
Most DNN verifiers focus on input-specific adversarial attacks, but practical attacks often follow the universal adversarial perturbation (UAP) model, applying the same perturbation across inputs. Verifying each input in isolation therefore yields overly pessimistic robustness guarantees.
6/N
#ILLINOIS computer science professor Gagandeep Singh and PhD student Isha Chaudhary have determined that large language models have biased output that can spread misinformation and widen social gaps between various demographic groups. Read more!
https://t.co/g8MDbQtJkX