Introducing SafeWatch!
While generative models like Sora and Veo 2 have recently produced some stunning videos, they also make it easier to create harmful content (sexual, violent, deepfakes).
SafeWatch is here to help: the first MLLM-based video guardrail model designed to follow customized safety policies and provide guardrails with precise explanations in a zero-shot manner.
We also introduce SafeWatch-Bench, a high-quality video guardrail dataset of over 2M videos covering more than 30 unsafe video scenarios, drawn from various real-world platforms and SOTA generative models to cover a comprehensive range of potential risks.
Why SafeWatch?
1. Strong policy-following: trained on diverse videos and policy taxonomies, yielding high generalizability to unseen scenarios and subtle policy definitions.
2. High inference speed: two plug-and-play modules process policies in parallel and prune irrelevant video tokens, reducing inference cost and eliminating positional bias.
3. In-depth explanations: trained on high-quality explanations from SafeWatch-Bench, labeled by a rigorous multi-agent consensus pipeline and verified by human experts.
We evaluate SafeWatch on a large variety of guardrail tasks:
1️⃣ On both the real-world and generative video subsets of SafeWatch-Bench, SafeWatch outperforms SOTA models, including GPT-4o, by 29.2% and 27.2% on average, while requiring much less inference time.
2️⃣ On 5 existing video guardrail benchmarks, SafeWatch achieves 87.1% accuracy, consistently outperforming previous SOTA models.
3️⃣ On 4 new video categories and unseen policy taxonomies, as well as 4 different prompting tasks, SafeWatch maintains high accuracy and outperforms GPT-4o (renowned for its zero-shot generalizability).
Our project has been released:
Paper link: https://t.co/GaVau8jwzj
Project page: https://t.co/lCyLzOOOpo
Code (coming soon): https://t.co/N23B4BuRa7
I'll be at #NeurIPS2024 from now until Sunday. DM me here or on Whova to chat about (multimodal) large language model privacy, memorisation, training strategies using synthetic data, agents, judges, distribution shift robustness, hallucinations, and uncertainty estimation.
Concerned your LLMs may regurgitate copyrighted content © and get you sued?
Fix it with model fusion
Result of a fantastic collaboration with @JavierAbadM @DonhauserKonst @FannyYangETH
(1/5) LLMs risk memorizing and regurgitating training data, raising copyright concerns. Our new work introduces CP-Fuse, a strategy to fuse LLMs trained on disjoint sets of protected material. The goal? Preventing unintended regurgitation.
Paper: https://t.co/OjUAlg2b65
AI coding assistants (e.g. @cursor_ai, @codeiumdev, @github Copilot) are transforming software development, but how secure are they?
Our new blog post reveals which tools stand up to security best practices, which introduce hidden vulnerabilities, and what you can do to safeguard your code. Learn more: https://t.co/LEfsnKqgUA
#ai #coding #copilot #security #safety
Can't wait for our workshop 'Interpretable AI: Past, Present and Future' at @NeurIPSConf!
Check out our super interesting program with talks from @NeelNanda5, @CynthiaRudin, #RichCaruana, @jxzhangjhu and @TongWang!
We'll have a panel moderated by the amazing @kamalikac!
Help us spread the word, RTs appreciated!
[3/3] Special thanks to all coauthors: Adam Davies, Ashkan Khakzar, Anjun Hu, Arshia Hemmat, Jianhao Yuan, Tom Lamb, Jiyang Guan, Philip Torr. Work done at @OxfordTVG
[2/3]
- Is DP In-Context Learning really making any progress? At the @solarneurips workshop we present a very preliminary draft that questions its progress in several settings.
Paper: https://t.co/QzAxTyqfUa
(3/3) Instructing the model not to respond if the image has been manipulated reduces the chances that an attacker can extract PII, without degrading the model's accuracy.
Thanks to all co-authors: Nathalie, @FlorianTramer, @OxfordTVG, @fedassa
(1/3) Multi-modal LLMs (MLLMs) can answer questions about document scans. How safe are they? Come to Hall C #2300 at 1.30pm to find out!
Attackers may successfully query MLLMs to extract Personally Identifiable Information!
https://t.co/UbUG973O3d
(2/3) These models may regurgitate names, addresses, card numbers, and IDs.
We find that high input training resolution and stronger pre-training can significantly reduce the chances of regurgitation.
3/3 Simple prompting and editing outperform traditional augmentations, producing more robust models with fewer augmented samples.
Given the quality of generative models, filtering is no longer required to attain improved performance.
@OxfordTVG @DYDYYDYYYD @adamdaviesnlp
1/3 What's the best way to improve model robustness to distribution shift using synthetic data? Come to Hall C 4-9 #912 at #ICML2024 to find out!
Classifiers fail to recognise objects observed in previously unseen settings.
Can #StableDiffusion be used to fix this?
2/3 Prompting text-to-image generators proves to be an extremely effective (SOTA) and interpretable approach to synthesizing interventional data for augmentation.
We extensively study the impact of conditioning mechanisms, prompting strategies, and filtering on robustness.
[1/2] Excited to be presenting 3 papers on Responsible AI at #ICML2024!
"Extracting Training Data from Document Based Visual Question Answering Models"
https://t.co/UbUG973O3d
"NJPP: Toward Interventional Data Augmentation Using Text-to-Image Generators"
https://t.co/jPqaFPH1KT
[2/2]
"Strong Copyright Protection for Language Models via Adaptive Model Fusion"
https://t.co/dyW6wPhCJn
GenLaw and Foundation Models in the Wild workshops
Let's grab a coffee and chat about uncertainty, privacy, memorization, robustness, synthetic data, and multimodal agents
1/n Happy to share our recent work with @rvolpis @puneetdokania Philip Torr and Grégory Rogez:
Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
Paper: https://t.co/viqm2YIZEu
Code: https://t.co/XpX2ysm6Da
In the era of long-context LLMs it is not enough to make models "forget" unsafe knowledge. Adversaries can use this long context to "un-unlearn" the malicious behavior.
Excited to be co-organizing this #ECCV2024 workshop with an outstanding line-up of speakers!
Submit if you have papers with new benchmarks and analyses inspecting emergent visual abilities or limitations of foundation models!
#ECCV2024 Showcase your research on the analysis and evaluation of emerging VISUAL abilities and limits of foundation models at the EVAL-FoMo workshop
https://t.co/LYM3IFejUy
@phillip_isola @sainingxie @chrirupp @OxfordTVG @berkeley_ai @MIT_CSAIL