ChatGPT can now browse the internet to provide you with current and authoritative information, complete with direct links to sources. It is no longer limited to data before September 2021.
🔥 Just Released: Free guide to 💪 training LLMs, including techniques for parallelization, tokenization strategies and their tradeoffs, plus how much data you'll actually need 🤯
This is scary. 😱
The MOTHER of all LLM Jailbreaks & Prompt injections.
"Universal and Transferable Adversarial Attacks on Aligned Language Models" 🌐🔒
--- TL;DR ---
This research & code introduces a fascinating method that automatically generates a potentially unlimited number of adversarial suffixes which, appended to any prompt, cause aligned language models to produce objectionable behaviors. 🤖🚨
--- Background ---
Previous attempts at jailbreaking language models have relied on manual crafting, which vendors could easily patch. In contrast, this method presents an automated approach called GCG (Greedy Coordinate Gradient) that constructs an endless array of jailbreaks with high reliability, even for novel instructions and models. This makes it infeasible to patch the vulnerabilities one by one. 🛡️💻
--- The Method ---
1. Initial affirmative responses: To induce objectionable behavior, the attack pushes the model to begin its answer to a harmful query with an affirmative phrase, "Sure, here is (content of the query)." This switches the model into a mode where it generates the objectionable content immediately after.
2. Combined greedy and gradient-based discrete optimization: Optimizing the adversarial suffix is challenging because the search is over discrete tokens. The method uses token-level gradients to identify promising single-token replacements, evaluates the loss of those candidates, and selects the best substitution. It is similar to the AutoPrompt approach but considers all tokens in the suffix as candidates for replacement at each step, rather than a single one, which makes it more effective.
3. Robust multi-prompt and multi-model attacks: To ensure reliable attacks, the method generates a single suffix string that induces objectionable behavior across various prompts and multiple models. The attack is tested on different models, such as Vicuna-7B/13B and Guanaco-7B. 🎯🎮
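To make steps 2 and 3 concrete, here is a minimal toy sketch of the gradient-guided greedy search, assuming a made-up numeric loss instead of a real language model. Everything in it is hypothetical and not taken from the paper's code: the function names, the random "embeddings", the position weights, and the loss summed over a few stand-in "prompts" (context vectors). It also keeps the current suffix whenever no sampled candidate improves the loss, which is a simplification chosen here.

```python
import random

def make_toy_problem(vocab=50, dim=8, length=6, n_prompts=3, seed=0):
    """Build a synthetic objective; nothing here involves a language model."""
    rng = random.Random(seed)
    vec = lambda: [rng.gauss(0, 1) for _ in range(dim)]
    return {
        "emb": [vec() for _ in range(vocab)],                      # toy token embeddings
        "weights": [(-1) ** i * (1.0 + 0.2 * i) for i in range(length)],  # per-position weights
        "contexts": [vec() for _ in range(n_prompts)],             # one vector per toy "prompt"
        "target": vec(),
    }

def residuals(prob, suffix):
    """Per-prompt residual: context + weighted sum of suffix embeddings - target."""
    dim = len(prob["target"])
    mix = [sum(w * prob["emb"][t][d] for w, t in zip(prob["weights"], suffix))
           for d in range(dim)]
    return [[c[d] + mix[d] - prob["target"][d] for d in range(dim)]
            for c in prob["contexts"]]

def loss(prob, suffix):
    """Squared error summed over all toy prompts (the multi-prompt aggregate)."""
    return sum(x * x for r in residuals(prob, suffix) for x in r)

def token_gradients(prob, suffix):
    """grad[i][v] = d(loss) / d(one-hot entry for token v at position i)."""
    res = residuals(prob, suffix)
    dim = len(prob["target"])
    total = [sum(r[d] for r in res) for d in range(dim)]
    score = [2.0 * sum(total[d] * e[d] for d in range(dim)) for e in prob["emb"]]
    return [[w * s for s in score] for w in prob["weights"]]

def gcg_step(prob, suffix, rng, top_k=8, batch=32):
    """One greedy step: gradients propose swaps, exact loss picks the winner."""
    grads = token_gradients(prob, suffix)
    best, best_loss = suffix, loss(prob, suffix)
    for _ in range(batch):
        i = rng.randrange(len(suffix))                   # random suffix position
        # Tokens with the most negative gradient are the most promising swaps.
        candidates = sorted(range(len(grads[i])), key=lambda v: grads[i][v])[:top_k]
        trial = list(suffix)
        trial[i] = rng.choice(candidates)
        trial_loss = loss(prob, trial)                   # gradient is only a first-order guess
        if trial_loss < best_loss:
            best, best_loss = trial, trial_loss
    return best, best_loss

def optimize(prob, steps=30, seed=1):
    """Run several greedy steps from a random suffix and record the loss."""
    rng = random.Random(seed)
    suffix = [rng.randrange(len(prob["emb"])) for _ in prob["weights"]]
    history = [loss(prob, suffix)]
    for _ in range(steps):
        suffix, current = gcg_step(prob, suffix, rng)
        history.append(current)
    return suffix, history
```

Calling `optimize(make_toy_problem())` returns the final token ids and the loss after each step. The gradient only ranks candidate swaps; each candidate is still scored with the exact loss, since a linear approximation over discrete tokens can be misleading.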
--- Evaluation ---
This GCG approach achieves an impressive attack success rate, with 100% on Vicuna-7B and 88% on Llama-2-7B-Chat, far surpassing the success rates of prior work. 📈🏆
--- Transferability ---
That part is the real magic of this work. ✨
The research reveals that the attacks generated by this approach can transfer effectively to other language models, even those using entirely different tokens to represent the same text, different training procedures, and different training datasets...
Whatttttt?
Adversarial examples designed for Vicuna-7B can transfer to larger Vicuna models. Apparently, those that fool both Vicunas can transfer to Pythia, Falcon, Guanaco, and most importantly also to GPT-3.5, GPT-4, and PaLM-2, leading to harmful instructions being followed over 60% of the time!!! 😮🔄🧙‍♂️
This is a huge discovery.
--- Conclusion ---
We are left with more questions than answers. ❓
One of the crucial aspects to explore is whether models can be explicitly fine-tuned to avoid such attacks through adversarial training. Whether models can be made robust to these attacks while keeping their generative capabilities requires further investigation.
Moreover, additional alignment training might partially address the issue, and exploring mechanisms in pre-training to prevent such behavior from arising in the first place is essential. 🕵️‍♀️🛠️
--- Links ---
Website - https://t.co/aRllNUA9ue
Paper - https://t.co/MxwsTbaM2o
Code - https://t.co/Qi4FZbEUmw
Dive into the fascinating world of Transformer models! Luis Serrano breaks down the architecture & functionality of these ML marvels in this blog. You'll learn how they maintain context, generate coherent text, & much more! Enhance your AI knowledge 🚀💡
https://t.co/h20eVrlkz5
Bring design and code even closer together with plugins in Dev Mode.
The @github plugin connects your files, issues, and PRs to your Figma components, giving you the context you need when implementing designs.
Try it now: https://t.co/PyYdhnv6Ud
Introducing #AgentGPT, an attempt at #AutoGPT directly in the browser 🤖
Give your own AI agent a goal and watch as it thinks, comes up with an execution plan and takes actions. Try for free now at https://t.co/F8Nz4LGC0e
Canva has over 125 million users worldwide.
Recently, Canva introduced new AI-powered design features.
Here are 10 new Canva features to save you countless hours of work:
99% of Notion users DO NOT use automations for repetitive tasks 🤯!
Today @bardeenai and I are giving away:
Notion Automations for Newbies ($29 value) for free in the next 48 hrs
Simply:
• Like
• Retweet
• Comment "NOW"
I'll DM you (must follow @bardeenai and @notionpunk)
The Queen of England died 5 months ago….
She ruled an entire nation and accumulated more wealth than 99.99% of humans…
And…yet…you haven’t thought about her except for this tweet.
You’re gonna die.
Everyone will move on.
Do what you want.
.@TalarianHQ's `GPT for Sheets™` is the gift that keeps on giving! 🔥
Look at how easy it is to create personalized content with it, thanks to #GPT3's seamless integration! 🤯👇
Get the add-on here:
🔗https://t.co/3xUwkhBkc4
1. Pencil
What: Pencil is the AI Ad Generator that helps brands & agencies create new ad variations 10x faster.
Use case:
• Automatically generate static & video ad creatives
• Run creatives predicted to win based on $1B in ad spend.
Link: https://t.co/yg3DuPyDXB