Skip to main content

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

 This project is about how to systematically persuade LLMs to jailbreak them. The well-known "Grandma Exploit" example is also using emotional appeal, a persuasion technique, for jailbreak!

What did we introduce? A taxonomy with 40 persuasion techniques to help you be more persuasive!

What did we find? By iteratively applying diffrent persuasion techniques in our taxonomy, we successfully jailbreak advanced aligned LLMs, including Llama 2-7b Chat, GPT-3.5, and GPT-4 — achieving an astonishing 92% attack success rate, notably without any specified optimization.

Now, you might think that such a high success rate is the peak of our findings, but there's more. In a surprising twist, we found that more advanced models like GPT-4 are more vulnerable to persuasive adversarial prompts (PAPs). What's more, adaptive defenses crafted to neutralize these PAPs also provide effective protection against a spectrum of other attacks (e.g., GCG, Masterkey, or PAIR). read more....

Comments

Popular posts from this blog

Fixing Unix/Linux/POSIX Filenames

Traditionally, Unix/Linux/POSIX filenames can be almost any sequence of bytes, and their meaning is unassigned. The only real rules are that "/" is always the directory separator, and that filenames can't contain byte 0 (because this is the terminator). Although this is flexible, this creates many unnecessary problems. In particular, this lack of limitations makes it unnecessarily difficult to write correct programs (enabling many security flaws), makes it impossible to consistently and accurately display filenames, causes portability problems, and confuses users. more ....

How To Set Up A Cisco Lab On Linux

After a quick search I found the wonderful Dynamips project that goes beyond what other simulators do by running actual Cisco IOS images, as well as the PEMU project which allows for running of Cisco PIX images. To integrate the various pieces of software... more .

Installing Samba on a Unix System.

you know what Samba can do for you and your users, it's time to get your own network set up. Let's start with the installation of Samba itself on a Unix system. When dancing the samba, one learns by taking small steps. It's just the same when installing Samba; we need to teach it step by step. This chapter will help you to start off on the right foot. read more