We present the behaviors of ChatGPT-4 (web), GPT-4-Turbo, GPT-3.5-Turbo, and LLaMA-2-13B-Chat on a range of harmful instructions and their corresponding jailbreaking prompts.
Content warning: the following model outputs contain harmful or unethical content, parts of which are replaced by "...".
A: Behaviors of ChatGPT-4 (web)
Normal dialogue
Jailbroken dialogue
B: Behaviors of GPT-4-Turbo
Normal dialogue
Jailbroken dialogue
C: Behaviors of GPT-3.5-Turbo
Normal dialogue
Jailbroken dialogue
D: Behaviors of LLaMA-2-13B-Chat
Normal dialogue
Jailbroken dialogue
Ethics and Disclosure
We conducted some of our experiments on several commercial closed-source models, but we did not disseminate the harmful outputs, nor did we submit any malicious feedback to these models. We will report the vulnerabilities to the relevant companies in the near future.
Contact us for the full model outputs and further information at tkoncecdnan@outlook.com.