Code (preparing)

Examples

We present the behaviors of ChatGPT-4 (web), GPT-4-Turbo, GPT-3.5-Turbo, and LLAMA-2-13B-Chat on a range of harmful instructions and their corresponding jailbreaking prompts.

Content warning: the following model outputs contain harmful or unethical content, portions of which are redacted with "..."


A: Behaviors of ChatGPT-4 (web)

Normal dialogue
Jailbroken dialogue

B: Behaviors of GPT-4-Turbo

Normal dialogue
Jailbroken dialogue

C: Behaviors of GPT-3.5-Turbo

Normal dialogue
Jailbroken dialogue

D: Behaviors of LLAMA-2-13B-Chat

Normal dialogue
Jailbroken dialogue


Ethics and Disclosure

We conducted some of our experiments on several commercial closed-source models. We do not disseminate the harmful outputs, nor do we submit any malicious feedback to these models. We will report the vulnerabilities to the relevant companies in the near future.

Contact us for the full model outputs and further information: tkoncecdnan@outlook.com