We present the behaviors of ChatGPT-4 (web), GPT-4-Turbo, GPT-3.5-Turbo, and LLaMA-2-13B-Chat on a range of harmful instructions and their corresponding jailbreaking prompts.
Content warning: the following model outputs contain harmful or unethical content, parts of which are replaced by "...".
A: Behaviors of ChatGPT-4 (web)
Normal dialogue
Jailbroken dialogue
B: Behaviors of GPT-4-Turbo
Normal dialogue
Jailbroken dialogue
C: Behaviors of GPT-3.5-Turbo
Normal dialogue
Jailbroken dialogue
D: Behaviors of LLaMA-2-13B-Chat
Normal dialogue
Jailbroken dialogue
Ethics and Disclosure
We conducted some of our experiments on several commercial closed-source models, but we did not disseminate the harmful outputs, nor did we submit any malicious feedback to these models. We will report the vulnerabilities to the relevant companies in the near future.
Contact us for the full model outputs and further information at tkoncecdnan@outlook.com.