Datasets
JailbreakQR
JailbreakQR is a dataset of 400 jailbreak prompt-response pairs, each consisting of:
- A jailbreak prompt
- A jailbreak response
Experienced annotators labeled each pair as failed, partially successful, or successful.
The reason for each label is also provided.
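As a rough illustration of the record structure described above, the following Python sketch models one pair together with its label and labeling reason. The field names (`prompt`, `response`, `label`, `reason`), the label strings, and the helper functions are assumptions made for this example; the actual column names and file format of the released dataset may differ, so check them after access is granted.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    # The three annotation outcomes named in this README.
    # The exact string values stored in the dataset are an assumption.
    FAILED = "failed"
    PARTIALLY_SUCCESSFUL = "partially successful"
    SUCCESSFUL = "successful"


@dataclass(frozen=True)
class JailbreakPair:
    # Hypothetical field names; adjust to the real schema.
    prompt: str
    response: str
    label: Label
    reason: str


def parse_pair(row: dict) -> JailbreakPair:
    """Convert one raw row into a typed record.

    Raises ValueError if the label is not one of the three known values.
    """
    return JailbreakPair(
        prompt=row["prompt"],
        response=row["response"],
        label=Label(row["label"]),
        reason=row["reason"],
    )


def label_counts(pairs: list[JailbreakPair]) -> Counter:
    """Count how many pairs carry each label."""
    return Counter(pair.label for pair in pairs)


# Placeholder rows only; these are not real dataset entries.
rows = [
    {"prompt": "<prompt 1>", "response": "<response 1>",
     "label": "failed", "reason": "<annotator note 1>"},
    {"prompt": "<prompt 2>", "response": "<response 2>",
     "label": "successful", "reason": "<annotator note 2>"},
    {"prompt": "<prompt 3>", "response": "<response 3>",
     "label": "failed", "reason": "<annotator note 3>"},
]

pairs = [parse_pair(row) for row in rows]
counts = label_counts(pairs)
```

Parsing labels into an enum makes an unexpected label value fail loudly at load time instead of silently skewing later statistics.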
The dataset is available at the link (gated) and the backup link (gated).
To request access, provide either your institutional e-mail address or your ORCID.
HarmfulQA
HarmfulQA is a dataset of 50 harmful questions, each paired with:
- A reference answer
- An evaluation rubric (scoring guideline)
All questions and answers are based on Wikipedia content.
The dataset is available at the link (gated) and the backup link (gated).
To request access, provide either your institutional e-mail address or your ORCID.
Citation
If you find these datasets useful, please cite the following:
@misc{chu2025jades,
title={JADES: A Universal Framework for Jailbreak Assessment via Decompositional Scoring},
author={Junjie Chu and Mingjie Li and Ziqing Yang and Ye Leng and Chenhao Lin and Chao Shen and Michael Backes and Yun Shen and Yang Zhang},
year={2025},
eprint={2508.20848},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2508.20848},
}