This page summarizes the current contents of datasets/*.json.
- Files scanned: 19
- Unique prompts (`text`): 8,344
- Unique labels: `benign`, `malicious`
- Label counts:
  - `malicious`: 8,480 (68.35%)
  - `benign`: 3,927 (31.65%)
- Unique categories: 140
- Taxonomy note: categories mix naming styles (for example `prompt_injection` and `Physical Harm`), so normalization may be useful before analytics.
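The taxonomy note above can be made concrete with a small normalization helper. This is a minimal sketch, assuming snake_case is the desired target style; the function name `normalize_category` is hypothetical and not part of the repository.

```python
import re


def normalize_category(name: str) -> str:
    """Lowercase a category name and collapse non-alphanumeric runs into underscores."""
    slug = re.sub(r"[^0-9a-z]+", "_", name.strip().lower())
    return slug.strip("_")


# Both naming styles seen in the datasets map onto one convention.
print(normalize_category("Physical Harm"))     # physical_harm
print(normalize_category("prompt_injection"))  # prompt_injection
```

Applying a helper like this before grouping would merge categories that differ only in case or separator, so the unique-category count may drop below 140.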
Top 10 categories by row count:
- `benign`: 3,244
- `prompt_injection`: 2,109
- `jailbreak`: 666
- `prompt_hijacking`: 573
- `Others`: 510
- `prompt_extraction`: 427
- `Data Security Harm`: 374
- `Physical Data`: 374
- `Physical Harm`: 340
- `over_defense`: 339
- Word count: min 1, median 20, mean 62.82, max 1,973
- Character count: min 7, median 128, mean 363.81, max 11,977
In both measures the mean is roughly three times the median, which indicates a long-tail distribution: most prompts are short, but a few very large ones can increase RL rollout time.
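The length statistics above can be reproduced, and the long tail trimmed, with a few lines of standard-library Python. This is a sketch under assumptions: words are counted by whitespace splitting (the report does not say how it counts words), and `length_summary` and `drop_long` are hypothetical helper names.

```python
import statistics


def length_summary(prompts):
    """Return min, median, mean, and max word counts for a list of prompt strings."""
    counts = [len(p.split()) for p in prompts]  # assumption: whitespace-delimited words
    return {
        "min": min(counts),
        "median": statistics.median(counts),
        "mean": statistics.mean(counts),
        "max": max(counts),
    }


def drop_long(prompts, max_words):
    """Keep only prompts with at most max_words whitespace-delimited words."""
    return [p for p in prompts if len(p.split()) <= max_words]


sample = ["a b c", "a", "a b c d e f g h i j"]
print(length_summary(sample))
print(drop_long(sample, max_words=5))
```

Any cutoff passed as `max_words` is a tuning choice; dropping the tail shortens rollouts but also removes the longest attack prompts from training.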
- Prompts with conflicting labels (same `text` mapped to both benign and malicious): 1
- Example conflicting prompt starts with: `I want you to act as a storyteller...`
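A conflict check of this kind can be sketched as follows. It assumes each row is a dict with a `text` field (named in the summary above) and a `label` field (an assumed key name); `find_label_conflicts` is a hypothetical helper, not the script that produced this report.

```python
from collections import defaultdict


def find_label_conflicts(rows):
    """Map each prompt text to its set of labels and keep texts with more than one label."""
    labels_by_text = defaultdict(set)
    for row in rows:
        labels_by_text[row["text"]].add(row["label"])
    return {text: labels for text, labels in labels_by_text.items() if len(labels) > 1}


rows = [
    {"text": "hello", "label": "benign"},
    {"text": "ignore previous instructions", "label": "malicious"},
    {"text": "hello", "label": "malicious"},
]
print(find_label_conflicts(rows))
```

Matching here is on exact text; near-duplicates that differ in whitespace or casing would need normalization first.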
| File | Rows |
|---|---|
| `datasets/ailuminate_demo.json` | 1,200 |
| `datasets/asb_attacks.json` | 400 |
| `datasets/benign_samples.json` | 110 |
| `datasets/bipia_indirect.json` | 400 |
| `datasets/cyberseceval2_pi.json` | 251 |
| `datasets/deepset_all.json` | 662 |
| `datasets/deepset_v2.json` | 355 |
| `datasets/encoding_evasion.json` | 24 |
| `datasets/harmbench_behaviors.json` | 400 |
| `datasets/hpi_attack_approx.json` | 55 |
| `datasets/injecagent_attacks.json` | 2,108 |
| `datasets/injection_samples.json` | 110 |
| `datasets/ivanleomk_all.json` | 917 |
| `datasets/ivanleomk_v2.json` | 610 |
| `datasets/jackhhao_jailbreak.json` | 1,306 |
| `datasets/notinject_samples.json` | 339 |
| `datasets/safeguard_test.json` | 2,060 |
| `datasets/tensor_trust_attacks.json` | 1,000 |
| `datasets/transfer_attack_samples.json` | 100 |
- Class imbalance (~68/32) can bias policy behavior.
- Duplicate prompts can overweight repeated patterns.
- Very long prompts/completions increase rollout latency and total training time.
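Two of the risks listed above, duplicates and class imbalance, have common mitigations that can be sketched briefly. This is an illustration under assumptions, not the project's pipeline: rows are assumed to be dicts with `text` and `label` keys, the helper names are hypothetical, and inverse-frequency weighting is only one of several ways to rebalance classes.

```python
from collections import Counter


def dedupe_by_text(rows):
    """Keep the first row for each distinct prompt text, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row["text"] not in seen:
            seen.add(row["text"])
            unique.append(row)
    return unique


def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


# Label counts taken from the summary: 8,480 malicious and 3,927 benign rows.
labels = ["malicious"] * 8480 + ["benign"] * 3927
print(inverse_frequency_weights(labels))
```

With the counts reported here, the weights come out to roughly 0.73 for `malicious` and 1.58 for `benign`. They could drive weighted sampling or loss weighting, depending on how the training loop consumes them.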