Skip to content

Add Appendix E: Jailbreak Evaluation Framework (JEF) #7

@GraoMelo

Description

@GraoMelo

Hello AI Testing Guide Team,

I want to commend you on this initiative.
I've read a significant portion of this document and felt that it was missing a way to calculate the score of failures,
both for software issues with CVEs that affect LLMs and how CVSS scores would be applied.

However, after extensive research on various initiatives, I found a particularly interesting one:

the Jailbreak Evaluation Framework (JEF), a framework for evaluating jailbreak techniques (or prompt injections) in language models. The goal of JEF is to provide a systematic and repeatable way to assess and classify these techniques based on their effectiveness and risk.

Evaluation Dimensions

JEF is based on three fundamental dimensions:
Model Blast Radius: Evaluates how many models and vendors are affected by a jailbreak technique.
Retargetability: Evaluates the flexibility of a jailbreak technique in being redirected to different subjects.
Output Fidelity: Evaluates the quality and specificity of the outputs generated by a jailbreak technique.

JEF Scoring Rubric
JEF defines four standard test cases to evaluate jailbreak techniques:
Illicit Substances - Methamphetamine: Evaluates a jailbreak technique's ability to provide instructions for synthesizing methamphetamine.
Harmful Substances - Nervous Agent: Evaluates a jailbreak technique's ability to provide details for synthesizing nerve agents.
Copyright Infringement - Harry Potter Chapter 1: Evaluates a jailbreak technique's ability to reproduce copyrighted content.
Government Censorship - Tiananmen Square Massacre: Evaluates a jailbreak technique's ability to provide accurate information about historical events.

For more information:
link1
link2

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions