FEAT explore default settings on targets used for generating red teaming prompts #365

romanlutz · 2024-09-10T05:44:58Z

Is your feature request related to a problem? Please describe.

We have default values (e.g., for temperature) on targets that most people use as is without a second thought. Are those values reasonable? Does the red teaming LLM perform substantially better if we change them? Note that this item primarily concerns the red teaming targets, NOT the attack target (prompt_target), converter target, or scoring target. Those could be spin-off tasks.

Describe the solution you'd like

This needs to be explored by defining a test set, a set of attacks, and a thorough evaluation (with repeated runs, since responses are random).

Finally, the results should be captured, perhaps in a short post for #362 (not yet built). If the default values need adjusting, that should also be done.
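As a rough illustration of the evaluation described above, here is a minimal sketch of a temperature sweep with repeated trials. Everything in it is an assumption made for illustration: `run_attack`, `evaluate_setting`, `sweep`, and `fake_attack` are hypothetical names, not part of any existing API, and the success probabilities in `fake_attack` are invented placeholders rather than measured results. A real version would replace `fake_attack` with a call that runs one attack against the test set and returns the scorer's verdict.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: (objective, temperature, rng) -> did the attack succeed?
AttackFn = Callable[[str, float, random.Random], bool]


def evaluate_setting(
    run_attack: AttackFn,
    objectives: Sequence[str],
    temperature: float,
    trials: int,
    seed: int = 0,
) -> Dict[str, float]:
    """Run every objective `trials` times and summarize the success rate."""
    rng = random.Random(seed)  # fixed seed so a sweep can be reproduced
    per_objective: List[float] = []
    for objective in objectives:
        successes = sum(run_attack(objective, temperature, rng) for _ in range(trials))
        per_objective.append(successes / trials)
    return {
        "temperature": temperature,
        "mean_success_rate": statistics.mean(per_objective),
        "stdev_across_objectives": statistics.pstdev(per_objective),
    }


def sweep(
    run_attack: AttackFn,
    objectives: Sequence[str],
    temperatures: Sequence[float],
    trials: int = 20,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Evaluate each candidate temperature on the same test set."""
    return [evaluate_setting(run_attack, objectives, t, trials, seed) for t in temperatures]


def fake_attack(objective: str, temperature: float, rng: random.Random) -> bool:
    # Placeholder only: the success probability below is invented, not measured.
    return rng.random() < min(1.0, 0.2 + 0.3 * temperature)


if __name__ == "__main__":
    test_set = ["objective A", "objective B"]  # stand-ins for a real test set
    for row in sweep(fake_attack, test_set, [0.0, 0.5, 1.0], trials=50):
        print(row)
```

Repeating each objective several times and reporting the spread alongside the mean makes it easier to tell whether a difference between two settings is larger than the run-to-run noise.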

Describe alternatives you've considered, if relevant

Additional context


romanlutz added the labels documentation (Improvements or additions to documentation), enhancement (New feature or request), and help wanted (Extra attention is needed) on Sep 10, 2024
