FEAT explore default settings on targets used for generating red teaming prompts #365
Labels: documentation, enhancement, help wanted
Is your feature request related to a problem? Please describe.
We have default values (e.g., for temperature) on targets that most people use as-is without thinking twice. Are those values reasonable? Would we get substantially better performance from the red teaming LLM if we changed them? Note that this item primarily concerns the red teaming targets, NOT the attack target (prompt_target), the converter target, or the scoring target. Those could be spin-off tasks.
Describe the solution you'd like
These defaults need to be explored by defining a test set and a set of attacks, then running a thorough evaluation (with retries, since responses are random).
Finally, the results should be captured, perhaps in a short post for #362 (not yet built). If the default values need adjusting, that should be done as well.
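A minimal sketch of the evaluation described above, assuming temperature is the default under study and that each attack attempt can be reduced to a pass/fail outcome (for example, as judged by a scorer). The names AttackFn, evaluate_setting, sweep, and toy_attack are hypothetical and are not part of PyRIT; toy_attack only simulates outcomes with a made-up success probability so the sketch runs. Each objective is repeated several times and the success rate is reported with a standard error, which is one way to account for the randomness of responses.

```python
import random
import statistics
from typing import Callable, Dict, List

# HYPOTHETICAL signature for one red teaming attempt: given an attack objective,
# the red teaming LLM's temperature, and a trial seed, return True if the
# objective was achieved. This is an assumption, not a PyRIT API.
AttackFn = Callable[[str, float, int], bool]


def evaluate_setting(
    attack: AttackFn, objectives: List[str], temperature: float, retries: int
) -> Dict[str, float]:
    """Run each objective `retries` times at one temperature and summarize.

    Assumes `objectives` is non-empty and `retries` >= 1.
    """
    outcomes = [
        1.0 if attack(objective, temperature, seed) else 0.0
        for objective in objectives
        for seed in range(retries)
    ]
    rate = statistics.mean(outcomes)
    # Standard error of the mean success rate over all trials.
    stderr = (
        statistics.stdev(outcomes) / len(outcomes) ** 0.5 if len(outcomes) > 1 else 0.0
    )
    return {"temperature": temperature, "success_rate": rate, "stderr": stderr}


def sweep(
    attack: AttackFn, objectives: List[str], temperatures: List[float], retries: int = 5
) -> List[Dict[str, float]]:
    """Evaluate every candidate temperature on the same test set."""
    return [evaluate_setting(attack, objectives, t, retries) for t in temperatures]


def toy_attack(objective: str, temperature: float, seed: int) -> bool:
    # HYPOTHETICAL stand-in so the sketch runs: the success probability is
    # invented and peaks at temperature 0.7. Replace with a real attack + scorer.
    rng = random.Random(f"{objective}|{temperature}|{seed}")
    return rng.random() < max(0.0, 0.8 - abs(temperature - 0.7))


if __name__ == "__main__":
    test_set = ["objective A", "objective B", "objective C"]
    for row in sweep(toy_attack, test_set, [0.0, 0.3, 0.7, 1.0], retries=10):
        print(
            f"T={row['temperature']:.1f}  "
            f"success={row['success_rate']:.2f} +/- {row['stderr']:.2f}"
        )
```

Comparing the current default against neighboring values on the same test set (same objectives, same number of retries) keeps the comparison fair; a difference smaller than a couple of standard errors is probably noise rather than a real improvement.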
Describe alternatives you've considered, if relevant
Additional context