-
Notifications
You must be signed in to change notification settings - Fork 490
[DRAFT] [BREAKING] FEAT: Ensemble scoring for Crescendo #905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@martinpollack please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
Scorer that checks if a given substring is present in the text. | ||
""" | ||
|
||
def __init__(self, *, substrings: List[str], category: str = None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this just generalize the existing one instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on that @romanlutz I think we could use Composite Scorer for this, great suggestion.
HumanInTheLoopScorer | ||
) | ||
|
||
class EnsembleScorer(Scorer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this relate to #898 ?
pyrit/score/ensemble_scorer.py
Outdated
class EnsembleScorer(Scorer): | ||
|
||
def __init__(self, | ||
weak_scorer_dict: Dict[str, List[Union[Scorer, Union[float, Dict[str, float]]]]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to move this nested type to a structured class? something like this maybe:
@dataclass
class WeakScorerSpec:
scorer: Scorer
weight: Optional[float] = None
class_weights: Optional[Dict[str, float]] = None
Then you can either pass this directly to __init__
as
def __init__(self,
weak_scorer_dict: Dict[str, List[WeakScorerSpec]],
...
or you could define a type:
# Probably you can give it a better name than this :)
WeakScorerDict = Dict[str, List[WeakScorerSpec]]
pyrit/score/ensemble_scorer.py
Outdated
def __init__(self, | ||
weak_scorer_dict: Dict[str, List[Union[Scorer, Union[float, Dict[str, float]]]]], | ||
fit_weights: bool = False, | ||
ground_truth_scorer: Scorer = HumanInTheLoopScorer(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, this sort of initialization should be avoided unless you want to use a singleton (I am not sure if that's the intention here).
This default value is evaluated exactly once, at the moment the function is first defined, and the created object is bound to the parameter for every later call that omits that argument. That means the __init__
creates one instance of HumanInTheLoopScorer
when the module is imported. All future instances of your class will share that single scorer unless the caller overrides the argument.
This can be a problem because it causes a hidden shared state for your scorer. It can also cause testing surprises, especially if the unittest expects a fresh scorer per fixture, which can fail intermittently because state from a previous test stay.
Although, with the current setup, I don't think there is any shared state, but I have to look at the HumanInTheLoopScorer
code more deeply.
I think a safer bet is to initialize it directly inside the __init__
, e.g.
def __init__(self, ground_truth_scorer: Optional[Scorer] = None, ...):
...
self._ground_truth_scorer = (
ground_truth_scorer if ground_truth_scorer is not None
else HumanInTheLoopScorer()
)
pyrit/score/ensemble_scorer.py
Outdated
def __init__(self, | ||
weak_scorer_dict: Dict[str, List[Union[Scorer, Union[float, Dict[str, float]]]]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use *
to force the caller to name the parameters
pyrit/score/ensemble_scorer.py
Outdated
fit_weights: bool = False, | ||
ground_truth_scorer: Scorer = HumanInTheLoopScorer(), | ||
lr: float = 1e-2, | ||
category = "jailbreak"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing str
type
pyrit/score/ensemble_scorer.py
Outdated
if isinstance(scorer, AzureContentFilterScorer) and not isinstance(weight, dict): | ||
raise ValueError("Weights for AzureContentFilterScorer must be a dictionary of category (str) to weight (float)") | ||
if isinstance(scorer, AzureContentFilterScorer) and isinstance(weight, dict): | ||
for acfs_k, acfs_v in weight.items(): | ||
if not isinstance(acfs_k, str) or not isinstance(acfs_v, float): | ||
raise ValueError("Weights for AzureContentFilterScorer must be a dictionary of category (str) to weight (float)") | ||
elif not isinstance(weight, float): | ||
raise ValueError("Weight for this scorer must be a float") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be simplified:
if isinstance(scorer, AzureContentFilterScorer):
if not isinstance(weight, dict):
raise ValueError("Weights for AzureContentFilterScorer must be a dictionary of category (str) to weight (float)")
for acfs_k, acfs_v in weight.items():
if not isinstance(acfs_k, str) or not isinstance(acfs_v, float):
raise ValueError("Weights for AzureContentFilterScorer must be a dictionary of category (str) to weight (float)")
elif not isinstance(weight, float):
raise ValueError("Weight for this scorer must be a float")
pyrit/score/ensemble_scorer.py
Outdated
request_response: PromptRequestPiece, | ||
*, | ||
task: Optional[str] = None, | ||
loss_metric: str = "MSE"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be an enum or a literal?
pyrit/score/ensemble_scorer.py
Outdated
self._weak_scorer_dict[scorer_name][1] = {score_category: | ||
self._weak_scorer_dict[scorer_name][1][score_category] - | ||
self._lr * score_values[scorer_name][score_category] * d_loss_d_ensemble_score | ||
for score_category in self._weak_scorer_dict[scorer_name][1]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation here probably could be fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it would be great to know what the indexing actually mean, for example why 1
!
Maybe you can add some documentation here
pyrit/score/ensemble_scorer.py
Outdated
else: | ||
self._weak_scorer_dict[scorer_name][1] = self._weak_scorer_dict[scorer_name][1] - self._lr * score_values[scorer_name] * d_loss_d_ensemble_score | ||
|
||
print(self._weak_scorer_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use logger
for this.
Scorer that checks if a given substring is present in the text. | ||
""" | ||
|
||
def __init__(self, *, substrings: List[str], category: str = None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on that @romanlutz I think we could use Composite Scorer for this, great suggestion.
Description
This change creates a full pipeline for performing ensemble scoring with crescendo. Included are two new scorers: EnsembleScorer which is the driver of this change and allows results of many scorers to be aggregated, as well as SubstringsMultipleScorer which extends SubstringScorer to allow multiple strings to be searched for in a response. In addition, the crescendo orchestrator has been updated to abstract out the logic for creating the objective scorer. This is now created outside of the orchestrator in a new notebook which has been created as a template to demonstrate the capabilities of a crescendo ensemble orchestrator.
Received support from @eugeniavkim @jbolor21.
This change is breaking because it changes how a CrescendoOrchestrator object is instantiated. Instead of providing a PromptChatTarget as a scoring target for the scorer, the user needs to create a Scorer object outside of the CrescendoOrchestrator and then pass it to objective_float_scale_scorer to be used for scoring. This just abstracts the objective scorer outside of the Orchestrator object and allows for more flexibility.
Tests and Documentation
Still in pogress