Experiment/red team agent tool #40481

Draft
wants to merge 151 commits into
base: main
151 commits
7de6367
Update task_query_response.prompty
nagkumar91 Oct 1, 2024
f288b34
Update task_simulate.prompty
nagkumar91 Oct 1, 2024
2a4b6f7
Update task_query_response.prompty
nagkumar91 Oct 2, 2024
c8ce251
Update task_simulate.prompty
nagkumar91 Oct 2, 2024
4522ae4
Merge branch 'Azure:main' into main
nagkumar91 Oct 3, 2024
32e9c1d
Merge branch 'Azure:main' into main
nagkumar91 Oct 7, 2024
76df69d
Merge branch 'Azure:main' into main
nagkumar91 Oct 8, 2024
aeddcb4
Merge branch 'Azure:main' into main
nagkumar91 Oct 8, 2024
65a759c
Merge branch 'Azure:main' into main
nagkumar91 Oct 9, 2024
e4cdd30
Fix the api_key needed
Oct 9, 2024
e3ab026
Merge branch 'Azure:main' into main
nagkumar91 Oct 11, 2024
4fb09c4
Merge branch 'Azure:main' into main
nagkumar91 Oct 15, 2024
e71a52d
Merge branch 'Azure:main' into main
nagkumar91 Oct 15, 2024
87166b3
Merge branch 'Azure:main' into main
nagkumar91 Oct 16, 2024
b478651
Update for release
nagkumar91 Oct 16, 2024
8e5a264
Black fix for file
nagkumar91 Oct 16, 2024
2077d6d
Merge branch 'Azure:main' into main
nagkumar91 Oct 17, 2024
3ab59c8
Merge branch 'Azure:main' into main
nagkumar91 Oct 17, 2024
3a80606
Add original text in global context
Oct 17, 2024
6768f9a
Update test
Oct 17, 2024
f7cc4bb
Update the indirect attack simulator
Oct 18, 2024
07eb466
Black suggested fixes
Oct 18, 2024
942bfd5
Update simulator prompty
Oct 18, 2024
2d4c376
Merge branch 'main' into main
nagkumar91 Oct 18, 2024
98cad97
Update adversarial scenario enum to exclude XPIA
Oct 18, 2024
d510316
Update changelog
Oct 18, 2024
742943e
Black fixes
Oct 18, 2024
12e0615
Remove duplicate import
Oct 18, 2024
de32b50
Fix the mypy error
Oct 19, 2024
4b64132
Mypy please be happy
Oct 21, 2024
1c0b4dd
Updates to non adv simulator
Oct 22, 2024
c4f9111
Merge branch 'Azure:main' into main
nagkumar91 Oct 22, 2024
6de617c
accept context from assistant messages, exclude them when using them …
Oct 23, 2024
1e5d40c
update changelog
Oct 23, 2024
93b29c7
pylint fixes
Oct 23, 2024
8e3ddc3
pylint fixes
Oct 23, 2024
31e0d29
Merge branch 'main' into main
nagkumar91 Oct 23, 2024
4ccc7c8
remove redundant quotes
Oct 23, 2024
bed5196
Fix typo
Oct 23, 2024
0fdd644
pylint fix
Oct 23, 2024
1f695cc
Update broken tests
Oct 23, 2024
3da3a94
Merge branch 'main' into main
nagkumar91 Oct 23, 2024
56c2657
Merge branch 'Azure:main' into main
nagkumar91 Oct 23, 2024
b04b3e6
Merge branch 'Azure:main' into main
nagkumar91 Oct 24, 2024
b9793ca
Merge branch 'Azure:main' into main
nagkumar91 Oct 25, 2024
92c9a6d
Include the grounding json in the manifest
Oct 25, 2024
0673cd5
Fix typo
Oct 25, 2024
7b360fc
Come on package
Oct 25, 2024
e3fd2bb
Merge branch 'Azure:main' into main
nagkumar91 Oct 28, 2024
c9f38c9
Release 1.0.0b5
Oct 28, 2024
bbb78fd
Merge branch 'main' of https://github.com/nagkumar91/azure-sdk-for-py…
Oct 28, 2024
ed7eed1
Notice from Chang
Oct 28, 2024
103f397
Merge branch 'Azure:main' into main
nagkumar91 Oct 28, 2024
3de5b66
Remove adv_conv template parameters from the outputs
Oct 28, 2024
21e3551
Merge branch 'main' of https://github.com/nagkumar91/azure-sdk-for-py…
Oct 28, 2024
78df8c9
Merge branch 'Azure:main' into main
nagkumar91 Oct 28, 2024
2b693bc
Merge branch 'Azure:main' into main
nagkumar91 Oct 29, 2024
f2e95d1
Update chanagelog
Oct 29, 2024
20b6d47
Merge branch 'Azure:main' into main
nagkumar91 Oct 29, 2024
a920c28
Merge branch 'main' of https://github.com/nagkumar91/azure-sdk-for-py…
Oct 29, 2024
f9ac10c
Experimental tags on adv scenarios
Oct 29, 2024
b570a51
Merge branch 'Azure:main' into main
nagkumar91 Oct 30, 2024
6c81cbb
Readme fix onbreaking change
Oct 30, 2024
b48f8ab
Add the category and both user and assistant context to the response …
Oct 30, 2024
d422e05
Update changelog
Oct 30, 2024
de105db
Merge branch 'Azure:main' into main
nagkumar91 Oct 30, 2024
d9b80f7
Merge branch 'Azure:main' into main
nagkumar91 Nov 4, 2024
04823fd
Merge branch 'Azure:main' into main
nagkumar91 Nov 5, 2024
988f2ad
Merge branch 'Azure:main' into main
nagkumar91 Nov 7, 2024
fb12fdd
Rename _kwargs to _options
Nov 7, 2024
d912c52
_options as prefix
Nov 7, 2024
059e767
update troubleshooting for simulator
Nov 7, 2024
f91228f
Rename according to suggestions
Nov 7, 2024
e660918
Merge branch 'Azure:main' into main
nagkumar91 Nov 7, 2024
5ad5a26
Merge branch 'Azure:main' into main
nagkumar91 Nov 11, 2024
cde740c
Clean up readme
Nov 11, 2024
a90c788
more links
Nov 11, 2024
11cf0ba
Merge branch 'Azure:main' into main
nagkumar91 Nov 14, 2024
3050ce7
Merge branch 'Azure:main' into main
nagkumar91 Nov 18, 2024
ae461cc
Merge branch 'Azure:main' into main
nagkumar91 Nov 20, 2024
035881e
Merge branch 'Azure:main' into main
nagkumar91 Nov 22, 2024
87c871c
Merge branch 'Azure:main' into main
nagkumar91 Nov 26, 2024
a1519dd
Merge branch 'Azure:main' into main
nagkumar91 Dec 2, 2024
3ad53d5
Bugfix: zip_longest created null parameters
Dec 2, 2024
e9f3241
Updated changelog
Dec 2, 2024
79c2f0d
zip does the job
Dec 2, 2024
a0bc930
remove ununsed import
Dec 3, 2024
32b15eb
Merge branch 'Azure:main' into main
nagkumar91 Dec 9, 2024
19c4ea1
Merge branch 'Azure:main' into main
nagkumar91 Dec 11, 2024
95052bd
Merge branch 'Azure:main' into main
nagkumar91 Dec 12, 2024
a03abdf
Merge branch 'Azure:main' into main
nagkumar91 Dec 17, 2024
74d8553
Fix changelog merge
Dec 18, 2024
c78f768
Merge branch 'Azure:main' into main
nagkumar91 Dec 19, 2024
d37d0c3
Merge branch 'Azure:main' into main
nagkumar91 Jan 5, 2025
151f4c4
Merge branch 'Azure:main' into main
nagkumar91 Jan 7, 2025
0a417ae
Merge branch 'Azure:main' into main
nagkumar91 Jan 9, 2025
ede99b8
Remove print statements
Jan 9, 2025
a824f83
Merge branch 'Azure:main' into main
nagkumar91 Jan 13, 2025
7c8eae9
Merge branch 'Azure:main' into main
nagkumar91 Jan 15, 2025
5feeabb
Merge branch 'Azure:main' into main
nagkumar91 Jan 17, 2025
1df3839
Merge branch 'Azure:main' into main
nagkumar91 Jan 20, 2025
4616896
Merge branch 'Azure:main' into main
nagkumar91 Jan 22, 2025
ed5d87c
Merge branch 'Azure:main' into main
nagkumar91 Jan 23, 2025
66c7c5b
Merge branch 'Azure:main' into main
nagkumar91 Jan 24, 2025
4019245
Merge branch 'Azure:main' into main
nagkumar91 Jan 27, 2025
c37b6c5
Merge branch 'Azure:main' into main
nagkumar91 Jan 28, 2025
246ab9b
Merge branch 'Azure:main' into main
nagkumar91 Feb 4, 2025
4767587
Merge branch 'Azure:main' into main
nagkumar91 Feb 11, 2025
f7e6089
Merge branch 'Azure:main' into main
nagkumar91 Feb 17, 2025
5b45900
Merge branch 'Azure:main' into main
nagkumar91 Feb 19, 2025
b394fe2
Merge branch 'Azure:main' into main
nagkumar91 Feb 20, 2025
54602fe
Merge branch 'Azure:main' into main
nagkumar91 Mar 4, 2025
ff36631
Merge branch 'Azure:main' into main
nagkumar91 Mar 5, 2025
f3e1850
Merge branch 'Azure:main' into main
nagkumar91 Mar 6, 2025
16173c3
Merge branch 'Azure:main' into main
nagkumar91 Mar 6, 2025
f856210
Merge branch 'Azure:main' into main
nagkumar91 Mar 10, 2025
602a2e1
Merge branch 'Azure:main' into main
nagkumar91 Mar 11, 2025
747c0db
Merge branch 'Azure:main' into main
nagkumar91 Mar 12, 2025
7741608
Merge branch 'Azure:main' into main
nagkumar91 Mar 13, 2025
5e36ddf
Merge branch 'Azure:main' into main
nagkumar91 Mar 13, 2025
648d45b
Merge branch 'Azure:main' into main
nagkumar91 Mar 17, 2025
b37ba2a
Merge branch 'Azure:main' into main
nagkumar91 Mar 19, 2025
3782341
Merge branch 'Azure:main' into main
nagkumar91 Mar 19, 2025
35682be
Merge branch 'Azure:main' into main
nagkumar91 Mar 20, 2025
c8dd420
Merge branch 'Azure:main' into main
nagkumar91 Mar 20, 2025
15c2b23
Merge branch 'Azure:main' into main
nagkumar91 Apr 1, 2025
da2ebe1
Merge branch 'Azure:main' into main
nagkumar91 Apr 1, 2025
5e338be
Merge branch 'Azure:main' into main
nagkumar91 Apr 1, 2025
c61d41c
Merge branch 'Azure:main' into main
nagkumar91 Apr 4, 2025
1beb30d
Merge branch 'Azure:main' into main
nagkumar91 Apr 4, 2025
49a4ee9
Merge branch 'Azure:main' into main
nagkumar91 Apr 5, 2025
888b254
Merge branch 'Azure:main' into main
nagkumar91 Apr 5, 2025
b3731c0
Merge branch 'Azure:main' into main
nagkumar91 Apr 8, 2025
febfa27
Merge branch 'Azure:main' into main
nagkumar91 Apr 8, 2025
c0bbc19
Intermediary commit w.i.p
Apr 9, 2025
b03143f
Merge branch 'Azure:main' into main
nagkumar91 Apr 9, 2025
d40961c
remove debugger as getting 200s now
Apr 9, 2025
1fd4e51
Skip baseline and keep promptsending orchestrator
Apr 10, 2025
fba706c
init red team agent as a tool
Apr 10, 2025
7b0cf06
Add more updates to make the tool work
Apr 10, 2025
c9089a4
working prompt generation
Apr 10, 2025
ff46e62
Making the tool work with azure ai agents
Apr 11, 2025
945610b
Initialize function
Apr 15, 2025
c4f7b46
Merge branch 'Azure:main' into main
nagkumar91 Apr 15, 2025
d6e54f8
Merge branch 'main' into experiment/red_team_agent_tool
Apr 15, 2025
670f8d1
aggregated binary threshold results for evaluators in metrics
Apr 16, 2025
f8eab2f
handle when the right columns do not exist
Apr 16, 2025
54d6aa4
Marking mypy, pylint, black as false
Apr 16, 2025
e9ccf42
Merge branch 'main' into experiment/red_team_agent_tool
Apr 17, 2025
c829979
Add a target feature
Apr 17, 2025
6091f8c
fix sample import and move agent to red team
Apr 17, 2025

1 change: 1 addition & 0 deletions sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
@@ -3,6 +3,7 @@
 ## 1.6.0 (Unreleased)

 ### Features Added
+- New `<evaluator>.binary_aggregate` field added to evaluation result metrics. This field contains the aggregated binary evaluation results for each evaluator, providing a summary of the evaluation outcomes.

 ### Breaking Changes

@@ -19,6 +19,7 @@

 from .._constants import (
     CONTENT_SAFETY_DEFECT_RATE_THRESHOLD_DEFAULT,
+    EVALUATION_PASS_FAIL_MAPPING,
     EvaluationMetrics,
     DefaultOpenEncoding,
     Prefixes,
@@ -209,6 +210,48 @@ def _process_rows(row, detail_defect_rates):
     return detail_defect_rates


+def _aggregation_binary_output(df: pd.DataFrame) -> Dict[str, float]:
+    """
+    Aggregate binary output results (pass/fail) from evaluation dataframe.
+
+    For each evaluator, calculates the proportion of "pass" results.
+
+    :param df: The dataframe of evaluation results.
+    :type df: ~pandas.DataFrame
+    :return: A dictionary mapping evaluator names to the proportion of pass results.
+    :rtype: Dict[str, float]
+    """
+    results = {}
+
+    # Find all columns that end with "_result"
+    result_columns = [col for col in df.columns if col.startswith("outputs.") and col.endswith("_result")]
+
+    for col in result_columns:
+        # Extract the evaluator name from the column name
+        # (outputs.<evaluator>.<metric>_result)
+        parts = col.split(".")
+        evaluator_name = None
+        if len(parts) >= 3:
+            evaluator_name = parts[1]
+        else:
+            LOGGER.warning("Skipping column '%s' due to unexpected format. Expected at least three parts separated by '.'", col)
+            continue
+        if evaluator_name:
+            # Count the occurrences of each unique value (pass/fail)
+            value_counts = df[col].value_counts().to_dict()
+
+            # Calculate the proportion of EVALUATION_PASS_FAIL_MAPPING[True] results
+            total_rows = len(df)
+            pass_count = value_counts.get(EVALUATION_PASS_FAIL_MAPPING[True], 0)
+            proportion = pass_count / total_rows if total_rows > 0 else 0.0
+
+            # Set the result with the evaluator name as the key
+            result_key = f"{evaluator_name}.binary_aggregate"
+            results[result_key] = round(proportion, 2)
+
+    return results
+
+
 def _aggregate_metrics(df: pd.DataFrame, evaluators: Dict[str, Callable]) -> Dict[str, float]:
     """Aggregate metrics from the evaluation results.
     On top of naively calculating the mean of most metrics, this function also identifies certain columns
@@ -222,6 +265,8 @@ def _aggregate_metrics(df: pd.DataFrame, evaluators: Dict[str, Callable]) -> Dic
     :return: The aggregated metrics.
     :rtype: Dict[str, float]
     """
+    binary_metrics = _aggregation_binary_output(df)
+
     df.rename(columns={col: col.replace("outputs.", "") for col in df.columns}, inplace=True)

     handled_columns = []
@@ -249,6 +294,10 @@ def _aggregate_metrics(df: pd.DataFrame, evaluators: Dict[str, Callable]) -> Dic
     metrics = mean_value.to_dict()
     # Add defect rates back into metrics
     metrics.update(defect_rates)
+
+    # Add binary threshold metrics based on pass/fail results
+    metrics.update(binary_metrics)
+
     return metrics
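
The aggregation added in the hunks above can be illustrated without pandas. The following is a minimal sketch, not the PR's implementation: it assumes that `EVALUATION_PASS_FAIL_MAPPING[True]` is the string `"pass"` (the docstring suggests this, but the constant's definition is not shown in the diff), and the function name `binary_aggregate` and the sample column names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

PASS = "pass"  # assumption: stands in for EVALUATION_PASS_FAIL_MAPPING[True]


def binary_aggregate(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Return the pass proportion per evaluator, keyed '<evaluator>.binary_aggregate'."""
    results: Dict[str, float] = {}
    total_rows = len(rows)
    columns = {key for row in rows for key in row}
    for col in sorted(columns):
        # Only 'outputs.<evaluator>.<metric>_result' columns are considered.
        if not (col.startswith("outputs.") and col.endswith("_result")):
            continue
        parts = col.split(".")
        if len(parts) < 3:
            continue  # unexpected column format, skipped as in the diff
        counts = Counter(row.get(col) for row in rows)
        # The denominator is every row, not only rows that carry a result.
        proportion = counts[PASS] / total_rows if total_rows > 0 else 0.0
        results[f"{parts[1]}.binary_aggregate"] = round(proportion, 2)
    return results


# Hypothetical sample data: four evaluated rows, two evaluators.
rows = [
    {"outputs.violence.violence_result": "pass", "outputs.fluency.fluency_result": "pass"},
    {"outputs.violence.violence_result": "fail", "outputs.fluency.fluency_result": "pass"},
    {"outputs.violence.violence_result": "pass", "outputs.fluency.fluency_result": "fail"},
    {"outputs.violence.violence_result": "pass", "outputs.fluency.fluency_result": "fail"},
]
aggregated = binary_aggregate(rows)
print(aggregated)  # {'fluency.binary_aggregate': 0.5, 'violence.binary_aggregate': 0.75}
```

Two properties of the diffed function carry over to this sketch and may be worth a reviewer's attention: rows with a missing result still count toward the denominator, and an evaluator that emits more than one `*_result` column would have earlier values overwritten under the same `<evaluator>.binary_aggregate` key.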


@@ -35,6 +35,9 @@ class AttackStrategy(Enum):
     Baseline = "baseline"
     Jailbreak = "jailbreak"

+    TAP = "tap"
+    Crescendo = "crescendo"
+
     @classmethod
     def Compose(cls, items: List["AttackStrategy"]) -> List["AttackStrategy"]:
         for item in items:
@@ -53,7 +53,7 @@
 from pyrit.models import ChatMessage
 from pyrit.memory import CentralMemory
 from pyrit.orchestrator.single_turn.prompt_sending_orchestrator import PromptSendingOrchestrator
-from pyrit.orchestrator import Orchestrator
+from pyrit.orchestrator import Orchestrator, CrescendoOrchestrator
 from pyrit.exceptions import PyritException
 from pyrit.prompt_converter import PromptConverter, MathPromptConverter, Base64Converter, FlipConverter, MorseConverter, AnsiAttackConverter, AsciiArtConverter, AsciiSmugglerConverter, AtbashConverter, BinaryConverter, CaesarConverter, CharacterSpaceConverter, CharSwapGenerator, DiacriticConverter, LeetspeakConverter, UrlConverter, UnicodeSubstitutionConverter, UnicodeConfusableConverter, SuffixAppendConverter, StringJoinConverter, ROT13Converter

@@ -67,6 +67,7 @@
     setup_logger, log_section_header, log_subsection_header,
     log_strategy_start, log_strategy_completion, log_error
 )
+from ._utils.rai_service_target import AzureRAIServiceTarget

 @experimental
 class RedTeam():
@@ -815,6 +816,20 @@ def _get_chat_target(self, target: Union[PromptChatTarget,Callable, AzureOpenAIM
     def _get_orchestrators_for_attack_strategies(self, attack_strategy: List[Union[AttackStrategy, List[AttackStrategy]]]) -> List[Callable]:
         # We need to modify this to use our actual _prompt_sending_orchestrator since the utility function can't access it
         call_to_orchestrators = []
+
+        # Special handling for Crescendo strategy
+        if AttackStrategy.Crescendo in attack_strategy:
+            self.logger.debug("Using Crescendo orchestrator for Crescendo strategy")
+
+            # Include both Crescendo orchestrator for the Crescendo strategy
+            # and PromptSendingOrchestrator for baseline testing
+            call_to_orchestrators.extend([
+                self._crescendo_orchestrator,  # For Crescendo strategy
+                self._prompt_sending_orchestrator  # For baseline testing
+            ])
+            return call_to_orchestrators
+
+        # Default handling for other strategies
         # Sending PromptSendingOrchestrator for each complexity level
         if AttackStrategy.EASY in attack_strategy:
             call_to_orchestrators.extend([self._prompt_sending_orchestrator])
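
The early return for Crescendo in the hunk above can be sketched in isolation. This is an illustrative reduction under stated assumptions, not the class's actual method: the two orchestrator functions are hypothetical stand-ins for `self._crescendo_orchestrator` and `self._prompt_sending_orchestrator`, the enum lists only the members the sketch needs, and the value `"easy"` for `AttackStrategy.EASY` is assumed because the diff does not show it.

```python
from enum import Enum
from typing import Callable, List


class AttackStrategy(Enum):
    # Reduced enum for this sketch; the value of EASY is an assumption.
    EASY = "easy"
    Baseline = "baseline"
    Crescendo = "crescendo"


def prompt_sending_orchestrator() -> str:  # hypothetical stand-in
    return "prompt_sending"


def crescendo_orchestrator() -> str:  # hypothetical stand-in
    return "crescendo"


def get_orchestrators(strategies: List[AttackStrategy]) -> List[Callable[[], str]]:
    """Pick orchestrators the way the diffed method appears to."""
    if AttackStrategy.Crescendo in strategies:
        # Crescendo short-circuits: its orchestrator plus the baseline sender.
        return [crescendo_orchestrator, prompt_sending_orchestrator]
    selected: List[Callable[[], str]] = []
    if AttackStrategy.EASY in strategies:
        selected.append(prompt_sending_orchestrator)
    return selected


with_crescendo = [f.__name__ for f in get_orchestrators([AttackStrategy.Crescendo, AttackStrategy.EASY])]
without_crescendo = [f.__name__ for f in get_orchestrators([AttackStrategy.EASY])]
print(with_crescendo)     # ['crescendo_orchestrator', 'prompt_sending_orchestrator']
print(without_crescendo)  # ['prompt_sending_orchestrator']
```

Because the membership test runs only on the top-level list, a Crescendo entry nested inside a composed strategy (a list within the list) would not trigger the special case in this sketch; whether the PR intends that is not stated.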
@@ -1481,7 +1496,8 @@ async def scan(
             application_scenario: Optional[str] = None,
             parallel_execution: bool = True,
             max_parallel_tasks: int = 5,
-            timeout: int = 120
+            timeout: int = 120,
+            skip_baseline: bool = False
         ) -> RedTeamResult:
         """Run a red team scan against the target using the specified strategies.

@@ -1767,21 +1783,29 @@ def filter(self, record):

             self.logger.debug(f"[{combo_idx+1}/{len(combinations)}] Creating task: {call_orchestrator.__name__} + {strategy_name} + {risk_category.value}")

-            orchestrator_tasks.append(
-                self._process_attack(
-                    target=target,
-                    call_orchestrator=call_orchestrator,
-                    all_prompts=objectives,
-                    strategy=strategy,
-                    progress_bar=progress_bar,
-                    progress_bar_lock=progress_bar_lock,
-                    scan_name=scan_name,
-                    data_only=data_only,
-                    output_path=output_path,
-                    risk_category=risk_category,
-                    timeout=timeout
+            # Skip baseline task if skip_baseline is True and this is a baseline strategy
+            if skip_baseline and strategy == AttackStrategy.Baseline:
+                self.logger.info(f"Skipping baseline task for {risk_category.value} as skip_baseline=True")
+                async with progress_bar_lock:
+                    progress_bar.update(1)
+                # Mark as completed in tracking dictionary
+                self.red_team_info[strategy_name][risk_category.value]["status"] = TASK_STATUS["COMPLETED"]
+            else:
+                orchestrator_tasks.append(
+                    self._process_attack(
+                        target=target,
+                        call_orchestrator=call_orchestrator,
+                        all_prompts=objectives,
+                        strategy=strategy,
+                        progress_bar=progress_bar,
+                        progress_bar_lock=progress_bar_lock,
+                        scan_name=scan_name,
+                        data_only=data_only,
+                        output_path=output_path,
+                        risk_category=risk_category,
+                        timeout=timeout
+                    )
                 )
-            )

             # Process tasks in parallel with optimized batching
             if parallel_execution and orchestrator_tasks:
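
The `skip_baseline` branch above can be reduced to a small `asyncio` sketch. Everything here is illustrative: `process_attack` and `run_scan` are hypothetical stand-ins for `_process_attack` and the task-building loop in `scan`, strategies are plain strings rather than `AttackStrategy` members, and the status strings stand in for `TASK_STATUS` values that the diff does not show.

```python
import asyncio
from typing import Dict, List, Tuple


async def process_attack(strategy: str, risk: str) -> str:  # hypothetical stand-in
    await asyncio.sleep(0)  # placeholder for the real orchestrator call
    return f"{strategy}:{risk}"


async def run_scan(
    combinations: List[Tuple[str, str]], skip_baseline: bool = False
) -> Tuple[List[str], Dict[Tuple[str, str], str]]:
    status: Dict[Tuple[str, str], str] = {}
    tasks = []
    for strategy, risk in combinations:
        if skip_baseline and strategy == "baseline":
            # No task is created; the combination is still marked completed.
            status[(strategy, risk)] = "completed"
            continue
        status[(strategy, risk)] = "pending"
        tasks.append(process_attack(strategy, risk))
    results = await asyncio.gather(*tasks)
    return list(results), status


combos = [("baseline", "violence"), ("crescendo", "violence")]
results, status = asyncio.run(run_scan(combos, skip_baseline=True))
print(results)  # ['crescendo:violence']
print(status[("baseline", "violence")])  # completed
```

As in the diff, a skipped baseline combination is recorded as completed even though no attack ran for it, so downstream code that reads the tracking dictionary cannot tell a skipped baseline from an executed one unless it also checks `skip_baseline`.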