
Conversation

@drorIvry

Description

Motivation and Context

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🎨 Code style/refactoring (no functional changes)
  • 🧪 Test updates
  • 🔧 Configuration/build changes

Changes Made

Screenshots/Examples (if applicable)

Checklist

  • I have read the CONTRIBUTING.md guide
  • My code follows the code style of this project (PEP 8, type hints, docstrings)
  • I have run uv run black . to format my code
  • I have run uv run flake8 . and fixed all issues
  • I have run uv run mypy --config-file .mypy.ini . and addressed type checking issues
  • I have run uv run bandit -c .bandit.yaml -r . for security checks
  • I have added tests that prove my fix is effective or that my feature works
  • I have run uv run pytest and all tests pass
  • I have manually tested my changes
  • I have updated the documentation accordingly
  • I have added/updated type hints for new/modified functions
  • My changes generate no new warnings
  • I have checked my code for security issues
  • Any dependent changes have been merged and published

Testing

Test Configuration:

  • Python version:
  • OS:
  • Other relevant details:

Test Steps:
1.
2.
3.

Additional Notes

Related Issues/PRs

  • Fixes #
  • Related to #
  • Depends on #

@drorIvry drorIvry requested a review from yuval-qf as a code owner November 10, 2025 19:05

coderabbitai bot commented Nov 10, 2025

Summary by CodeRabbit

Release Notes

  • New Features
    • Added AWS Bedrock as a new LLM provider option with AWS credential management
    • Introduced red-team evaluation mode for OWASP-based security testing with configurable attacks per category
    • Extended interview and scenario generation to support both API key and AWS-based authentication methods


Walkthrough

This PR adds AWS Bedrock support for LLM integration and introduces red-team evaluation capabilities. Changes span TUI credential input dialogs, API controllers, server services, and Python evaluator agents, threading AWS credentials (access key, secret key, region) through interview and evaluation workflows alongside a new evaluation mode selector.

Changes

  • TUI LLM Config Dialog (packages/tui/internal/components/llm_config_dialog.go): Introduces the Bedrock provider with multi-field AWS credential input (access key, secret key, region) tracked via ActiveInputField; updates rendering, navigation, and validation to handle Bedrock alongside existing providers; resets Bedrock fields when entering the API key step.
  • TUI Controllers & Credential Propagation (packages/tui/internal/tui/common_controller.go, keyboard_controller.go, interview_controller.go): Implements provider-prefix handling for judge/interview models; extracts AWS credentials from config for Bedrock vs. API keys for other providers; propagates credentials through evaluation and scenario generation flows.
  • TUI Request & Config Utils (packages/tui/internal/tui/commands.go, interview_utils.go, scenario_utils.go, utils.go): Extends RogueSDK method signatures (StartInterview, GenerateSummary) and request types (StartInterviewRequest, ScenarioGenerationRequest, AgentConfig) to include optional AWS credential fields; populates them for Bedrock-based configurations.
  • Server API Endpoints (rogue/server/api/interview.py, llm.py): Passes AWS credentials from request bodies to the service layer (InterviewerService, LLMService.generate_scenarios, LLMService.generate_summary_from_results).
  • Server Services (rogue/server/services/interviewer_service.py, llm_service.py): Stores and threads AWS credentials to the underlying litellm completion API calls; maintains conditional credential handling based on provider type.
  • Python Evaluator Agent Base (rogue/evaluator_agent/base_evaluator_agent.py): Accepts and stores AWS credentials in the constructor; passes them through to policy evaluation; normalizes scenario_type handling for OWASP category conversion.
  • Python Evaluator Factory & Execution (rogue/evaluator_agent/evaluator_agent_factory.py, policy_evaluation.py, run_evaluator_agent.py): Adds an evaluation_mode parameter (POLICY vs. RED_TEAM); routes to the appropriate agent types; threads AWS credentials and evaluation context through the factory and async execution.
  • Python Orchestrator & Services (rogue/server/core/evaluation_orchestrator.py, evaluation_service.py): Extends the orchestrator with AWS credentials, evaluation_mode, owasp_categories, and attacks_per_category; conditionally generates red-team scenarios via RedTeamScenarioGenerator in RED_TEAM mode; propagates the extended config to the agent factory.
  • Python SDK Types (sdks/python/rogue_sdk/types.py): Introduces the EvaluationMode enum and RedTeamingResult model; adds AWS credential fields to AgentConfig; extends StartInterviewRequest, ScenarioGenerationRequest, and SummaryGenerationRequest with optional AWS parameters; merges red_teaming_results in EvaluationResults.combine.
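The SDK type additions summarized above can be sketched in plain Python. The field names come from this walkthrough; the enum member values and the dataclass form are assumptions (the real SDK uses Pydantic models):

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class EvaluationMode(str, Enum):
    # Member values are assumptions; the SDK defines the real ones.
    POLICY = "policy"
    RED_TEAM = "red_team"

@dataclass
class AgentConfig:
    judge_llm: str
    judge_llm_api_key: Optional[str] = None
    # New optional AWS credential fields for a Bedrock-backed judge
    judge_llm_aws_access_key_id: Optional[str] = None
    judge_llm_aws_secret_access_key: Optional[str] = None
    judge_llm_aws_region: Optional[str] = None
    evaluation_mode: EvaluationMode = EvaluationMode.POLICY
    owasp_categories: Optional[List[str]] = None
    attacks_per_category: int = 5

# A Bedrock judge running in red-team mode (model string is illustrative)
cfg = AgentConfig(
    judge_llm="bedrock/anthropic.claude-3-sonnet",
    judge_llm_aws_region="us-east-1",
    evaluation_mode=EvaluationMode.RED_TEAM,
)
```

Leaving every new field optional with a default keeps existing API-key-based configurations valid unchanged.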

Sequence Diagram

sequenceDiagram
    actor User
    participant TUI as TUI Client
    participant Controller as Common Controller
    participant Service as Server<br/>(Evaluation Service)
    participant Orchestrator as Evaluation<br/>Orchestrator
    participant Agent as Evaluator Agent
    participant LLM as LLM Service<br/>(litellm)

    User->>TUI: Configure LLM (Bedrock credentials)
    TUI->>TUI: Store AWS credentials in config<br/>(access key, secret key, region)
    
    User->>Controller: Start Evaluation
    Controller->>Controller: Extract credentials from config<br/>based on provider type<br/>(Bedrock vs. API key)
    Controller->>Service: Call run_job with<br/>AgentConfig (credentials, mode)
    
    Service->>Orchestrator: Initialize with evaluation_mode,<br/>AWS credentials, owasp_categories
    
    alt evaluation_mode = RED_TEAM
        Orchestrator->>Orchestrator: Generate red-team scenarios<br/>via RedTeamScenarioGenerator
        rect rgb(200, 150, 255)
            note over Orchestrator: New red-team path
        end
    else evaluation_mode = POLICY
        Orchestrator->>Orchestrator: Use provided scenarios
    end
    
    Orchestrator->>Agent: Create agent with<br/>judge_llm AWS credentials,<br/>evaluation_mode, owasp_categories
    Agent->>Agent: Evaluate policy/scenarios
    
    alt provider = Bedrock
        Agent->>LLM: completion call with<br/>aws_access_key_id,<br/>aws_secret_access_key
        rect rgb(200, 150, 255)
            note over LLM: Bedrock credentials<br/>(not API key)
        end
    else provider = other
        Agent->>LLM: completion call with<br/>api_key parameter
    end
    
    LLM-->>Agent: LLM response
    Agent-->>Orchestrator: Evaluation results
    Orchestrator-->>Service: Final results<br/>(with red_teaming_results if applicable)
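The two alt branches in the diagram amount to choosing which credential kwargs accompany the completion call. A minimal sketch of that routing, assuming a hypothetical build_completion_kwargs helper and provider-prefixed model strings (the exact parameter names litellm expects may differ):

```python
from typing import Optional

def build_completion_kwargs(
    model: str,
    api_key: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_region: Optional[str] = None,
) -> dict:
    """Pick credential kwargs for a completion call based on the provider prefix."""
    kwargs: dict = {"model": model}
    if model.startswith("bedrock/"):
        # Bedrock path: AWS credentials instead of an API key
        kwargs["aws_access_key_id"] = aws_access_key_id
        kwargs["aws_secret_access_key"] = aws_secret_access_key
        kwargs["aws_region_name"] = aws_region
    else:
        # All other providers keep the existing api_key parameter
        kwargs["api_key"] = api_key
    return kwargs
```

The resulting dict would be splatted into the completion call, so a given request carries either AWS credentials or an API key, never both.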

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Areas requiring extra attention:

  • Verify all RogueSDK method call sites updated with new AWS credential parameters (interview.py, commands.go, interview_controller.go)
  • Confirm credential extraction logic handles both Bedrock and non-Bedrock providers correctly across all controllers
  • Validate that red-team scenario generation and mode switching in EvaluationOrchestrator correctly default and propagate when config fields are absent
  • Check JSON serialization of new optional fields (omitempty tags applied consistently across Go and Python types)
  • Verify commented-out aws_region parameters are intentional and documented, or if they should be fully integrated
  • Ensure LLMConfigDialog's ActiveInputField state machine handles all Bedrock field transitions and focus shifts correctly

Possibly related PRs

Suggested reviewers

  • yuval-qf
  • osher-qualifire

Poem

🐰 Hop, hop, the credentials now flow,
Through Bedrock paths where AWS winds blow,
Red teams and policies dance in new modes,
Fields multiplied on each scenic road,
The evaluator grins—more providers to go! 🌟

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description check (⚠️ Warning): The PR description is largely incomplete. While it contains a long list of commit messages from the branch history, the template sections (Motivation and Context, Type of Change, Changes Made, Testing, Checklist) are present but mostly unfilled with content relevant to this PR. Resolution: provide a clear summary of changes, explain the motivation for adding Bedrock support, mark the appropriate Type of Change checkbox, list the key changes made, include testing steps, and check the relevant checklist items.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 37.50%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): The PR title 'feature/support bedrock' accurately describes the primary objective of adding Bedrock support across the codebase, covering both frontend (TUI) and backend components.





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (5)
sdks/python/rogue_sdk/types.py (4)

120-122: Consider using SecretStr for AWS credentials to prevent accidental exposure.

AWS credentials are sensitive secrets that should be protected from accidental logging or exposure. While the existing judge_llm_api_key field (line 119) also uses Optional[str], best practice would be to use Pydantic's SecretStr type for all credential fields, which is already imported in this file.

Apply this diff to enhance security:

-    judge_llm_aws_access_key_id: Optional[str] = None
-    judge_llm_aws_secret_access_key: Optional[str] = None
-    judge_llm_aws_region: Optional[str] = None
+    judge_llm_aws_access_key_id: Optional[SecretStr] = None
+    judge_llm_aws_secret_access_key: Optional[SecretStr] = None
+    judge_llm_aws_region: Optional[str] = None

Note: Region is not a secret, so it can remain as Optional[str]. You would need to update all code that accesses these fields to use .get_secret_value() similar to how get_auth_header() handles credentials (lines 35-38).
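The point of the SecretStr recommendation is that the value masks itself in repr/str output and must be unwrapped explicitly. A stdlib stand-in illustrating that behavior (SecretStrDemo is hypothetical; Pydantic's real SecretStr behaves the same way conceptually):

```python
class SecretStrDemo:
    """Minimal stand-in for the masking behavior Pydantic's SecretStr provides."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __repr__(self) -> str:
        # Masked in repr, so tracebacks and debug logs don't leak the secret
        return "SecretStr('**********')" if self._value else "SecretStr('')"

    def __str__(self) -> str:
        return "**********" if self._value else ""

    def get_secret_value(self) -> str:
        # The only way to get the raw value: an explicit, greppable call site
        return self._value

key = SecretStrDemo("AKIAEXAMPLE")
log_line = f"judge credentials: {key}"  # renders masked
raw = key.get_secret_value()            # explicit unwrap where litellm needs it
```

The cost of the change is exactly what the comment notes: every consumer of these fields must call .get_secret_value() before passing the value on.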


378-380: Consider using SecretStr for AWS credentials in StartInterviewRequest.

Same security recommendation as for AgentConfig: AWS access keys and secret keys should be protected using SecretStr to prevent accidental exposure through logs or error messages.


472-474: Consider using SecretStr for AWS credentials in ScenarioGenerationRequest.

Same security recommendation: AWS credentials should use SecretStr type for protection.


491-493: Consider using SecretStr for AWS credentials in SummaryGenerationRequest.

Same security recommendation: AWS credentials should use SecretStr type for protection.

packages/tui/internal/tui/common_controller.go (1)

203-209: Consider extracting duplicate model-prefix logic to a helper function.

The logic for checking if a model string contains "/" and conditionally prefixing it with the provider appears twice (lines 203-209 and lines 298-304). This duplication could lead to inconsistencies if the logic needs to be updated.

Consider creating a helper function:

// formatModelWithProvider returns the model string with provider prefix if not already present
func formatModelWithProvider(model, provider string) string {
    if strings.Contains(model, "/") {
        return model
    }
    return provider + "/" + model
}

Then use it in both locations:

judgeModel = formatModelWithProvider(m.config.SelectedModel, m.config.SelectedProvider)

and

m.evalState.JudgeModel = formatModelWithProvider(msg.Model, msg.Provider)

Also applies to: 298-304

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c59405 and 6ec2961.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (21)
  • VERSION (1 hunks)
  • packages/tui/internal/components/llm_config_dialog.go (16 hunks)
  • packages/tui/internal/tui/commands.go (2 hunks)
  • packages/tui/internal/tui/common_controller.go (2 hunks)
  • packages/tui/internal/tui/interview_controller.go (4 hunks)
  • packages/tui/internal/tui/interview_utils.go (2 hunks)
  • packages/tui/internal/tui/keyboard_controller.go (2 hunks)
  • packages/tui/internal/tui/scenario_utils.go (1 hunks)
  • packages/tui/internal/tui/utils.go (4 hunks)
  • pyproject.toml (1 hunks)
  • rogue/evaluator_agent/base_evaluator_agent.py (3 hunks)
  • rogue/evaluator_agent/evaluator_agent_factory.py (2 hunks)
  • rogue/evaluator_agent/policy_evaluation.py (2 hunks)
  • rogue/evaluator_agent/run_evaluator_agent.py (2 hunks)
  • rogue/server/api/interview.py (1 hunks)
  • rogue/server/api/llm.py (3 hunks)
  • rogue/server/core/evaluation_orchestrator.py (3 hunks)
  • rogue/server/services/evaluation_service.py (1 hunks)
  • rogue/server/services/interviewer_service.py (2 hunks)
  • rogue/server/services/llm_service.py (5 hunks)
  • sdks/python/rogue_sdk/types.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format Python code with Black
Ensure code passes flake8 linting
Run mypy with the repository configuration for static typing
Run Bandit security checks using .bandit.yaml configuration
Use isort import conventions for import ordering
Add type hints to all function signatures
Follow PEP 8 naming (snake_case for variables/functions, PascalCase for classes)
Use try/except around code that may raise exceptions

Files:

  • rogue/server/core/evaluation_orchestrator.py
  • rogue/server/services/evaluation_service.py
  • rogue/evaluator_agent/run_evaluator_agent.py
  • rogue/server/services/llm_service.py
  • sdks/python/rogue_sdk/types.py
  • rogue/evaluator_agent/evaluator_agent_factory.py
  • rogue/evaluator_agent/policy_evaluation.py
  • rogue/server/services/interviewer_service.py
  • rogue/evaluator_agent/base_evaluator_agent.py
  • rogue/server/api/interview.py
  • rogue/server/api/llm.py
pyproject.toml

📄 CodeRabbit inference engine (AGENTS.md)

Manage dependencies with uv and declare them in pyproject.toml

Files:

  • pyproject.toml
🧬 Code graph analysis (6)
packages/tui/internal/tui/common_controller.go (1)
packages/tui/internal/tui/types.go (2)
  • NewEvaluationScreen (37-37)
  • Model (51-90)
packages/tui/internal/tui/interview_utils.go (2)
packages/tui/internal/tui/utils.go (1)
  • RogueSDK (79-83)
sdks/python/rogue_sdk/types.py (2)
  • StartInterviewResponse (383-388)
  • StartInterviewRequest (373-380)
packages/tui/internal/tui/utils.go (1)
sdks/python/rogue_sdk/types.py (5)
  • Protocol (68-79)
  • Transport (82-93)
  • AuthType (23-48)
  • EvaluationRequest (419-425)
  • AgentConfig (104-139)
packages/tui/internal/components/llm_config_dialog.go (3)
packages/tui/internal/tui/types.go (1)
  • Model (51-90)
packages/tui/internal/theme/theme.go (1)
  • Theme (10-78)
packages/tui/internal/styles/styles.go (4)
  • NewStyle (45-47)
  • Primary (13-13)
  • TextMuted (22-22)
  • Border (23-23)
packages/tui/internal/tui/interview_controller.go (2)
packages/tui/internal/screens/scenarios/scenario_types.go (1)
  • InterviewStartedMsg (40-44)
packages/tui/internal/tui/scenario_utils.go (1)
  • ScenarioGenerationRequest (26-34)
rogue/server/api/llm.py (2)
sdks/python/rogue_sdk/types.py (5)
  • EvaluationResults (240-256)
  • ReportSummaryRequest (558-567)
  • ReportSummaryResponse (570-573)
  • ScenarioGenerationRequest (466-475)
  • SummaryGenerationRequest (485-498)
rogue/server/models/api_format.py (1)
  • ServerSummaryGenerationResponse (52-56)
🔇 Additional comments (10)
pyproject.toml (1)

10-10: boto3 1.40.69 is secure and suitable for Bedrock support.

The dependency is correctly placed and formatted. Verification confirms: (1) no known security vulnerabilities in boto3 1.40.69, and (2) AWS Bedrock API operations are supported in that version. The version constraint pattern (>=1.40.69) is consistent with other dependencies in the project.

rogue/server/core/evaluation_orchestrator.py (1)

34-36: LGTM: AWS credentials correctly threaded through orchestrator.

The implementation correctly accepts, stores, and forwards AWS credentials to the evaluator agent. The pattern is consistent and maintains the optional nature of these parameters.

Also applies to: 45-47, 99-101

packages/tui/internal/tui/common_controller.go (1)

284-294: LGTM: Bedrock credentials stored separately with clear naming.

The implementation correctly stores Bedrock AWS credentials in separate map keys (bedrock_access_key, bedrock_secret_key, bedrock_region) rather than mixing them with provider API keys. This separation is clear and maintainable.

packages/tui/internal/tui/commands.go (2)

52-76: LGTM: Clean provider-based credential handling.

The implementation correctly distinguishes between Bedrock (which uses AWS credentials) and other providers (which use API keys). The comments are clear, and the credential extraction logic is consistent with how credentials are stored in common_controller.go.


93-95: LGTM: AWS credentials correctly passed to GenerateSummary.

The AWS credential pointers are properly passed to the SDK's GenerateSummary method, maintaining the optional nature of these parameters.

rogue/evaluator_agent/run_evaluator_agent.py (1)

86-88: LGTM: AWS credentials correctly threaded through to evaluator agent.

The implementation correctly accepts AWS credentials as optional parameters and forwards them to get_evaluator_agent. The pattern is consistent with the broader credential propagation strategy.

Also applies to: 124-126

rogue/server/services/evaluation_service.py (1)

108-110: LGTM: AWS credentials correctly extracted from config and passed to orchestrator.

The implementation correctly extracts AWS credentials from the agent_config and forwards them to the EvaluationOrchestrator, completing the credential flow from the API request to the evaluation execution. Line 109 is appropriately marked with noqa: E501 for the long line.

packages/tui/internal/tui/interview_controller.go (1)

32-150: Nice provider-specific credential handling

The provider split plus Bedrock credential extraction keeps API keys and AWS creds separated cleanly. Passing them through StartInterview/GenerateScenarios lines everything up with the server changes. Looks solid.

rogue/evaluator_agent/base_evaluator_agent.py (1)

157-359: Good job threading judge AWS credentials

Capturing the judge’s AWS access key/secret/region at init and forwarding them into evaluate_policy completes the Bedrock path for evaluator runs. The change slots in cleanly with existing auth handling.

packages/tui/internal/tui/interview_utils.go (1)

21-97: SDK request matches the server contract

Adding the AWS fields to StartInterviewRequest and threading them through StartInterview keeps the TUI in sync with the backend/SDK types. The omitempty tags avoid clutter when credentials aren’t needed. Nicely done.

@drorIvry drorIvry enabled auto-merge (squash) November 13, 2025 17:11

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
rogue/evaluator_agent/run_evaluator_agent.py (1)

82-98: Expose new AWS and red‑team parameters through run_evaluator_agent as well

arun_evaluator_agent correctly accepts and forwards judge_llm_aws_*, evaluation_mode, and owasp_categories into get_evaluator_agent, but the synchronous wrapper run_evaluator_agent neither accepts nor passes these parameters. This means:

  • Sync callers cannot use Bedrock judge LLM or RED_TEAM mode.
  • If any call sites were updated to pass these kwargs to run_evaluator_agent, they will raise TypeError.

Extending the sync wrapper keeps both paths aligned and preserves backward compatibility via defaults.

-def run_evaluator_agent(
-    protocol: Protocol,
-    transport: Transport | None,
-    evaluated_agent_url: str,
-    auth_type: AuthType,
-    auth_credentials: str | None,
-    judge_llm: str,
-    judge_llm_api_key: str | None,
-    scenarios: Scenarios,
-    business_context: str,
-    deep_test_mode: bool,
-) -> EvaluationResults:
+def run_evaluator_agent(
+    protocol: Protocol,
+    transport: Transport | None,
+    evaluated_agent_url: str,
+    auth_type: AuthType,
+    auth_credentials: str | None,
+    judge_llm: str,
+    judge_llm_api_key: str | None,
+    scenarios: Scenarios,
+    business_context: str,
+    deep_test_mode: bool,
+    judge_llm_aws_access_key_id: str | None = None,
+    judge_llm_aws_secret_access_key: str | None = None,
+    judge_llm_aws_region: str | None = None,
+    evaluation_mode: EvaluationMode = EvaluationMode.POLICY,
+    owasp_categories: Optional[List[str]] = None,
+) -> EvaluationResults:
     async def run_evaluator_agent_task():
         async for update_type, data in arun_evaluator_agent(
             protocol=protocol,
             transport=transport,
             evaluated_agent_url=evaluated_agent_url,
             auth_type=auth_type,
             auth_credentials=auth_credentials,
             judge_llm=judge_llm,
             judge_llm_api_key=judge_llm_api_key,
             scenarios=scenarios,
             business_context=business_context,
             deep_test_mode=deep_test_mode,
+            judge_llm_aws_access_key_id=judge_llm_aws_access_key_id,
+            judge_llm_aws_secret_access_key=judge_llm_aws_secret_access_key,
+            judge_llm_aws_region=judge_llm_aws_region,
+            evaluation_mode=evaluation_mode,
+            owasp_categories=owasp_categories,
         ):
             if update_type == "results":
                 return data

Also applies to: 249-261

rogue/server/services/evaluation_service.py (1)

169-177: Red‑team and OWASP results from EvaluationResults are currently discarded

After orchestrator.run_evaluation() completes, the service only persists final_results.results onto job.results:

if final_results and final_results.results:
    job.results = final_results.results

But EvaluationResults now also carries:

  • red_teaming_results
  • owasp_summary

These are not stored on the job or sent via WebSocket updates, so callers using the job API cannot access red‑team outcomes or OWASP summaries, even when running in RED_TEAM mode.

To expose the new functionality end‑to‑end, you’ll likely need to:

  • Extend EvaluationJob (and any related DTOs / websocket payloads) to include red_teaming_results and owasp_summary, or
  • Store the full EvaluationResults object on the job and adapt consumers to read from it.
rogue/server/core/evaluation_orchestrator.py (1)

134-151: Preserve red‑teaming metadata when aggregating EvaluationResults

Right now, the "results" branch only loops over results.results and calls self.results.add_result(res). Any data in red_teaming_results (and potentially owasp_summary) on the EvaluationResults instance returned by arun_evaluator_agent is dropped, which undermines the new red-team support.

EvaluationResults already has a combine method that merges both standard and red-teaming results. Using it here keeps the aggregation logic in one place.

You can replace the manual loop with:

-                if update_type == "results":
-                    # Process results
-                    results = data
-                    if results and results.results:
-                        self.logger.info(
-                            f"📊 Processing {len(results.results)} evaluation results",
-                        )
-                        for res in results.results:
-                            self.results.add_result(res)
-                    else:
-                        self.logger.warning(
-                            "⚠️ Received results update but no results data",
-                        )
-
-                    # Yield the accumulated results
-                    yield "results", self.results
+                if update_type == "results":
+                    # Process results (policy + red-teaming metadata)
+                    results: EvaluationResults = data
+                    if results.results:
+                        self.logger.info(
+                            f"📊 Processing {len(results.results)} evaluation results",
+                        )
+                    else:
+                        self.logger.warning(
+                            "⚠️ Received results update but no results data",
+                        )
+
+                    self.results.combine(results)
+
+                    # Yield the accumulated results
+                    yield "results", self.results

This way, both standard and red-team evaluation data remain intact through orchestration.

Also applies to: 162-177

🧹 Nitpick comments (26)
rogue/__main__.py (1)

17-18: Eliminate duplicate imports.

The sys and Path imports here are redundant since they're already imported at lines 3 and 6. The fallback block can use the existing imports.

Apply this diff:

 try:
     from . import __version__
 except ImportError:
     # Fallback if running directly
     # Add parent directory to path
-    import sys  # noqa: F811
-    from pathlib import Path  # noqa: F811
-
     sys.path.insert(0, str(Path(__file__).parent.parent))
     from rogue import __version__  # noqa: F401
TESTING_WITH_CURL.md (1)

181-182: Format URLs per markdown best practices.

Wrap the bare URLs in angle brackets to comply with markdown conventions.

Apply this diff:

-- **Swagger UI**: http://localhost:8000/docs
-- **ReDoc**: http://localhost:8000/redoc
+- **Swagger UI**: <http://localhost:8000/docs>
+- **ReDoc**: <http://localhost:8000/redoc>
rogue/evaluator_agent/base_evaluator_agent.py (1)

472-494: Consider extracting OWASP normalization logic.

The scenario_type normalization logic for OWASP categories is complex and could benefit from extraction to a dedicated helper method for improved readability and testability.

Consider extracting to a method like:

def _normalize_scenario_type_and_outcome(
    self, 
    scenario_dict: dict[str, str]
) -> tuple[str, str]:
    """
    Normalize scenario_type and expected_outcome for OWASP categories.
    
    Returns:
        Tuple of (scenario_type, expected_outcome)
    """
    scenario_type = scenario_dict.get("scenario_type", "policy")
    if scenario_type not in [st.value for st in ScenarioType]:
        # OWASP category ID handling
        scenario_type = ScenarioType.POLICY.value
        expected_outcome = scenario_dict.get("expected_outcome", "")
        owasp_cat = scenario_dict.get("scenario_type")
        if owasp_cat and isinstance(owasp_cat, str) and owasp_cat not in expected_outcome:
            if expected_outcome:
                expected_outcome = f"{expected_outcome} (OWASP: {owasp_cat})"
            else:
                expected_outcome = f"OWASP Category: {owasp_cat}"
    else:
        expected_outcome = scenario_dict.get("expected_outcome") or ""
    
    return scenario_type, expected_outcome
test_red_team_curl.sh (1)

28-33: Improve error handling for job_id extraction.

The current implementation uses a Python one-liner that can fail with json.JSONDecodeError or KeyError if the response is malformed or missing the job_id field. While line 30 checks for an empty JOB_ID, the error message doesn't indicate whether the issue was invalid JSON or a missing key.

Apply this diff for more robust error handling:

-JOB_ID=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['job_id'])")
+JOB_ID=$(echo "$RESPONSE" | python3 -c "
+import sys, json
+try:
+    print(json.load(sys.stdin).get('job_id', ''))
+except json.JSONDecodeError:
+    pass
+" 2>/dev/null)

 if [ -z "$JOB_ID" ]; then
     echo "❌ Failed to create job"
+    echo "Response was:"
     echo "$RESPONSE" | python3 -m json.tool 2>/dev/null || echo "$RESPONSE"
     exit 1
 fi
rogue/server/services/evaluation_library.py (2)

75-89: Consider using direct attribute access instead of getattr.

According to the AgentConfig schema in rogue_sdk/types.py, the fields evaluation_mode, owasp_categories, and attacks_per_category are already defined with default values. Using getattr with defaults suggests uncertainty about the schema and could mask typos or version mismatches.

Apply this diff to use direct attribute access:

-            evaluation_mode=getattr(
-                agent_config,
-                "evaluation_mode",
-                EvaluationMode.POLICY,
-            ),
-            owasp_categories=getattr(
-                agent_config,
-                "owasp_categories",
-                None,
-            ),
-            attacks_per_category=getattr(
-                agent_config,
-                "attacks_per_category",
-                5,
-            ),
+            evaluation_mode=agent_config.evaluation_mode,
+            owasp_categories=agent_config.owasp_categories,
+            attacks_per_category=agent_config.attacks_per_category,

If backward compatibility with older AgentConfig versions is required, document this and add a comment explaining the use of getattr.


62-90: Extract ScenarioEvaluationService construction to reduce duplication.

The ScenarioEvaluationService initialization logic is duplicated between evaluate_agent and evaluate_agent_streaming methods. This violates the DRY principle and increases maintenance burden.

Add a helper method to construct the service:

@staticmethod
def _create_evaluation_service(
    agent_config: AgentConfig,
    scenarios: Scenarios,
    business_context: str,
) -> ScenarioEvaluationService:
    """Create a ScenarioEvaluationService from agent config."""
    return ScenarioEvaluationService(
        protocol=agent_config.protocol,
        transport=agent_config.transport,
        evaluated_agent_url=str(agent_config.evaluated_agent_url),
        evaluated_agent_auth_type=agent_config.evaluated_agent_auth_type,
        evaluated_agent_auth_credentials=agent_config.evaluated_agent_credentials,
        judge_llm=agent_config.judge_llm,
        judge_llm_api_key=agent_config.judge_llm_api_key,
        scenarios=scenarios,
        business_context=business_context,
        deep_test_mode=agent_config.deep_test_mode,
        evaluation_mode=agent_config.evaluation_mode,
        owasp_categories=agent_config.owasp_categories,
        attacks_per_category=agent_config.attacks_per_category,
    )

Then update both methods to use it:

service = EvaluationLibrary._create_evaluation_service(
    agent_config=agent_config,
    scenarios=scenarios,
    business_context=business_context,
)

Also applies to: 170-196

rogue/server/red_teaming/vulnerabilities/robustness.py (1)

42-50: Simplify and clarify type conversion logic.

The current type conversion logic has redundant checks and silently drops invalid input. Line 47 handles both string and non-string cases, but line 49 filters to keep only valid strings, meaning non-strings are silently ignored. This makes the logic confusing and could hide errors.

Apply this diff to simplify and make the behavior explicit:

         if types is None:
             enum_types = list(RobustnessType)
         else:
             # Convert string values to enum types
             enum_types = [
-                RobustnessType(t) if isinstance(t, str) else t
+                RobustnessType(t)
                 for t in types
-                if isinstance(t, str) and t in [e.value for e in RobustnessType]
+                if t in [e.value for e in RobustnessType]
             ]

         super().__init__(types=enum_types)  # type: ignore[arg-type]

Alternatively, add explicit validation to raise an error for invalid types:

if types is None:
    enum_types = list(RobustnessType)
else:
    enum_types = []
    valid_values = [e.value for e in RobustnessType]
    for t in types:
        if isinstance(t, str):
            if t in valid_values:
                enum_types.append(RobustnessType(t))
            else:
                raise ValueError(f"Invalid RobustnessType: {t}")
        elif isinstance(t, RobustnessType):
            enum_types.append(t)
        else:
            raise TypeError(f"Expected str or RobustnessType, got {type(t)}")
test_server_red_team.sh (2)

45-51: Use jq or python for robust JSON parsing.

The script uses grep and cut to extract JSON fields, which is fragile and can break if the JSON formatting changes (e.g., whitespace, field order). This is similar to the issue in test_red_team_curl.sh.

Apply this diff to use jq for robust JSON parsing:

-JOB_ID=$(echo "$RESPONSE" | grep -o '"job_id":"[^"]*' | cut -d'"' -f4)
+JOB_ID=$(echo "$RESPONSE" | jq -r '.job_id // empty')

 if [ -z "$JOB_ID" ]; then
     echo "❌ Failed to create evaluation job"
-    echo "Response: $RESPONSE"
+    echo "Response:"
+    echo "$RESPONSE" | jq . 2>/dev/null || echo "$RESPONSE"
     exit 1
 fi

Or if jq is not available, use Python:

-JOB_ID=$(echo "$RESPONSE" | grep -o '"job_id":"[^"]*' | cut -d'"' -f4)
+JOB_ID=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin).get('job_id', ''))" 2>/dev/null)

61-61: Use jq or python for status extraction.

Same fragile grep pattern is used to extract the status field.

Apply this diff:

-    STATUS=$(curl -s "$SERVER_URL/api/v1/evaluations/$JOB_ID" | grep -o '"status":"[^"]*' | cut -d'"' -f4)
+    STATUS=$(curl -s "$SERVER_URL/api/v1/evaluations/$JOB_ID" | jq -r '.status // "unknown"')

Or with Python:

-    STATUS=$(curl -s "$SERVER_URL/api/v1/evaluations/$JOB_ID" | grep -o '"status":"[^"]*' | cut -d'"' -f4)
+    STATUS=$(curl -s "$SERVER_URL/api/v1/evaluations/$JOB_ID" | python3 -c "import sys, json; print(json.load(sys.stdin).get('status', 'unknown'))" 2>/dev/null)
rogue/server/red_teaming/metrics/base_red_teaming_metric.py (1)

32-39: Remove unnecessary return statement.

The measure method signature indicates it returns None, so the return statement in a_measure is unnecessary and could be confusing.

Apply this diff:

     async def a_measure(self, test_case: Any) -> None:
         """
         Async version of measure.

         Args:
             test_case: Test case containing attack input and agent response
         """
-        return self.measure(test_case)
+        self.measure(test_case)
rogue/server/red_teaming/attacks/single_turn/leetspeak.py (1)

18-36: Consider moving leet_map to a class-level constant.

The leet_map dictionary is recreated on every call to enhance(), but it's a constant mapping. Moving it to a class-level constant improves efficiency and makes the code more maintainable.

Apply this diff to refactor:

 class Leetspeak(BaseSingleTurnAttack):
     """Leetspeak transformation attack."""

     name = "Leetspeak"
+
+    _LEET_MAP = {
+        "a": "4",
+        "e": "3",
+        "i": "1",
+        "o": "0",
+        "s": "5",
+        "t": "7",
+        "l": "1",
+        "A": "4",
+        "E": "3",
+        "I": "1",
+        "O": "0",
+        "S": "5",
+        "T": "7",
+        "L": "1",
+    }

     def __init__(self, weight: int = 1):
         self.weight = weight

     def enhance(self, attack: str) -> str:
         """Enhance the attack using Leetspeak transformation."""
-        leet_map = {
-            "a": "4",
-            "e": "3",
-            "i": "1",
-            "o": "0",
-            "s": "5",
-            "t": "7",
-            "l": "1",
-            "A": "4",
-            "E": "3",
-            "I": "1",
-            "O": "0",
-            "S": "5",
-            "T": "7",
-            "L": "1",
-        }
-        return "".join(leet_map.get(char, char) for char in attack)
+        return "".join(self._LEET_MAP.get(char, char) for char in attack)
rogue/server/red_teaming/attacks/single_turn/rot13.py (1)

18-29: Consider moving the translation table to a class-level constant.

The ROT13 translation table is recreated on every call to enhance(), but it's a constant mapping. Moving it to a class-level constant improves efficiency.

Apply this diff to refactor:

 class ROT13(BaseSingleTurnAttack):
     """ROT13 encoding attack."""

     name = "ROT-13"
+
+    _ROT13_TRANS = str.maketrans(
+        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
+        "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm",
+    )

     def __init__(self, weight: int = 1):
         self.weight = weight

     def enhance(self, attack: str) -> str:
         """Enhance the attack using ROT13 encoding."""
-        return attack.translate(
-            str.maketrans(
-                "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
-                "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm",
-            ),
-        )
+        return attack.translate(self._ROT13_TRANS)
rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (1)

20-39: Add return type annotation on __init__ to match typing guidelines

__init__ currently lacks a return type, while other methods are annotated. To keep mypy/typing consistent with the rest of the codebase and the stated guidelines, annotate it explicitly as returning None.

-    def __init__(self, weight: int = 1):
+    def __init__(self, weight: int = 1) -> None:
         self.weight = weight
rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (1)

17-27: Annotate __init__ return type and optionally avoid precomputing both variants

Two small points here:

  1. __init__ should be annotated to return None to keep function signatures fully typed:
-    def __init__(self, weight: int = 1):
+    def __init__(self, weight: int = 1) -> None:
         self.weight = weight
  2. To avoid constructing both enhanced prompts on every call (even though it's cheap here), you can randomize at the function level instead of over precomputed strings:
-    def enhance(self, attack: str) -> str:
-        """Enhance the attack with prompt injection techniques."""
-        return random.choice(  # nosec B311
-            [
-                self._enhance_1(attack),
-                self._enhance_2(attack),
-            ],
-        )
+    def enhance(self, attack: str) -> str:
+        """Enhance the attack with prompt injection techniques."""
+        enhancer = random.choice(  # nosec B311
+            [self._enhance_1, self._enhance_2],
+        )
+        return enhancer(attack)
rogue/server/red_teaming/frameworks/owasp/owasp.py (1)

7-8: Simplify OWASPTop10 typing to avoid mypy ignores and add explicit __init__ return type

The current categories signature (List[Literal[...]] = None # type: ignore[assignment]) forces type-ignores here and at call sites (e.g., RedTeamScenarioGenerator) without much practical gain, since you already filter against OWASP_CATEGORIES.

You can make this cleaner and align with the “type hints on all function signatures” guideline by:

  • Using Optional[List[str]] for categories (you still do runtime validation/filtering).
  • Adding -> None to __init__.
  • Dropping the now-unneeded Literal import and # type: ignore[assignment].

For example:

-from dataclasses import dataclass
-from typing import List, Literal
+from dataclasses import dataclass
+from typing import List, Optional
@@
-    def __init__(
-        self,
-        categories: List[
-            Literal[
-                "LLM_01",
-                "LLM_02",
-                "LLM_03",
-                "LLM_04",
-                "LLM_05",
-                "LLM_06",
-                "LLM_07",
-                "LLM_08",
-                "LLM_09",
-                "LLM_10",
-            ]
-        ] = None,  # type: ignore[assignment]
-    ):
+    def __init__(
+        self,
+        categories: Optional[List[str]] = None,
+    ) -> None:
@@
-        self.categories = categories
+        self.categories = categories

This also allows you to remove the # type: ignore[arg-type] at the call site in RedTeamScenarioGenerator.

Also applies to: 37-53, 62-74

rogue/server/red_teaming/frameworks/owasp/risk_categories.py (1)

22-37: OWASP category definitions align with attack/vulnerability types

The OWASPCategory dataclass and the three initial entries in OWASP_CATEGORIES look consistent:

  • Attack lists use the expected single‑turn attack classes with sensible weights/personas.
  • Vulnerability types strings match the Enum .values defined in ExcessiveAgencyType, PromptLeakageType, and RobustnessType, so the filtering logic in those ctors will include them.

One thing to keep in mind: the vulnerability ctors currently only include string values (Enum instances are effectively ignored). That’s fine given this module passes strings, but if you later start passing Enums directly, you’ll need to tweak those comprehensions to handle non‑string items.

Also applies to: 48-124
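If that flexibility is ever needed, the comprehension can be generalized to accept both forms. A minimal sketch, using a stand-in RobustnessType with hypothetical values (the real enum is defined in rogue/server/red_teaming/vulnerabilities/robustness.py):

```python
from enum import Enum
from typing import List, Union


class RobustnessType(Enum):
    # Hypothetical values for illustration; the real enum defines its own.
    HIJACKING = "hijacking"
    INPUT_OVERRELIANCE = "input_overreliance"


def normalize_types(types: List[Union[str, RobustnessType]]) -> List[RobustnessType]:
    """Accept raw strings or Enum members, rejecting anything else."""
    valid_values = {e.value for e in RobustnessType}
    normalized: List[RobustnessType] = []
    for t in types:
        if isinstance(t, RobustnessType):
            normalized.append(t)
        elif isinstance(t, str) and t in valid_values:
            normalized.append(RobustnessType(t))
        else:
            raise ValueError(f"Invalid robustness type: {t!r}")
    return normalized
```

This keeps the string path identical to today's behavior while making the Enum path explicit instead of silently dropped.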

rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)

12-54: Annotate types on BaseVulnerability for clearer typing

The base class API is straightforward, but self.types is only introduced in __init__ without a class-level annotation. Adding one tightens type checking and makes the attribute contract explicit.

You can do:

 class BaseVulnerability(ABC):
@@
-    name: str
+    name: str
+    types: List[Enum]

No behavior change, just clearer intent for mypy and readers.

sdks/python/rogue_sdk/types.py (2)

127-147: Handle AWS secret access keys cautiously in SDK models

The new AWS fields (judge_llm_aws_secret_access_key, and the aws_secret_access_key fields on request models) are currently plain Optional[str], similar to judge_llm_api_key.

That’s workable, but to reduce the risk of accidental logging or serialization of secrets, you might consider:

  • Using SecretStr for the secret access key fields in AgentConfig and request models, and
  • Ensuring any logging of these models uses filtered dumps (or omits these fields).

If you keep them as str, it’d be good to double‑check that no code path logs model_dump() of these models without redaction.

Also applies to: 432-434, 526-528, 545-547
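As an illustration of what SecretStr buys you, here is a minimal stdlib stand-in (pydantic's real SecretStr behaves similarly: the value is masked in repr/str and must be unwrapped with an explicit call):

```python
class SecretStr:
    """Stdlib stand-in for pydantic.SecretStr, for illustration only."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # The only way to read the secret is an explicit call.
        return self._value

    def __repr__(self) -> str:
        return "SecretStr('**********')"

    __str__ = __repr__


key = SecretStr("AKIA-example-secret")
# Accidental logging shows the mask, not the secret.
log_line = f"config: aws_secret_access_key={key}"
```

With the real pydantic type, model_dump() would emit the SecretStr wrapper rather than the raw value, so only code that deliberately calls get_secret_value() can leak it.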


262-293: Ensure new red‑teaming fields flow through downstream APIs

RedTeamingResult and the new red_teaming_results / owasp_summary fields on EvaluationResults give you a richer result surface, but:

  • EvaluationResults.combine only merges results and red_teaming_resultsowasp_summary is left untouched. If you plan to aggregate summaries across batches, you’ll need explicit merge semantics there (or document that owasp_summary is per‑run and should not be combined).
  • convert_to_api_format currently only looks at evaluation_results.results and ignores red_teaming_results and owasp_summary, so clients using the new API format won’t see any red‑team/OWASP information yet.

If the intent is to expose OWASP/red‑team data via the new API, you’ll likely want to extend ApiScenarioResult/ApiEvaluationResult and convert_to_api_format accordingly.

Also applies to: 348-405
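One possible shape for the merge semantics, sketched with simplified field types (the real models are pydantic classes with richer result types, so treat the names as placeholders):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class EvaluationResults:
    results: List[str] = field(default_factory=list)
    red_teaming_results: List[str] = field(default_factory=list)
    owasp_summary: Optional[Dict[str, Any]] = None

    def combine(self, other: "EvaluationResults") -> "EvaluationResults":
        self.results.extend(other.results)
        self.red_teaming_results.extend(other.red_teaming_results)
        if other.owasp_summary is not None:
            # Last-writer-wins; replace with real aggregation if summaries
            # from multiple batches must be merged rather than overwritten.
            self.owasp_summary = other.owasp_summary
        return self
```

Whichever policy you pick for owasp_summary (overwrite, merge, or per-run only), documenting it on combine avoids surprises for callers.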

rogue/server/services/red_team_scenario_generator.py (1)

26-29: Tighten typing for _owasp_framework and drop now‑unnecessary type ignore

To align with the “type hints on all function signatures” guideline and the simplified OWASPTop10 constructor:

  • Add an explicit return type for __init__.
  • Type _owasp_framework as OWASPTop10 | None.
  • Remove the # type: ignore[arg-type] now that OWASPTop10.categories accepts Optional[List[str]].

For example:

-    def __init__(self):
-        """Initialize the red team scenario generator."""
-        self._owasp_framework = None
+    def __init__(self) -> None:
+        """Initialize the red team scenario generator."""
+        self._owasp_framework: OWASPTop10 | None = None
@@
-        # Load OWASP framework with selected categories
-        self._owasp_framework = OWASPTop10(
-            categories=owasp_categories,  # type: ignore[arg-type]
-        )
+        # Load OWASP framework with selected categories
+        self._owasp_framework = OWASPTop10(
+            categories=owasp_categories,
+        )

This keeps mypy/flake8 happy without suppressions and documents the internal state more clearly.

Also applies to: 55-58

rogue/evaluator_agent/red_team_mcp_evaluator_agent.py (2)

161-170: OWASP framework never loads when owasp_categories is empty

Right now _get_owasp_framework only initializes OWASPTop10 if self._owasp_categories is truthy, and _select_and_enhance_attack also relies on self._owasp_categories for both extraction and random fallback. That means if callers omit owasp_categories (or pass an empty list), you never instantiate the OWASP framework and message enhancement is effectively disabled, despite the instructions saying “All available OWASP categories” and OWASPTop10 having its own default categories.

If the intended behavior is “use OWASP defaults when categories aren’t specified”, consider letting _get_owasp_framework call OWASPTop10 with categories=None and then deriving self._owasp_categories from the framework once, e.g. based on category.id.

Also applies to: 261-270
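A sketch of the suggested fallback, with a toy OWASPTop10 whose default category list is hypothetical (the real class lives under rogue/server/red_teaming/frameworks/owasp):

```python
from typing import List, Optional


class OWASPTop10:
    """Toy stand-in; illustrative defaults only."""

    DEFAULT_CATEGORIES = ["LLM_01", "LLM_02", "LLM_03"]  # hypothetical

    def __init__(self, categories: Optional[List[str]] = None) -> None:
        self.categories = categories if categories else list(self.DEFAULT_CATEGORIES)


class RedTeamAgent:
    def __init__(self, owasp_categories: Optional[List[str]] = None) -> None:
        self._owasp_categories = owasp_categories
        self._owasp_framework: Optional[OWASPTop10] = None

    def _get_owasp_framework(self) -> OWASPTop10:
        if self._owasp_framework is None:
            # Passing None (or an empty list) lets the framework supply its defaults.
            self._owasp_framework = OWASPTop10(categories=self._owasp_categories)
            # Seed the category list from the framework so later attack
            # selection always has something to work with.
            self._owasp_categories = list(self._owasp_framework.categories)
        return self._owasp_framework
```

This way callers that omit owasp_categories still get message enhancement, matching the "All available OWASP categories" wording in the instructions.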


261-272: Add a return type hint to _get_owasp_framework for mypy clarity

_get_owasp_framework currently has no return type annotation. Given the rest of this module is typed, adding something like -> Optional["OWASPTop10"] (with the class imported under TYPE_CHECKING to avoid an import cycle) will keep mypy/flake8 happy and document expectations.

rogue/server/services/scenario_evaluation_service.py (1)

83-96: Use EvaluationResults.combine to preserve red-teaming metadata

When processing the "results" update, you currently only merge results.results via self._results.add_result(res) and ignore red_teaming_results and owasp_summary that are now part of EvaluationResults. This means red-team–specific data is dropped at this layer.

Given EvaluationResults already exposes combine, you can simplify the aggregation and keep all fields:

-                if update_type == "results":
-                    results = data
-                    if results and results.results:
-                        logger.info(
-                            f"📊 Processing {len(results.results)} evaluation results",
-                        )
-                        for res in results.results:
-                            self._results.add_result(res)
-                    else:
-                        logger.warning("⚠️ Received results update but no results data")
+                if update_type == "results":
+                    results = data
+                    if results:
+                        logger.info(
+                            f"📊 Processing {len(results.results) if results.results else 0} evaluation results",
+                        )
+                        self._results.combine(results)
+                    else:
+                        logger.warning("⚠️ Received results update but no results data")

This keeps prior behavior for standard results while also propagating red-team fields.

Also applies to: 127-137, 181-187

rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (3)

161-170: Same OWASP default behavior consideration as MCP variant

As with the MCP agent, _get_owasp_framework bails out when self._owasp_categories is empty, so you never instantiate the OWASP framework or enhance messages if callers don’t explicitly pass categories. If the intent is to fall back to OWASP’s default categories when none are provided, consider letting _get_owasp_framework call OWASPTop10 with categories=None and seeding self._owasp_categories from the resulting categories.

Also applies to: 261-270


261-272: Consider adding a return type hint to _get_owasp_framework

For consistency with the rest of the module's typed methods, adding a return annotation (e.g. -> Optional["OWASPTop10"]) will help mypy and future readers.


113-168: Shared red-team logic between A2A and MCP could be factored

RedTeamA2AEvaluatorAgent and RedTeamMCPEvaluatorAgent share nearly identical OWASP helpers and attack-selection logic. To reduce drift and keep future changes (e.g., adding new OWASP categories or attack selection tweaks) in one place, consider extracting the common pieces into a small mixin or helper class that both agents can reuse.

Also applies to: 283-373
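A minimal shape for that extraction (class and method names are hypothetical; the real agents would keep their protocol-specific transport code and inherit only the shared OWASP helpers):

```python
import random
from typing import List


class OwaspAttackMixin:
    """Shared OWASP category/attack selection for red-team evaluator agents."""

    _owasp_categories: List[str]

    def _select_category(self) -> str:
        # Both the A2A and MCP agents pick a category the same way,
        # so the logic lives in one place.
        return random.choice(self._owasp_categories)  # nosec B311


class RedTeamA2AEvaluatorAgent(OwaspAttackMixin):
    def __init__(self, owasp_categories: List[str]) -> None:
        self._owasp_categories = owasp_categories


class RedTeamMCPEvaluatorAgent(OwaspAttackMixin):
    def __init__(self, owasp_categories: List[str]) -> None:
        self._owasp_categories = owasp_categories
```

Adding a new OWASP category or tweaking attack selection then touches only the mixin instead of two near-identical agent classes.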

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d58185 and 7ea29b5.

📒 Files selected for processing (40)
  • TESTING_WITH_CURL.md (1 hunks)
  • rogue/__main__.py (1 hunks)
  • rogue/evaluator_agent/__init__.py (2 hunks)
  • rogue/evaluator_agent/base_evaluator_agent.py (4 hunks)
  • rogue/evaluator_agent/evaluator_agent_factory.py (2 hunks)
  • rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (1 hunks)
  • rogue/evaluator_agent/red_team_mcp_evaluator_agent.py (1 hunks)
  • rogue/evaluator_agent/run_evaluator_agent.py (3 hunks)
  • rogue/server/core/evaluation_orchestrator.py (5 hunks)
  • rogue/server/red_teaming/__init__.py (1 hunks)
  • rogue/server/red_teaming/attacks/__init__.py (1 hunks)
  • rogue/server/red_teaming/attacks/base_attack.py (1 hunks)
  • rogue/server/red_teaming/attacks/single_turn/__init__.py (1 hunks)
  • rogue/server/red_teaming/attacks/single_turn/base64.py (1 hunks)
  • rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1 hunks)
  • rogue/server/red_teaming/attacks/single_turn/leetspeak.py (1 hunks)
  • rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (1 hunks)
  • rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (1 hunks)
  • rogue/server/red_teaming/attacks/single_turn/roleplay.py (1 hunks)
  • rogue/server/red_teaming/attacks/single_turn/rot13.py (1 hunks)
  • rogue/server/red_teaming/frameworks/__init__.py (1 hunks)
  • rogue/server/red_teaming/frameworks/owasp/__init__.py (1 hunks)
  • rogue/server/red_teaming/frameworks/owasp/owasp.py (1 hunks)
  • rogue/server/red_teaming/frameworks/owasp/risk_categories.py (1 hunks)
  • rogue/server/red_teaming/metrics/__init__.py (1 hunks)
  • rogue/server/red_teaming/metrics/base_red_teaming_metric.py (1 hunks)
  • rogue/server/red_teaming/vulnerabilities/__init__.py (1 hunks)
  • rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1 hunks)
  • rogue/server/red_teaming/vulnerabilities/excessive_agency.py (1 hunks)
  • rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (1 hunks)
  • rogue/server/red_teaming/vulnerabilities/robustness.py (1 hunks)
  • rogue/server/services/evaluation_library.py (3 hunks)
  • rogue/server/services/evaluation_service.py (1 hunks)
  • rogue/server/services/red_team_scenario_generator.py (1 hunks)
  • rogue/server/services/scenario_evaluation_service.py (5 hunks)
  • sdks/python/rogue_sdk/types.py (7 hunks)
  • test_red_team_curl.sh (1 hunks)
  • test_red_teaming_foundation.py (1 hunks)
  • test_red_teaming_simple.py (1 hunks)
  • test_server_red_team.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format Python code with Black
Ensure code passes flake8 linting
Run mypy with the repository configuration for static typing
Run Bandit security checks using .bandit.yaml configuration
Use isort import conventions for import ordering
Add type hints to all function signatures
Follow PEP 8 naming (snake_case for variables/functions, PascalCase for classes)
Use try/except around code that may raise exceptions

Files:

  • rogue/server/red_teaming/__init__.py
  • rogue/server/red_teaming/attacks/single_turn/base64.py
  • rogue/__main__.py
  • test_red_teaming_simple.py
  • rogue/server/red_teaming/metrics/__init__.py
  • rogue/server/services/evaluation_library.py
  • rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py
  • rogue/server/red_teaming/metrics/base_red_teaming_metric.py
  • rogue/server/red_teaming/vulnerabilities/base_vulnerability.py
  • rogue/server/red_teaming/attacks/single_turn/leetspeak.py
  • rogue/server/red_teaming/attacks/base_attack.py
  • test_red_teaming_foundation.py
  • rogue/server/red_teaming/attacks/single_turn/rot13.py
  • rogue/server/red_teaming/frameworks/owasp/__init__.py
  • rogue/server/red_teaming/attacks/__init__.py
  • rogue/server/red_teaming/attacks/single_turn/__init__.py
  • rogue/evaluator_agent/run_evaluator_agent.py
  • rogue/server/services/red_team_scenario_generator.py
  • rogue/server/red_teaming/vulnerabilities/excessive_agency.py
  • rogue/server/red_teaming/attacks/single_turn/prompt_injection.py
  • rogue/server/red_teaming/frameworks/owasp/owasp.py
  • rogue/server/red_teaming/frameworks/owasp/risk_categories.py
  • sdks/python/rogue_sdk/types.py
  • rogue/evaluator_agent/base_evaluator_agent.py
  • rogue/server/red_teaming/frameworks/__init__.py
  • rogue/server/red_teaming/attacks/single_turn/prompt_probing.py
  • rogue/server/red_teaming/vulnerabilities/prompt_leakage.py
  • rogue/evaluator_agent/__init__.py
  • rogue/server/services/evaluation_service.py
  • rogue/evaluator_agent/evaluator_agent_factory.py
  • rogue/server/red_teaming/vulnerabilities/__init__.py
  • rogue/server/red_teaming/vulnerabilities/robustness.py
  • rogue/server/red_teaming/attacks/single_turn/roleplay.py
  • rogue/server/services/scenario_evaluation_service.py
  • rogue/server/core/evaluation_orchestrator.py
  • rogue/evaluator_agent/red_team_a2a_evaluator_agent.py
  • rogue/evaluator_agent/red_team_mcp_evaluator_agent.py
🧬 Code graph analysis (31)
rogue/server/red_teaming/attacks/single_turn/base64.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
  • BaseSingleTurnAttack (10-13)
rogue/server/red_teaming/attacks/base_attack.py (3)
  • enhance (24-36)
  • a_enhance (38-50)
  • get_name (53-55)
test_red_teaming_simple.py (10)
rogue/server/red_teaming/attacks/single_turn/rot13.py (3)
  • ROT13 (10-32)
  • enhance (18-25)
  • get_name (31-32)
rogue/server/red_teaming/attacks/single_turn/base64.py (3)
  • Base64 (12-29)
  • enhance (20-22)
  • get_name (28-29)
rogue/server/red_teaming/attacks/single_turn/leetspeak.py (3)
  • Leetspeak (10-43)
  • enhance (18-36)
  • get_name (42-43)
rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (3)
  • PromptInjection (12-60)
  • enhance (20-27)
  • get_name (59-60)
rogue/server/red_teaming/vulnerabilities/excessive_agency.py (1)
  • ExcessiveAgency (21-54)
rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (1)
  • PromptLeakage (22-56)
rogue/server/red_teaming/vulnerabilities/robustness.py (1)
  • Robustness (20-52)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (2)
  • get_types (30-37)
  • get_name (48-50)
rogue/server/red_teaming/frameworks/owasp/owasp.py (3)
  • OWASPTop10 (14-82)
  • get_name (76-78)
  • get_categories (80-82)
sdks/python/rogue_sdk/types.py (3)
  • EvaluationMode (58-62)
  • AgentConfig (111-161)
  • RedTeamingResult (262-279)
rogue/server/red_teaming/metrics/__init__.py (1)
rogue/server/red_teaming/metrics/base_red_teaming_metric.py (1)
  • BaseRedTeamingMetric (11-39)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
rogue/server/red_teaming/attacks/base_attack.py (1)
  • BaseAttack (11-55)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (3)
rogue/server/red_teaming/attacks/base_attack.py (1)
  • get_name (53-55)
rogue/server/red_teaming/attacks/single_turn/base64.py (1)
  • get_name (28-29)
rogue/server/red_teaming/frameworks/owasp/owasp.py (1)
  • get_name (76-78)
rogue/server/red_teaming/attacks/single_turn/leetspeak.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
  • BaseSingleTurnAttack (10-13)
rogue/server/red_teaming/attacks/base_attack.py (3)
  • enhance (24-36)
  • a_enhance (38-50)
  • get_name (53-55)
rogue/server/red_teaming/attacks/base_attack.py (8)
rogue/server/red_teaming/attacks/single_turn/base64.py (3)
  • enhance (20-22)
  • a_enhance (24-26)
  • get_name (28-29)
rogue/server/red_teaming/attacks/single_turn/leetspeak.py (3)
  • enhance (18-36)
  • a_enhance (38-40)
  • get_name (42-43)
rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (3)
  • enhance (20-27)
  • a_enhance (29-31)
  • get_name (59-60)
rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (3)
  • enhance (23-32)
  • a_enhance (34-36)
  • get_name (38-39)
rogue/server/red_teaming/attacks/single_turn/roleplay.py (3)
  • enhance (33-42)
  • a_enhance (44-46)
  • get_name (48-49)
rogue/server/red_teaming/attacks/single_turn/rot13.py (3)
  • enhance (18-25)
  • a_enhance (27-29)
  • get_name (31-32)
rogue/server/red_teaming/frameworks/owasp/owasp.py (1)
  • get_name (76-78)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
  • get_name (48-50)
test_red_teaming_foundation.py (11)
rogue/server/red_teaming/attacks/single_turn/base64.py (3)
  • Base64 (12-29)
  • enhance (20-22)
  • get_name (28-29)
rogue/server/red_teaming/attacks/single_turn/leetspeak.py (3)
  • Leetspeak (10-43)
  • enhance (18-36)
  • get_name (42-43)
rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (3)
  • PromptInjection (12-60)
  • enhance (20-27)
  • get_name (59-60)
rogue/server/red_teaming/attacks/single_turn/rot13.py (3)
  • ROT13 (10-32)
  • enhance (18-25)
  • get_name (31-32)
rogue/server/red_teaming/attacks/base_attack.py (2)
  • enhance (24-36)
  • get_name (53-55)
rogue/server/red_teaming/vulnerabilities/excessive_agency.py (1)
  • ExcessiveAgency (21-54)
rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (1)
  • PromptLeakage (22-56)
rogue/server/red_teaming/vulnerabilities/robustness.py (1)
  • Robustness (20-52)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (2)
  • get_types (30-37)
  • get_name (48-50)
rogue/server/red_teaming/frameworks/owasp/owasp.py (3)
  • OWASPTop10 (14-82)
  • get_name (76-78)
  • get_categories (80-82)
sdks/python/rogue_sdk/types.py (4)
  • EvaluationMode (58-62)
  • AgentConfig (111-161)
  • RedTeamingResult (262-279)
  • EvaluationResults (282-310)
rogue/server/red_teaming/attacks/single_turn/rot13.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
  • BaseSingleTurnAttack (10-13)
rogue/server/red_teaming/attacks/base_attack.py (3)
  • enhance (24-36)
  • a_enhance (38-50)
  • get_name (53-55)
rogue/server/red_teaming/frameworks/owasp/__init__.py (2)
rogue/server/red_teaming/frameworks/owasp/owasp.py (1)
  • OWASPTop10 (14-82)
rogue/server/red_teaming/frameworks/owasp/risk_categories.py (1)
  • OWASPCategory (23-37)
rogue/server/red_teaming/attacks/__init__.py (1)
rogue/server/red_teaming/attacks/base_attack.py (1)
  • BaseAttack (11-55)
rogue/server/red_teaming/attacks/single_turn/__init__.py (7)
rogue/server/red_teaming/attacks/single_turn/base64.py (1)
  • Base64 (12-29)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
  • BaseSingleTurnAttack (10-13)
rogue/server/red_teaming/attacks/single_turn/leetspeak.py (1)
  • Leetspeak (10-43)
rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (1)
  • PromptInjection (12-60)
rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (1)
  • PromptProbing (10-39)
rogue/server/red_teaming/attacks/single_turn/roleplay.py (1)
  • Roleplay (12-49)
rogue/server/red_teaming/attacks/single_turn/rot13.py (1)
  • ROT13 (10-32)
rogue/evaluator_agent/run_evaluator_agent.py (1)
sdks/python/rogue_sdk/types.py (3)
  • EvaluationMode (58-62)
  • EvaluationResults (282-310)
  • Scenarios (206-224)
rogue/server/services/red_team_scenario_generator.py (3)
sdks/python/rogue_sdk/types.py (3)
  • Scenario (164-203)
  • Scenarios (206-224)
  • ScenarioType (51-55)
rogue/server/red_teaming/frameworks/owasp/owasp.py (2)
  • OWASPTop10 (14-82)
  • get_categories (80-82)
rogue/server/red_teaming/frameworks/owasp/risk_categories.py (1)
  • OWASPCategory (23-37)
rogue/server/red_teaming/vulnerabilities/excessive_agency.py (1)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
  • BaseVulnerability (12-54)
rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
  • BaseSingleTurnAttack (10-13)
rogue/server/red_teaming/attacks/base_attack.py (3)
  • enhance (24-36)
  • a_enhance (38-50)
  • get_name (53-55)
rogue/server/red_teaming/frameworks/owasp/owasp.py (1)
rogue/server/red_teaming/frameworks/owasp/risk_categories.py (1)
  • OWASPCategory (23-37)
rogue/server/red_teaming/frameworks/owasp/risk_categories.py (11)
rogue/server/red_teaming/attacks/base_attack.py (1)
  • BaseAttack (11-55)
rogue/server/red_teaming/attacks/single_turn/rot13.py (1)
  • ROT13 (10-32)
rogue/server/red_teaming/attacks/single_turn/base64.py (1)
  • Base64 (12-29)
rogue/server/red_teaming/attacks/single_turn/leetspeak.py (1)
  • Leetspeak (10-43)
rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (1)
  • PromptInjection (12-60)
rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (1)
  • PromptProbing (10-39)
rogue/server/red_teaming/attacks/single_turn/roleplay.py (1)
  • Roleplay (12-49)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
  • BaseVulnerability (12-54)
rogue/server/red_teaming/vulnerabilities/excessive_agency.py (1)
  • ExcessiveAgency (21-54)
rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (1)
  • PromptLeakage (22-56)
rogue/server/red_teaming/vulnerabilities/robustness.py (1)
  • Robustness (20-52)
sdks/python/rogue_sdk/types.py (1)
packages/sdk/src/types.ts (2)
  • ChatMessage (49-53)
  • EvaluationResult (61-65)
rogue/evaluator_agent/base_evaluator_agent.py (1)
sdks/python/rogue_sdk/types.py (2)
  • ScenarioType (51-55)
  • Scenario (164-203)
rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
  • BaseSingleTurnAttack (10-13)
rogue/server/red_teaming/attacks/base_attack.py (3)
  • enhance (24-36)
  • a_enhance (38-50)
  • get_name (53-55)
rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (1)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
  • BaseVulnerability (12-54)
rogue/evaluator_agent/__init__.py (2)
rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (1)
  • RedTeamA2AEvaluatorAgent (113-421)
rogue/evaluator_agent/red_team_mcp_evaluator_agent.py (1)
  • RedTeamMCPEvaluatorAgent (113-421)
rogue/evaluator_agent/evaluator_agent_factory.py (6)
sdks/python/rogue_sdk/types.py (4)
  • EvaluationMode (58-62)
  • Protocol (75-86)
  • Scenarios (206-224)
  • Transport (89-100)
rogue/evaluator_agent/a2a/a2a_evaluator_agent.py (1)
  • A2AEvaluatorAgent (19-213)
rogue/evaluator_agent/base_evaluator_agent.py (1)
  • BaseEvaluatorAgent (157-605)
rogue/evaluator_agent/mcp/mcp_evaluator_agent.py (1)
  • MCPEvaluatorAgent (14-181)
rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (1)
  • RedTeamA2AEvaluatorAgent (113-421)
rogue/evaluator_agent/red_team_mcp_evaluator_agent.py (1)
  • RedTeamMCPEvaluatorAgent (113-421)
rogue/server/red_teaming/vulnerabilities/__init__.py (4)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
  • BaseVulnerability (12-54)
rogue/server/red_teaming/vulnerabilities/excessive_agency.py (2)
  • ExcessiveAgency (21-54)
  • ExcessiveAgencyType (13-18)
rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (2)
  • PromptLeakage (22-56)
  • PromptLeakageType (13-19)
rogue/server/red_teaming/vulnerabilities/robustness.py (2)
  • Robustness (20-52)
  • RobustnessType (13-17)
rogue/server/red_teaming/vulnerabilities/robustness.py (1)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
  • BaseVulnerability (12-54)
rogue/server/red_teaming/attacks/single_turn/roleplay.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
  • BaseSingleTurnAttack (10-13)
rogue/server/red_teaming/attacks/base_attack.py (3)
  • enhance (24-36)
  • a_enhance (38-50)
  • get_name (53-55)
rogue/server/services/scenario_evaluation_service.py (3)
sdks/python/rogue_sdk/types.py (6)
  • AuthType (23-48)
  • EvaluationMode (58-62)
  • EvaluationResults (282-310)
  • Protocol (75-86)
  • Scenarios (206-224)
  • Transport (89-100)
rogue/evaluator_agent/run_evaluator_agent.py (2)
  • run_evaluator_agent (237-265)
  • arun_evaluator_agent (82-234)
rogue/server/services/red_team_scenario_generator.py (2)
  • RedTeamScenarioGenerator (18-187)
  • generate_scenarios (30-87)
rogue/server/core/evaluation_orchestrator.py (3)
sdks/python/rogue_sdk/types.py (6)
  • AuthType (23-48)
  • EvaluationMode (58-62)
  • EvaluationResults (282-310)
  • Protocol (75-86)
  • Scenarios (206-224)
  • Transport (89-100)
rogue/evaluator_agent/run_evaluator_agent.py (2)
  • run_evaluator_agent (237-265)
  • arun_evaluator_agent (82-234)
rogue/server/services/red_team_scenario_generator.py (2)
  • RedTeamScenarioGenerator (18-187)
  • generate_scenarios (30-87)
rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (12)
sdks/python/rogue_sdk/types.py (2)
  • Scenarios (206-224)
  • Transport (89-100)
rogue/common/agent_model_wrapper.py (1)
  • get_llm_from_model (11-35)
rogue/evaluator_agent/a2a/a2a_evaluator_agent.py (1)
  • A2AEvaluatorAgent (19-213)
rogue/evaluator_agent/base_evaluator_agent.py (8)
  • get_underlying_agent (197-253)
  • _get_conversation_context_id (560-566)
  • _send_message_to_evaluated_agent (540-557)
  • _log_evaluation (372-534)
  • _before_tool_callback (255-273)
  • _after_tool_callback (275-295)
  • _before_model_callback (297-310)
  • _after_model_callback (312-332)
rogue/server/red_teaming/frameworks/owasp/owasp.py (3)
  • OWASPTop10 (14-82)
  • get_categories (80-82)
  • get_name (76-78)
rogue/server/red_teaming/attacks/base_attack.py (2)
  • enhance (24-36)
  • get_name (53-55)
rogue/server/red_teaming/attacks/single_turn/base64.py (2)
  • enhance (20-22)
  • get_name (28-29)
rogue/server/red_teaming/attacks/single_turn/leetspeak.py (2)
  • enhance (18-36)
  • get_name (42-43)
rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (2)
  • enhance (20-27)
  • get_name (59-60)
rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (2)
  • enhance (23-32)
  • get_name (38-39)
rogue/server/red_teaming/attacks/single_turn/roleplay.py (2)
  • enhance (33-42)
  • get_name (48-49)
rogue/server/red_teaming/attacks/single_turn/rot13.py (2)
  • enhance (18-25)
  • get_name (31-32)
rogue/evaluator_agent/red_team_mcp_evaluator_agent.py (13)
sdks/python/rogue_sdk/types.py (2)
  • Scenarios (206-224)
  • Transport (89-100)
rogue/common/agent_model_wrapper.py (1)
  • get_llm_from_model (11-35)
rogue/evaluator_agent/mcp/mcp_evaluator_agent.py (1)
  • MCPEvaluatorAgent (14-181)
rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (8)
  • get_underlying_agent (169-227)
  • _get_owasp_category_description (229-243)
  • _send_message_to_evaluated_agent (375-393)
  • _log_evaluation (395-421)
  • _extract_owasp_category_from_scenario (245-259)
  • _get_owasp_framework (261-270)
  • _get_attacks_for_category (272-281)
  • _select_and_enhance_attack (283-373)
rogue/evaluator_agent/base_evaluator_agent.py (8)
  • get_underlying_agent (197-253)
  • _get_conversation_context_id (560-566)
  • _send_message_to_evaluated_agent (540-557)
  • _log_evaluation (372-534)
  • _before_tool_callback (255-273)
  • _after_tool_callback (275-295)
  • _before_model_callback (297-310)
  • _after_model_callback (312-332)
rogue/server/red_teaming/frameworks/owasp/owasp.py (3)
  • OWASPTop10 (14-82)
  • get_categories (80-82)
  • get_name (76-78)
rogue/server/red_teaming/attacks/base_attack.py (2)
  • enhance (24-36)
  • get_name (53-55)
rogue/server/red_teaming/attacks/single_turn/base64.py (2)
  • enhance (20-22)
  • get_name (28-29)
rogue/server/red_teaming/attacks/single_turn/leetspeak.py (2)
  • enhance (18-36)
  • get_name (42-43)
rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (2)
  • enhance (20-27)
  • get_name (59-60)
rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (2)
  • enhance (23-32)
  • get_name (38-39)
rogue/server/red_teaming/attacks/single_turn/roleplay.py (2)
  • enhance (33-42)
  • get_name (48-49)
rogue/server/red_teaming/attacks/single_turn/rot13.py (2)
  • enhance (18-25)
  • get_name (31-32)
🪛 markdownlint-cli2 (0.18.1)
TESTING_WITH_CURL.md

181-181: Bare URL used

(MD034, no-bare-urls)


182-182: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: rogue_sanity
  • GitHub Check: codestyle

Comment on lines 395 to 421
def _log_evaluation(
    self,
    scenario: dict[str, str],
    context_id: str,
    evaluation_passed: bool,
    reason: str,
    **kwargs,
) -> None:
    """
    Log evaluation and track scenario for attack selection.

    Overrides parent to track which scenario is being tested
    for better attack selection.
    """
    # Track scenario for this context
    scenario_text = scenario.get("scenario", "")
    if scenario_text:
        self._context_to_scenario[context_id] = scenario_text

    # Call parent method
    return super()._log_evaluation(
        scenario,
        context_id,
        evaluation_passed,
        reason,
        **kwargs,
    )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Align _log_evaluation scenario handling with base class to avoid crashes

Same issue as in the MCP variant: this override assumes scenario is a dict and calls .get, but base_evaluator_agent._log_evaluation is explicitly written to tolerate strings and other shapes from the LLM. If a string comes through here, you’ll get an AttributeError before the base class can normalize it.

You can fix this by normalizing scenario before using it:

     def _log_evaluation(
         self,
-        scenario: dict[str, str],
+        scenario: dict[str, str],  # Base implementation also accepts str at runtime
         context_id: str,
         evaluation_passed: bool,
         reason: str,
         **kwargs,
     ) -> None:
@@
-        # Track scenario for this context
-        scenario_text = scenario.get("scenario", "")
-        if scenario_text:
-            self._context_to_scenario[context_id] = scenario_text
+        # Track scenario for this context (handle both dict and str inputs safely)
+        scenario_text = ""
+        if isinstance(scenario, dict):
+            scenario_text = scenario.get("scenario", "")
+        elif isinstance(scenario, str):
+            scenario_text = scenario
+        if scenario_text:
+            self._context_to_scenario[context_id] = scenario_text

Then delegate to super()._log_evaluation(...) as you already do.

🤖 Prompt for AI Agents
In rogue/evaluator_agent/red_team_a2a_evaluator_agent.py around lines 395 to
421, the override of _log_evaluation assumes scenario is a dict and calls .get,
which will raise AttributeError if scenario is a string or other shape;
normalize scenario first (e.g., if isinstance(scenario, dict) use
scenario.get("scenario", "") else treat scenario as the scenario text or wrap it
into a dict with key "scenario") to extract scenario_text safely, set
self._context_to_scenario[context_id] only when scenario_text is non-empty, then
delegate to super()._log_evaluation(...) unchanged.
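The suggested normalization can be sketched as a standalone helper; the function name below is illustrative, not part of the codebase:

```python
from typing import Any


def extract_scenario_text(scenario: Any) -> str:
    """Return the scenario text whether the input is a dict, a str, or neither."""
    if isinstance(scenario, dict):
        return scenario.get("scenario", "")
    if isinstance(scenario, str):
        return scenario
    return ""


print(extract_scenario_text({"scenario": "probe the system prompt"}))  # dict input
print(extract_scenario_text("probe the system prompt"))  # raw string from the LLM
print(repr(extract_scenario_text(None)))  # unexpected shape degrades to ""
```

With normalization like this in place, the override only stores non-empty text and never calls `.get` on a string.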

Comment on lines 44 to 54
if types is None:
    enum_types = list(ExcessiveAgencyType)
else:
    # Convert string values to enum types
    enum_types = [
        ExcessiveAgencyType(t) if isinstance(t, str) else t
        for t in types
        if isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]
    ]

super().__init__(types=enum_types)  # type: ignore[arg-type]
Contributor

⚠️ Potential issue | 🟠 Major

Fix filtering logic to handle both string and enum inputs correctly.

The list comprehension on lines 48-52 has a subtle bug: it converts strings to enums (or passes through non-strings) on line 49, but then filters on line 51 to only keep items where isinstance(t, str). This means if a caller passes ExcessiveAgencyType.PERMISSIONS directly (an enum, not a string), it will be filtered out incorrectly.

Apply this diff to handle both string and enum inputs:

-        else:
-            # Convert string values to enum types
-            enum_types = [
-                ExcessiveAgencyType(t) if isinstance(t, str) else t
-                for t in types
-                if isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]
-            ]
+        else:
+            # Convert string values to enum types and filter valid enums
+            enum_types = []
+            for t in types:
+                if isinstance(t, ExcessiveAgencyType):
+                    enum_types.append(t)
+                elif isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]:
+                    enum_types.append(ExcessiveAgencyType(t))

Alternatively, use a more concise approach with error handling:

-        else:
-            # Convert string values to enum types
-            enum_types = [
-                ExcessiveAgencyType(t) if isinstance(t, str) else t
-                for t in types
-                if isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]
-            ]
+        else:
+            # Convert string values to enum types, keep valid enums, skip invalid
+            enum_types = []
+            for t in types:
+                if isinstance(t, ExcessiveAgencyType):
+                    enum_types.append(t)
+                elif isinstance(t, str):
+                    try:
+                        enum_types.append(ExcessiveAgencyType(t))
+                    except ValueError:
+                        pass  # Skip invalid string values

Note: This same pattern appears in prompt_leakage.py (lines 48-54) and likely other vulnerability files—apply the same fix consistently across all vulnerability classes.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if types is None:
    enum_types = list(ExcessiveAgencyType)
else:
    # Convert string values to enum types
    enum_types = [
        ExcessiveAgencyType(t) if isinstance(t, str) else t
        for t in types
        if isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]
    ]
super().__init__(types=enum_types)  # type: ignore[arg-type]

if types is None:
    enum_types = list(ExcessiveAgencyType)
else:
    # Convert string values to enum types, keep valid enums, skip invalid
    enum_types = []
    for t in types:
        if isinstance(t, ExcessiveAgencyType):
            enum_types.append(t)
        elif isinstance(t, str):
            try:
                enum_types.append(ExcessiveAgencyType(t))
            except ValueError:
                pass  # Skip invalid string values
super().__init__(types=enum_types)  # type: ignore[arg-type]
🤖 Prompt for AI Agents
In rogue/server/red_teaming/vulnerabilities/excessive_agency.py around lines 44
to 54, the list comprehension currently converts strings to enums but then
filters to only keep strings, which incorrectly drops inputs already provided as
ExcessiveAgencyType enums; replace the comprehension with logic that keeps enum
instances as-is and converts valid string values to enums (or use a try/except
around ExcessiveAgencyType(t) to coerce strings and skip invalid values), then
pass the resulting enum_types to super().__init__; apply the same fix to the
analogous code in prompt_leakage.py (lines ~48-54) and any other vulnerability
files using this pattern.
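The suggested coercion pattern can be exercised in isolation. The enum name and values below are stand-ins for illustration, not the real ExcessiveAgencyType members:

```python
from enum import Enum


class AgencyType(Enum):
    # Hypothetical values; the real ExcessiveAgencyType defines its own members.
    PERMISSIONS = "permissions"
    AUTONOMY = "autonomy"


def coerce_types(types):
    """Keep enum members as-is, convert valid strings, silently skip invalid items."""
    result = []
    for t in types:
        if isinstance(t, AgencyType):
            result.append(t)
        elif isinstance(t, str):
            try:
                result.append(AgencyType(t))
            except ValueError:
                pass  # invalid string value
    return result


mixed = [AgencyType.PERMISSIONS, "autonomy", "bogus", 42]
print([t.value for t in coerce_types(mixed)])  # → ['permissions', 'autonomy']
```

Note that an enum member passed directly survives, which is exactly the case the original comprehension dropped.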

- Over-relies on user input without validation
"""

name = "Robustness"
Contributor

⚠️ Potential issue | 🟡 Minor

Add type annotation to class attribute.

Per coding guidelines, all class attributes should have type annotations.

As per coding guidelines.

Apply this diff:

-    name = "Robustness"
+    name: str = "Robustness"
📝 Committable suggestion


Suggested change
name = "Robustness"
name: str = "Robustness"
🤖 Prompt for AI Agents
In rogue/server/red_teaming/vulnerabilities/robustness.py around line 29, the
class attribute "name = \"Robustness\"" lacks a type annotation; add an explicit
type annotation (e.g., change to name: str = "Robustness") so the class
attribute is typed per coding guidelines.

@drorIvry drorIvry force-pushed the feature/support-bedrock branch from 7ea29b5 to 2d58185 Compare November 20, 2025 20:10
Collaborator

@yuval-qf yuval-qf left a comment
Left a few comments for your consideration

Comment on lines +140 to +147
// Bedrock-specific fields
AWSAccessKeyInput string
AWSAccessKeyCursor int
AWSSecretKeyInput string
AWSSecretKeyCursor int
AWSRegionInput string
AWSRegionCursor int
ActiveInputField int // 0=APIKey, 1=AWSAccessKey, 2=AWSSecretKey, 3=AWSRegion (for Bedrock)
Collaborator
Should we also add aws profile in case the user connects to aws using aws sso login?

@drorIvry drorIvry merged commit 1e24b6c into main Nov 27, 2025
8 of 9 checks passed
@drorIvry drorIvry deleted the feature/support-bedrock branch November 27, 2025 16:18
Contributor

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/tui/internal/components/llm_config_dialog.go (1)

140-147: Bedrock credentials still omit AWS session token, breaking common STS/SSO flows.

Bedrock wiring adds access key, secret, and region fields to LLMConfigDialog and LLMConfigResultMsg, and forwards them in the Bedrock branches of handleEnter, but there’s still no AWS_SESSION_TOKEN support. For temporary credentials (the default with STS/SSO), calls will fail without the session token, so Bedrock remains unusable for a large class of setups. This was flagged in a previous review and is still outstanding.

At minimum, extend the dialog state and result message and thread the field through the Bedrock paths in this file; downstream storage/clients can then forward the full set:

 type LLMConfigDialog struct {
@@
-   AWSRegionInput     string
-   AWSRegionCursor    int
-   ActiveInputField   int // 0=APIKey, 1=AWSAccessKey, 2=AWSSecretKey, 3=AWSRegion (for Bedrock)
+   AWSRegionInput        string
+   AWSRegionCursor       int
+   AWSSessionTokenInput  string
+   AWSSessionTokenCursor int
+   ActiveInputField      int // 0=AWSAccessKey, 1=AWSSecretKey, 2=AWSRegion, 3=AWSSessionToken (for Bedrock)
 type LLMConfigResultMsg struct {
     Provider           string
     APIKey             string
     AWSAccessKeyID     string
     AWSSecretAccessKey string
     AWSRegion          string
+    AWSSessionToken    string
     Model              string
     Action             string
 }
-           if provider.Name == "bedrock" {
-               msg.APIKey = d.AWSAccessKeyInput
-               msg.AWSAccessKeyID = d.AWSAccessKeyInput
-               msg.AWSSecretAccessKey = d.AWSSecretKeyInput
-               msg.AWSRegion = d.AWSRegionInput
-           }
+           if provider.Name == "bedrock" {
+               msg.APIKey = d.AWSAccessKeyInput
+               msg.AWSAccessKeyID = d.AWSAccessKeyInput
+               msg.AWSSecretAccessKey = d.AWSSecretKeyInput
+               msg.AWSRegion = d.AWSRegionInput
+               msg.AWSSessionToken = d.AWSSessionTokenInput
+           }

and mirror the same addition in the ModelSelectionStep Bedrock block. The new session‑token field should also be integrated into the Bedrock input navigation/backspace/delete/paste/render logic alongside the existing three Bedrock fields.

Also applies to: 162-167, 692-697, 728-733

🧹 Nitpick comments (1)
packages/tui/internal/components/llm_config_dialog.go (1)

207-237: Bedrock provider definition and multi-field input UX look solid overall.

The Bedrock provider entry, three-field AWS input layout, keyboard navigation (↑/↓ between fields, ←/→ cursor movement, focus handoff to buttons), and per-field backspace/delete/paste handling are coherent and match the behavior of the single-field providers. The generic renderInputField and renderAPIKeyInput split keeps the rendering reasonably clean.

If you touch this again, consider factoring the repeated if provider.Name == "bedrock" { ... } else { ... } branches into small helpers (e.g., handleBedrockChar, handleBedrockDelete, renderBedrockInputs) to reduce duplication and make it easier to add the session-token field without editing multiple switch statements.

Also applies to: 350-379, 385-415, 433-445, 465-480, 488-515, 520-543, 548-589, 746-775, 1001-1069, 1071-1163

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ea29b5 and 356ab52.

📒 Files selected for processing (1)
  • packages/tui/internal/components/llm_config_dialog.go (17 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: rogue_sanity
  • GitHub Check: codestyle

Comment on lines +668 to 673
// Validate inputs based on provider
provider := d.Providers[d.SelectedProvider]
if provider.Name != "bedrock" || provider.Name != "lm_studio" && d.APIKeyInput == "" {
    d.ErrorMessage = "API key cannot be empty"
    return d, nil
}
Contributor
⚠️ Potential issue | 🔴 Critical

Fix API key validation condition; it currently always trips and blocks configuration.

The condition:

if provider.Name != "bedrock" || provider.Name != "lm_studio" && d.APIKeyInput == "" {

is effectively always true for non‑Bedrock providers and still validates against APIKeyInput for Bedrock, so the “Validate” path can never succeed for any provider and Bedrock can’t use its dedicated fields.

Recommend rewriting to validate per provider explicitly:

-       // Validate inputs based on provider
-       provider := d.Providers[d.SelectedProvider]
-       if provider.Name != "bedrock" || provider.Name != "lm_studio" && d.APIKeyInput == "" {
-           d.ErrorMessage = "API key cannot be empty"
-           return d, nil
-       }
+       // Validate inputs based on provider
+       provider := d.Providers[d.SelectedProvider]
+       switch provider.Name {
+       case "bedrock":
+           if d.AWSAccessKeyInput == "" || d.AWSSecretKeyInput == "" || d.AWSRegionInput == "" {
+               d.ErrorMessage = "AWS access key, secret key, and region cannot be empty"
+               return d, nil
+           }
+       case "lm_studio":
+           // lm_studio does not require an API key
+       default:
+           if d.APIKeyInput == "" {
+               d.ErrorMessage = "API key cannot be empty"
+               return d, nil
+           }
+       }
🤖 Prompt for AI Agents
In packages/tui/internal/components/llm_config_dialog.go around lines 668 to
673, the combined boolean expression incorrectly uses || and && so it always
evaluates true and blocks validation; replace it with explicit per-provider
checks: if provider.Name == "bedrock" validate bedrock-specific fields (not
APIKeyInput), else if provider.Name == "lm_studio" validate lm_studio-specific
requirements, otherwise ensure d.APIKeyInput is not empty. Implement clear
separate branches for each provider (or use proper parentheses) so only the
relevant fields are validated for each provider.
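The tautology in the flagged condition is language-independent: `&&` binds tighter than `||` in Go, just as `and` binds tighter than `or` in Python, so a quick Python sketch reproduces the bug:

```python
def broken_check(name: str, api_key: str) -> bool:
    # Mirrors the Go condition: parses as
    # name != "bedrock" or (name != "lm_studio" and api_key == "")
    return name != "bedrock" or name != "lm_studio" and api_key == ""


def intended_check(name: str, api_key: str) -> bool:
    # Only non-Bedrock, non-lm_studio providers require an API key.
    return name not in ("bedrock", "lm_studio") and api_key == ""


for name in ("bedrock", "lm_studio", "openai"):
    # lm_studio and openai trip the broken check even though a key was supplied
    print(name, broken_check(name, "some-key"), intended_check(name, "some-key"))
```

No single string can equal both `"bedrock"` and `"lm_studio"` at once, so the first disjunct alone makes the check fire for every non-Bedrock provider regardless of the key.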

@coderabbitai coderabbitai bot mentioned this pull request Jan 1, 2026
21 tasks

3 participants