feature/support bedrock #133
Conversation
Summary by CodeRabbit

Walkthrough
This PR adds AWS Bedrock support for LLM integration and introduces red-team evaluation capabilities. Changes span TUI credential input dialogs, API controllers, server services, and Python evaluator agents, threading AWS credentials (access key, secret key, region) through interview and evaluation workflows alongside a new evaluation mode selector.
Sequence Diagram

```mermaid
sequenceDiagram
actor User
participant TUI as TUI Client
participant Controller as Common Controller
participant Service as Server<br/>(Evaluation Service)
participant Orchestrator as Evaluation<br/>Orchestrator
participant Agent as Evaluator Agent
participant LLM as LLM Service<br/>(litellm)
User->>TUI: Configure LLM (Bedrock credentials)
TUI->>TUI: Store AWS credentials in config<br/>(access key, secret key, region)
User->>Controller: Start Evaluation
Controller->>Controller: Extract credentials from config<br/>based on provider type<br/>(Bedrock vs. API key)
Controller->>Service: Call run_job with<br/>AgentConfig (credentials, mode)
Service->>Orchestrator: Initialize with evaluation_mode,<br/>AWS credentials, owasp_categories
alt evaluation_mode = RED_TEAM
Orchestrator->>Orchestrator: Generate red-team scenarios<br/>via RedTeamScenarioGenerator
rect rgb(200, 150, 255)
note over Orchestrator: New red-team path
end
else evaluation_mode = POLICY
Orchestrator->>Orchestrator: Use provided scenarios
end
Orchestrator->>Agent: Create agent with<br/>judge_llm AWS credentials,<br/>evaluation_mode, owasp_categories
Agent->>Agent: Evaluate policy/scenarios
alt provider = Bedrock
Agent->>LLM: completion call with<br/>aws_access_key_id,<br/>aws_secret_access_key
rect rgb(200, 150, 255)
note over LLM: Bedrock credentials<br/>(not API key)
end
else provider = other
Agent->>LLM: completion call with<br/>api_key parameter
end
LLM-->>Agent: LLM response
Agent-->>Orchestrator: Evaluation results
Orchestrator-->>Service: Final results<br/>(with red_teaming_results if applicable)
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes
Areas requiring extra attention:
Pre-merge checks and finishing touches: ❌ Failed checks (2 warnings), ✅ Passed checks (1 passed)
Actionable comments posted: 7
🧹 Nitpick comments (5)
sdks/python/rogue_sdk/types.py (4)
120-122: Consider using `SecretStr` for AWS credentials to prevent accidental exposure.
AWS credentials are sensitive secrets that should be protected from accidental logging or exposure. While the existing `judge_llm_api_key` field (line 119) also uses `Optional[str]`, best practice would be to use Pydantic's `SecretStr` type for all credential fields, which is already imported in this file.
Apply this diff to enhance security:

```diff
-    judge_llm_aws_access_key_id: Optional[str] = None
-    judge_llm_aws_secret_access_key: Optional[str] = None
-    judge_llm_aws_region: Optional[str] = None
+    judge_llm_aws_access_key_id: Optional[SecretStr] = None
+    judge_llm_aws_secret_access_key: Optional[SecretStr] = None
+    judge_llm_aws_region: Optional[str] = None
```

Note: the region is not a secret, so it can remain `Optional[str]`. You would need to update all code that accesses these fields to use `.get_secret_value()`, similar to how `get_auth_header()` handles credentials (lines 35-38).
378-380: Consider using `SecretStr` for AWS credentials in StartInterviewRequest.
Same security recommendation as for `AgentConfig`: AWS access keys and secret keys should be protected using `SecretStr` to prevent accidental exposure through logs or error messages.

472-474: Consider using `SecretStr` for AWS credentials in ScenarioGenerationRequest.
Same security recommendation: AWS credentials should use the `SecretStr` type for protection.

491-493: Consider using `SecretStr` for AWS credentials in SummaryGenerationRequest.
Same security recommendation: AWS credentials should use the `SecretStr` type for protection.

packages/tui/internal/tui/common_controller.go (1)
203-209: Consider extracting duplicate model-prefix logic to a helper function.
The logic for checking whether a model string contains "/" and conditionally prefixing it with the provider appears twice (lines 203-209 and lines 298-304). This duplication could lead to inconsistencies if the logic needs to be updated.
Consider creating a helper function:

```go
// formatModelWithProvider returns the model string with provider prefix if not already present
func formatModelWithProvider(model, provider string) string {
	if strings.Contains(model, "/") {
		return model
	}
	return provider + "/" + model
}
```

Then use it in both locations:

```go
judgeModel = formatModelWithProvider(m.config.SelectedModel, m.config.SelectedProvider)
```

and

```go
m.evalState.JudgeModel = formatModelWithProvider(msg.Model, msg.Provider)
```

Also applies to: 298-304
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (21)
- VERSION (1 hunks)
- packages/tui/internal/components/llm_config_dialog.go (16 hunks)
- packages/tui/internal/tui/commands.go (2 hunks)
- packages/tui/internal/tui/common_controller.go (2 hunks)
- packages/tui/internal/tui/interview_controller.go (4 hunks)
- packages/tui/internal/tui/interview_utils.go (2 hunks)
- packages/tui/internal/tui/keyboard_controller.go (2 hunks)
- packages/tui/internal/tui/scenario_utils.go (1 hunks)
- packages/tui/internal/tui/utils.go (4 hunks)
- pyproject.toml (1 hunks)
- rogue/evaluator_agent/base_evaluator_agent.py (3 hunks)
- rogue/evaluator_agent/evaluator_agent_factory.py (2 hunks)
- rogue/evaluator_agent/policy_evaluation.py (2 hunks)
- rogue/evaluator_agent/run_evaluator_agent.py (2 hunks)
- rogue/server/api/interview.py (1 hunks)
- rogue/server/api/llm.py (3 hunks)
- rogue/server/core/evaluation_orchestrator.py (3 hunks)
- rogue/server/services/evaluation_service.py (1 hunks)
- rogue/server/services/interviewer_service.py (2 hunks)
- rogue/server/services/llm_service.py (5 hunks)
- sdks/python/rogue_sdk/types.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format Python code with Black
Ensure code passes flake8 linting
Run mypy with the repository configuration for static typing
Run Bandit security checks using .bandit.yaml configuration
Use isort import conventions for import ordering
Add type hints to all function signatures
Follow PEP 8 naming (snake_case for variables/functions, PascalCase for classes)
Use try/except around code that may raise exceptions
Files:
- rogue/server/core/evaluation_orchestrator.py
- rogue/server/services/evaluation_service.py
- rogue/evaluator_agent/run_evaluator_agent.py
- rogue/server/services/llm_service.py
- sdks/python/rogue_sdk/types.py
- rogue/evaluator_agent/evaluator_agent_factory.py
- rogue/evaluator_agent/policy_evaluation.py
- rogue/server/services/interviewer_service.py
- rogue/evaluator_agent/base_evaluator_agent.py
- rogue/server/api/interview.py
- rogue/server/api/llm.py
pyproject.toml
📄 CodeRabbit inference engine (AGENTS.md)
Manage dependencies with uv and declare them in pyproject.toml
Files:
pyproject.toml
🧬 Code graph analysis (6)
packages/tui/internal/tui/common_controller.go (1)
- packages/tui/internal/tui/types.go (2): NewEvaluationScreen (37-37), Model (51-90)

packages/tui/internal/tui/interview_utils.go (2)
- packages/tui/internal/tui/utils.go (1): RogueSDK (79-83)
- sdks/python/rogue_sdk/types.py (2): StartInterviewResponse (383-388), StartInterviewRequest (373-380)

packages/tui/internal/tui/utils.go (1)
- sdks/python/rogue_sdk/types.py (5): Protocol (68-79), Transport (82-93), AuthType (23-48), EvaluationRequest (419-425), AgentConfig (104-139)

packages/tui/internal/components/llm_config_dialog.go (3)
- packages/tui/internal/tui/types.go (1): Model (51-90)
- packages/tui/internal/theme/theme.go (1): Theme (10-78)
- packages/tui/internal/styles/styles.go (4): NewStyle (45-47), Primary (13-13), TextMuted (22-22), Border (23-23)

packages/tui/internal/tui/interview_controller.go (2)
- packages/tui/internal/screens/scenarios/scenario_types.go (1): InterviewStartedMsg (40-44)
- packages/tui/internal/tui/scenario_utils.go (1): ScenarioGenerationRequest (26-34)

rogue/server/api/llm.py (2)
- sdks/python/rogue_sdk/types.py (5): EvaluationResults (240-256), ReportSummaryRequest (558-567), ReportSummaryResponse (570-573), ScenarioGenerationRequest (466-475), SummaryGenerationRequest (485-498)
- rogue/server/models/api_format.py (1): ServerSummaryGenerationResponse (52-56)
🔇 Additional comments (10)
pyproject.toml (1)
10-10: boto3 1.40.69 is secure and suitable for Bedrock support.
The dependency is correctly placed and formatted. Verification confirms: (1) no known security vulnerabilities in boto3 1.40.69, and (2) AWS Bedrock API operations are supported in that version. The version constraint pattern (`>=1.40.69`) is consistent with other dependencies in the project.
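For context on why boto3 matters here: the sequence diagram above shows the evaluator passing AWS credentials into a litellm completion call for the Bedrock path. A rough sketch of that wiring, under the assumption that litellm's documented `aws_*` keyword arguments are used (the helper, model IDs, and parameter names below are illustrative, not the actual rogue code):

```python
from typing import Optional
from litellm import completion


def call_judge(
    prompt: str,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_region: Optional[str] = None,
    api_key: Optional[str] = None,
) -> str:
    """Hypothetical helper: route credentials based on provider type."""
    kwargs: dict = {}
    if aws_access_key_id and aws_secret_access_key:
        # Bedrock path: pass AWS credentials instead of an API key.
        kwargs.update(
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key,
            aws_region_name=aws_region,
        )
        model = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"  # example model ID
    else:
        # Non-Bedrock providers keep using a plain API key.
        kwargs["api_key"] = api_key
        model = "openai/gpt-4o-mini"  # example model ID

    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    return response.choices[0].message.content or ""
```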
rogue/server/core/evaluation_orchestrator.py (1)
34-36: LGTM: AWS credentials correctly threaded through orchestrator.
The implementation correctly accepts, stores, and forwards AWS credentials to the evaluator agent. The pattern is consistent and maintains the optional nature of these parameters.
Also applies to: 45-47, 99-101
packages/tui/internal/tui/common_controller.go (1)
284-294: LGTM: Bedrock credentials stored separately with clear naming.
The implementation correctly stores Bedrock AWS credentials in separate map keys (`bedrock_access_key`, `bedrock_secret_key`, `bedrock_region`) rather than mixing them with provider API keys. This separation is clear and maintainable.

packages/tui/internal/tui/commands.go (2)
52-76: LGTM: Clean provider-based credential handling.
The implementation correctly distinguishes between Bedrock (which uses AWS credentials) and other providers (which use API keys). The comments are clear, and the credential extraction logic is consistent with how credentials are stored in `common_controller.go`.
93-95: LGTM: AWS credentials correctly passed to GenerateSummary.
The AWS credential pointers are properly passed to the SDK's `GenerateSummary` method, maintaining the optional nature of these parameters.

rogue/evaluator_agent/run_evaluator_agent.py (1)
86-88: LGTM: AWS credentials correctly threaded through to evaluator agent.
The implementation correctly accepts AWS credentials as optional parameters and forwards them to `get_evaluator_agent`. The pattern is consistent with the broader credential propagation strategy.
Also applies to: 124-126
rogue/server/services/evaluation_service.py (1)
108-110: LGTM: AWS credentials correctly extracted from config and passed to orchestrator.
The implementation correctly extracts AWS credentials from the `agent_config` and forwards them to the `EvaluationOrchestrator`, completing the credential flow from the API request to the evaluation execution. Line 109 is appropriately marked with `noqa: E501` for the long line.

packages/tui/internal/tui/interview_controller.go (1)
32-150: Nice provider-specific credential handling
The provider split plus Bedrock credential extraction keeps API keys and AWS creds separated cleanly. Passing them through StartInterview/GenerateScenarios lines everything up with the server changes. Looks solid.
rogue/evaluator_agent/base_evaluator_agent.py (1)
157-359: Good job threading judge AWS credentials
Capturing the judge's AWS access key/secret/region at init and forwarding them into `evaluate_policy` completes the Bedrock path for evaluator runs. The change slots in cleanly with existing auth handling.

packages/tui/internal/tui/interview_utils.go (1)
21-97: SDK request matches the server contract
Adding the AWS fields to `StartInterviewRequest` and threading them through `StartInterview` keeps the TUI in sync with the backend/SDK types. The `omitempty` tags avoid clutter when credentials aren't needed. Nicely done.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
rogue/evaluator_agent/run_evaluator_agent.py (1)
82-98: Expose new AWS and red-team parameters through `run_evaluator_agent` as well
`arun_evaluator_agent` correctly accepts and forwards `judge_llm_aws_*`, `evaluation_mode`, and `owasp_categories` into `get_evaluator_agent`, but the synchronous wrapper `run_evaluator_agent` neither accepts nor passes these parameters. This means:

- Sync callers cannot use a Bedrock judge LLM or RED_TEAM mode.
- If any call sites were updated to pass these kwargs to `run_evaluator_agent`, they will raise `TypeError`.

Extending the sync wrapper keeps both paths aligned and preserves backward compatibility via defaults.

```diff
-def run_evaluator_agent(
-    protocol: Protocol,
-    transport: Transport | None,
-    evaluated_agent_url: str,
-    auth_type: AuthType,
-    auth_credentials: str | None,
-    judge_llm: str,
-    judge_llm_api_key: str | None,
-    scenarios: Scenarios,
-    business_context: str,
-    deep_test_mode: bool,
-) -> EvaluationResults:
+def run_evaluator_agent(
+    protocol: Protocol,
+    transport: Transport | None,
+    evaluated_agent_url: str,
+    auth_type: AuthType,
+    auth_credentials: str | None,
+    judge_llm: str,
+    judge_llm_api_key: str | None,
+    scenarios: Scenarios,
+    business_context: str,
+    deep_test_mode: bool,
+    judge_llm_aws_access_key_id: str | None = None,
+    judge_llm_aws_secret_access_key: str | None = None,
+    judge_llm_aws_region: str | None = None,
+    evaluation_mode: EvaluationMode = EvaluationMode.POLICY,
+    owasp_categories: Optional[List[str]] = None,
+) -> EvaluationResults:
     async def run_evaluator_agent_task():
         async for update_type, data in arun_evaluator_agent(
             protocol=protocol,
             transport=transport,
             evaluated_agent_url=evaluated_agent_url,
             auth_type=auth_type,
             auth_credentials=auth_credentials,
             judge_llm=judge_llm,
             judge_llm_api_key=judge_llm_api_key,
             scenarios=scenarios,
             business_context=business_context,
             deep_test_mode=deep_test_mode,
+            judge_llm_aws_access_key_id=judge_llm_aws_access_key_id,
+            judge_llm_aws_secret_access_key=judge_llm_aws_secret_access_key,
+            judge_llm_aws_region=judge_llm_aws_region,
+            evaluation_mode=evaluation_mode,
+            owasp_categories=owasp_categories,
         ):
             if update_type == "results":
                 return data
```

Also applies to: 249-261
rogue/server/services/evaluation_service.py (1)
169-177: Red-team and OWASP results from `EvaluationResults` are currently discarded
After `orchestrator.run_evaluation()` completes, the service only persists `final_results.results` onto `job.results`:

```python
if final_results and final_results.results:
    job.results = final_results.results
```

But `EvaluationResults` now also carries:

- `red_teaming_results`
- `owasp_summary`

These are not stored on the job or sent via WebSocket updates, so callers using the job API cannot access red-team outcomes or OWASP summaries, even when running in RED_TEAM mode.
To expose the new functionality end-to-end, you'll likely need to either:

- Extend `EvaluationJob` (and any related DTOs / websocket payloads) to include `red_teaming_results` and `owasp_summary`, or
- Store the full `EvaluationResults` object on the job and adapt consumers to read from it.
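A minimal sketch of the first option, assuming Pydantic-style models; the `EvaluationJob` fields and `persist_results` helper below are illustrative, not the actual rogue code:

```python
from typing import Any, Dict, List, Optional
from pydantic import BaseModel


class EvaluationJob(BaseModel):
    """Hypothetical job model extended to carry red-team data."""

    job_id: str
    status: str = "pending"
    results: List[Any] = []
    # New fields so the job API can surface RED_TEAM output:
    red_teaming_results: Optional[List[Any]] = None
    owasp_summary: Optional[Dict[str, Any]] = None


def persist_results(job: EvaluationJob, final_results: Any) -> None:
    """Copy everything the orchestrator produced, not just .results."""
    if final_results is None:
        return
    if final_results.results:
        job.results = final_results.results
    # Fields that would otherwise be dropped:
    job.red_teaming_results = getattr(final_results, "red_teaming_results", None)
    job.owasp_summary = getattr(final_results, "owasp_summary", None)
```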
rogue/server/core/evaluation_orchestrator.py (1)
134-151: Preserve red-teaming metadata when aggregating `EvaluationResults`
Right now, the `"results"` branch only loops over `results.results` and calls `self.results.add_result(res)`. Any data in `red_teaming_results` (and potentially `owasp_summary`) on the `EvaluationResults` instance returned by `arun_evaluator_agent` is dropped, which undermines the new red-team support.
`EvaluationResults` already has a `combine` method that merges both standard and red-teaming results. Using it here keeps the aggregation logic in one place. You can replace the manual loop with:

```diff
-            if update_type == "results":
-                # Process results
-                results = data
-                if results and results.results:
-                    self.logger.info(
-                        f"📊 Processing {len(results.results)} evaluation results",
-                    )
-                    for res in results.results:
-                        self.results.add_result(res)
-                else:
-                    self.logger.warning(
-                        "⚠️ Received results update but no results data",
-                    )
-
-                # Yield the accumulated results
-                yield "results", self.results
+            if update_type == "results":
+                # Process results (policy + red-teaming metadata)
+                results: EvaluationResults = data
+                if results.results:
+                    self.logger.info(
+                        f"📊 Processing {len(results.results)} evaluation results",
+                    )
+                else:
+                    self.logger.warning(
+                        "⚠️ Received results update but no results data",
+                    )
+
+                self.results.combine(results)
+
+                # Yield the accumulated results
+                yield "results", self.results
```

This way, both standard and red-team evaluation data remain intact through orchestration.
Also applies to: 162-177
🧹 Nitpick comments (26)
rogue/__main__.py (1)
17-18: Eliminate duplicate imports.
The `sys` and `Path` imports here are redundant since they're already imported at lines 3 and 6. The fallback block can use the existing imports.
Apply this diff:

```diff
 try:
     from . import __version__
 except ImportError:
     # Fallback if running directly
     # Add parent directory to path
-    import sys  # noqa: F811
-    from pathlib import Path  # noqa: F811
-
     sys.path.insert(0, str(Path(__file__).parent.parent))
     from rogue import __version__  # noqa: F401
```

TESTING_WITH_CURL.md (1)
181-182: Format URLs per markdown best practices.
Wrap the bare URLs in angle brackets to comply with markdown conventions.
Apply this diff:

```diff
-- **Swagger UI**: http://localhost:8000/docs
-- **ReDoc**: http://localhost:8000/redoc
+- **Swagger UI**: <http://localhost:8000/docs>
+- **ReDoc**: <http://localhost:8000/redoc>
```

rogue/evaluator_agent/base_evaluator_agent.py (1)
472-494: Consider extracting OWASP normalization logic.
The `scenario_type` normalization logic for OWASP categories is complex and could benefit from extraction to a dedicated helper method for improved readability and testability.
Consider extracting to a method like:

```python
def _normalize_scenario_type_and_outcome(
    self, scenario_dict: dict[str, str]
) -> tuple[str, str]:
    """
    Normalize scenario_type and expected_outcome for OWASP categories.

    Returns:
        Tuple of (scenario_type, expected_outcome)
    """
    scenario_type = scenario_dict.get("scenario_type", "policy")
    if scenario_type not in [st.value for st in ScenarioType]:
        # OWASP category ID handling
        scenario_type = ScenarioType.POLICY.value
        expected_outcome = scenario_dict.get("expected_outcome", "")
        owasp_cat = scenario_dict.get("scenario_type")
        if owasp_cat and isinstance(owasp_cat, str) and owasp_cat not in expected_outcome:
            if expected_outcome:
                expected_outcome = f"{expected_outcome} (OWASP: {owasp_cat})"
            else:
                expected_outcome = f"OWASP Category: {owasp_cat}"
    else:
        expected_outcome = scenario_dict.get("expected_outcome") or ""
    return scenario_type, expected_outcome
```

test_red_team_curl.sh (1)
28-33: Improve error handling for job_id extraction.
The current implementation uses a Python one-liner that can fail with `json.JSONDecodeError` or `KeyError` if the response is malformed or missing the `job_id` field. While line 30 checks for an empty `JOB_ID`, the error message doesn't indicate whether the issue was invalid JSON or a missing key.
Apply this diff for more robust error handling:

```diff
-JOB_ID=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['job_id'])")
+JOB_ID=$(echo "$RESPONSE" | python3 -c "
+import sys, json
+try:
+    data = json.load(sys.stdin)
+    print(data.get('job_id', ''))
+except (json.JSONDecodeError, KeyError) as e:
+    print('', file=sys.stderr)
+    sys.exit(0)
+" 2>/dev/null)

 if [ -z "$JOB_ID" ]; then
     echo "❌ Failed to create job"
+    echo "Response was:"
     echo "$RESPONSE" | python3 -m json.tool 2>/dev/null || echo "$RESPONSE"
     exit 1
 fi
```

rogue/server/services/evaluation_library.py (2)
75-89: Consider using direct attribute access instead of getattr.
According to the AgentConfig schema in `rogue_sdk/types.py`, the fields `evaluation_mode`, `owasp_categories`, and `attacks_per_category` are already defined with default values. Using `getattr` with defaults suggests uncertainty about the schema and could mask typos or version mismatches.
Apply this diff to use direct attribute access:

```diff
-            evaluation_mode=getattr(
-                agent_config,
-                "evaluation_mode",
-                EvaluationMode.POLICY,
-            ),
-            owasp_categories=getattr(
-                agent_config,
-                "owasp_categories",
-                None,
-            ),
-            attacks_per_category=getattr(
-                agent_config,
-                "attacks_per_category",
-                5,
-            ),
+            evaluation_mode=agent_config.evaluation_mode,
+            owasp_categories=agent_config.owasp_categories,
+            attacks_per_category=agent_config.attacks_per_category,
```

If backward compatibility with older AgentConfig versions is required, document this and add a comment explaining the use of `getattr`.
62-90: Extract ScenarioEvaluationService construction to reduce duplication.
The ScenarioEvaluationService initialization logic is duplicated between the `evaluate_agent` and `evaluate_agent_streaming` methods. This violates the DRY principle and increases maintenance burden.
Add a helper method to construct the service:

```python
@staticmethod
def _create_evaluation_service(
    agent_config: AgentConfig,
    scenarios: Scenarios,
    business_context: str,
) -> ScenarioEvaluationService:
    """Create a ScenarioEvaluationService from agent config."""
    return ScenarioEvaluationService(
        protocol=agent_config.protocol,
        transport=agent_config.transport,
        evaluated_agent_url=str(agent_config.evaluated_agent_url),
        evaluated_agent_auth_type=agent_config.evaluated_agent_auth_type,
        evaluated_agent_auth_credentials=agent_config.evaluated_agent_credentials,
        judge_llm=agent_config.judge_llm,
        judge_llm_api_key=agent_config.judge_llm_api_key,
        scenarios=scenarios,
        business_context=business_context,
        deep_test_mode=agent_config.deep_test_mode,
        evaluation_mode=agent_config.evaluation_mode,
        owasp_categories=agent_config.owasp_categories,
        attacks_per_category=agent_config.attacks_per_category,
    )
```

Then update both methods to use it:

```python
service = EvaluationLibrary._create_evaluation_service(
    agent_config=agent_config,
    scenarios=scenarios,
    business_context=business_context,
)
```

Also applies to: 170-196
rogue/server/red_teaming/vulnerabilities/robustness.py (1)
42-50: Simplify and clarify type conversion logic.
The current type conversion logic has redundant checks and silently drops invalid input. Line 47 handles both string and non-string cases, but line 49 filters to keep only valid strings, meaning non-strings are silently ignored. This makes the logic confusing and could hide errors.
Apply this diff to simplify and make the behavior explicit:
```diff
         if types is None:
             enum_types = list(RobustnessType)
         else:
             # Convert string values to enum types
             enum_types = [
-                RobustnessType(t) if isinstance(t, str) else t
+                RobustnessType(t)
                 for t in types
-                if isinstance(t, str) and t in [e.value for e in RobustnessType]
+                if t in [e.value for e in RobustnessType]
             ]

         super().__init__(types=enum_types)  # type: ignore[arg-type]
```

Alternatively, add explicit validation to raise an error for invalid types:

```python
if types is None:
    enum_types = list(RobustnessType)
else:
    enum_types = []
    valid_values = [e.value for e in RobustnessType]
    for t in types:
        if isinstance(t, str):
            if t in valid_values:
                enum_types.append(RobustnessType(t))
            else:
                raise ValueError(f"Invalid RobustnessType: {t}")
        elif isinstance(t, RobustnessType):
            enum_types.append(t)
        else:
            raise TypeError(f"Expected str or RobustnessType, got {type(t)}")
```

test_server_red_team.sh (2)
45-51: Use jq or python for robust JSON parsing.
The script uses `grep` and `cut` to extract JSON fields, which is fragile and can break if the JSON formatting changes (e.g., whitespace, field order). This is similar to the issue in `test_red_team_curl.sh`.
Apply this diff to use `jq` for robust JSON parsing:

```diff
-JOB_ID=$(echo "$RESPONSE" | grep -o '"job_id":"[^"]*' | cut -d'"' -f4)
+JOB_ID=$(echo "$RESPONSE" | jq -r '.job_id // empty')

 if [ -z "$JOB_ID" ]; then
     echo "❌ Failed to create evaluation job"
-    echo "Response: $RESPONSE"
+    echo "Response:"
+    echo "$RESPONSE" | jq . 2>/dev/null || echo "$RESPONSE"
     exit 1
 fi
```

Or if `jq` is not available, use Python:

```diff
-JOB_ID=$(echo "$RESPONSE" | grep -o '"job_id":"[^"]*' | cut -d'"' -f4)
+JOB_ID=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin).get('job_id', ''))" 2>/dev/null)
```
61-61: Use jq or python for status extraction.
The same fragile `grep` pattern is used to extract the status field.
Apply this diff:

```diff
-    STATUS=$(curl -s "$SERVER_URL/api/v1/evaluations/$JOB_ID" | grep -o '"status":"[^"]*' | cut -d'"' -f4)
+    STATUS=$(curl -s "$SERVER_URL/api/v1/evaluations/$JOB_ID" | jq -r '.status // "unknown"')
```

Or with Python:

```diff
-    STATUS=$(curl -s "$SERVER_URL/api/v1/evaluations/$JOB_ID" | grep -o '"status":"[^"]*' | cut -d'"' -f4)
+    STATUS=$(curl -s "$SERVER_URL/api/v1/evaluations/$JOB_ID" | python3 -c "import sys, json; print(json.load(sys.stdin).get('status', 'unknown'))" 2>/dev/null)
```

rogue/server/red_teaming/metrics/base_red_teaming_metric.py (1)
32-39: Remove unnecessary return statement.
The `measure` method signature indicates it returns `None`, so the `return` statement in `a_measure` is unnecessary and could be confusing.
Apply this diff:

```diff
     async def a_measure(self, test_case: Any) -> None:
         """
         Async version of measure.

         Args:
             test_case: Test case containing attack input and agent response
         """
-        return self.measure(test_case)
+        self.measure(test_case)
```

rogue/server/red_teaming/attacks/single_turn/leetspeak.py (1)
18-36: Consider moving leet_map to a class-level constant.
The `leet_map` dictionary is recreated on every call to `enhance()`, but it's a constant mapping. Moving it to a class-level constant improves efficiency and makes the code more maintainable.
Apply this diff to refactor:

```diff
 class Leetspeak(BaseSingleTurnAttack):
     """Leetspeak transformation attack."""

     name = "Leetspeak"
+
+    _LEET_MAP = {
+        "a": "4",
+        "e": "3",
+        "i": "1",
+        "o": "0",
+        "s": "5",
+        "t": "7",
+        "l": "1",
+        "A": "4",
+        "E": "3",
+        "I": "1",
+        "O": "0",
+        "S": "5",
+        "T": "7",
+        "L": "1",
+    }

     def __init__(self, weight: int = 1):
         self.weight = weight

     def enhance(self, attack: str) -> str:
         """Enhance the attack using Leetspeak transformation."""
-        leet_map = {
-            "a": "4",
-            "e": "3",
-            "i": "1",
-            "o": "0",
-            "s": "5",
-            "t": "7",
-            "l": "1",
-            "A": "4",
-            "E": "3",
-            "I": "1",
-            "O": "0",
-            "S": "5",
-            "T": "7",
-            "L": "1",
-        }
-        return "".join(leet_map.get(char, char) for char in attack)
+        return "".join(self._LEET_MAP.get(char, char) for char in attack)
```

rogue/server/red_teaming/attacks/single_turn/rot13.py (1)
18-29: Consider moving the translation table to a class-level constant.
The ROT13 translation table is recreated on every call to `enhance()`, but it's a constant mapping. Moving it to a class-level constant improves efficiency.
Apply this diff to refactor:

```diff
 class ROT13(BaseSingleTurnAttack):
     """ROT13 encoding attack."""

     name = "ROT-13"
+
+    _ROT13_TRANS = str.maketrans(
+        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
+        "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm",
+    )

     def __init__(self, weight: int = 1):
         self.weight = weight

     def enhance(self, attack: str) -> str:
         """Enhance the attack using ROT13 encoding."""
-        return attack.translate(
-            str.maketrans(
-                "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
-                "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm",
-            ),
-        )
+        return attack.translate(self._ROT13_TRANS)
```

rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (1)
20-39: Add return type annotation on `__init__` to match typing guidelines
`__init__` currently lacks a return type, while other methods are annotated. To keep mypy/typing consistent with the rest of the codebase and the stated guidelines, annotate it explicitly as returning `None`.

```diff
-    def __init__(self, weight: int = 1):
+    def __init__(self, weight: int = 1) -> None:
         self.weight = weight
```

rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (1)
17-27: Annotate `__init__` return type and optionally avoid precomputing both variants
Two small points here:

- `__init__` should be annotated to return `None` to keep function signatures fully typed:

```diff
-    def __init__(self, weight: int = 1):
+    def __init__(self, weight: int = 1) -> None:
         self.weight = weight
```

- To avoid constructing both enhanced prompts on every call (even though it's cheap here), you can randomize at the function level instead of over precomputed strings:

```diff
-    def enhance(self, attack: str) -> str:
-        """Enhance the attack with prompt injection techniques."""
-        return random.choice(  # nosec B311
-            [
-                self._enhance_1(attack),
-                self._enhance_2(attack),
-            ],
-        )
+    def enhance(self, attack: str) -> str:
+        """Enhance the attack with prompt injection techniques."""
+        enhancer = random.choice(  # nosec B311
+            [self._enhance_1, self._enhance_2],
+        )
+        return enhancer(attack)
```

rogue/server/red_teaming/frameworks/owasp/owasp.py (1)
7-8: Simplify `OWASPTop10` typing to avoid mypy ignores and add an explicit `__init__` return type
The current `categories` signature (`List[Literal[...]] = None  # type: ignore[assignment]`) forces type-ignores here and at call sites (e.g., `RedTeamScenarioGenerator`) without much practical gain, since you already filter against `OWASP_CATEGORIES`. You can make this cleaner and align with the "type hints on all function signatures" guideline by:
- Using
Optional[List[str]]forcategories(you still do runtime validation/filtering).- Adding
-> Noneto__init__.- Dropping the now-unneeded
Literalimport and# type: ignore[assignment].For example:
-from dataclasses import dataclass -from typing import List, Literal +from dataclasses import dataclass +from typing import List, Optional @@ - def __init__( - self, - categories: List[ - Literal[ - "LLM_01", - "LLM_02", - "LLM_03", - "LLM_04", - "LLM_05", - "LLM_06", - "LLM_07", - "LLM_08", - "LLM_09", - "LLM_10", - ] - ] = None, # type: ignore[assignment] - ): + def __init__( + self, + categories: Optional[List[str]] = None, + ) -> None: @@ - self.categories = categories + self.categories = categoriesThis also allows you to remove the
# type: ignore[arg-type]at the call site inRedTeamScenarioGenerator.Also applies to: 37-53, 62-74
rogue/server/red_teaming/frameworks/owasp/risk_categories.py (1)
22-37: OWASP category definitions align with attack/vulnerability typesThe
OWASPCategorydataclass and the three initial entries inOWASP_CATEGORIESlook consistent:
- Attack lists use the expected single‑turn attack classes with sensible weights/personas.
- Vulnerability
typesstrings match the Enum.values defined inExcessiveAgencyType,PromptLeakageType, andRobustnessType, so the filtering logic in those ctors will include them.One thing to keep in mind: the vulnerability ctors currently only include string values (Enum instances are effectively ignored). That’s fine given this module passes strings, but if you later start passing Enums directly, you’ll need to tweak those comprehensions to handle non‑string items.
Also applies to: 48-124
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
12-54: AnnotatetypesonBaseVulnerabilityfor clearer typingThe base class API is straightforward, but
self.typesis only introduced in__init__without a class-level annotation. Adding one tightens type checking and makes the attribute contract explicit.You can do:
class BaseVulnerability(ABC): @@ - name: str + name: str + types: List[Enum]No behavior change, just clearer intent for mypy and readers.
sdks/python/rogue_sdk/types.py (2)
127-147: Handle AWS secret access keys cautiously in SDK modelsThe new AWS fields (
judge_llm_aws_secret_access_key, and theaws_secret_access_keyfields on request models) are currently plainOptional[str], similar tojudge_llm_api_key.That’s workable, but to reduce the risk of accidental logging or serialization of secrets, you might consider:
- Using
SecretStrfor the secret access key fields inAgentConfigand request models, and- Ensuring any logging of these models uses filtered dumps (or omits these fields).
If you keep them as
str, it’d be good to double‑check that no code path logsmodel_dump()of these models without redaction.Also applies to: 432-434, 526-528, 545-547
262-293: Ensure new red‑teaming fields flow through downstream APIs
RedTeamingResultand the newred_teaming_results/owasp_summaryfields onEvaluationResultsgive you a richer result surface, but:
EvaluationResults.combineonly mergesresultsandred_teaming_results—owasp_summaryis left untouched. If you plan to aggregate summaries across batches, you’ll need explicit merge semantics there (or document thatowasp_summaryis per‑run and should not be combined).convert_to_api_formatcurrently only looks atevaluation_results.resultsand ignoresred_teaming_resultsandowasp_summary, so clients using the new API format won’t see any red‑team/OWASP information yet.If the intent is to expose OWASP/red‑team data via the new API, you’ll likely want to extend
ApiScenarioResult/ApiEvaluationResultandconvert_to_api_formataccordingly.Also applies to: 348-405
rogue/server/services/red_team_scenario_generator.py (1)
26-29: Tighten typing for_owasp_frameworkand drop now‑unnecessary type ignoreTo align with the “type hints on all function signatures” guideline and the simplified
OWASPTop10constructor:
- Add an explicit return type for
__init__.- Type
_owasp_frameworkasOWASPTop10 | None.- Remove the
# type: ignore[arg-type]now thatOWASPTop10.categoriesacceptsOptional[List[str]].For example:
- def __init__(self): - """Initialize the red team scenario generator.""" - self._owasp_framework = None + def __init__(self) -> None: + """Initialize the red team scenario generator.""" + self._owasp_framework: OWASPTop10 | None = None @@ - # Load OWASP framework with selected categories - self._owasp_framework = OWASPTop10( - categories=owasp_categories, # type: ignore[arg-type] - ) + # Load OWASP framework with selected categories + self._owasp_framework = OWASPTop10( + categories=owasp_categories, + )This keeps mypy/flake8 happy without suppressions and documents the internal state more clearly.
Also applies to: 55-58
rogue/evaluator_agent/red_team_mcp_evaluator_agent.py (2)
161-170: OWASP framework never loads when `owasp_categories` is empty
Right now `_get_owasp_framework` only initializes `OWASPTop10` if `self._owasp_categories` is truthy, and `_select_and_enhance_attack` also relies on `self._owasp_categories` for both extraction and random fallback. That means if callers omit `owasp_categories` (or pass an empty list), you never instantiate the OWASP framework and message enhancement is effectively disabled, despite the instructions saying "All available OWASP categories" and OWASPTop10 having its own default categories.
If the intended behavior is "use OWASP defaults when categories aren't specified", consider letting `_get_owasp_framework` call `OWASPTop10` with `categories=None` and then deriving `self._owasp_categories` from the framework once, e.g. based on `category.id`.
_get_owasp_frameworkcallOWASPTop10withcategories=Noneand then derivingself._owasp_categoriesfrom the framework once, e.g. based oncategory.id.Also applies to: 261-270
261-272: Add a return type hint to `_get_owasp_framework` for mypy clarity
`_get_owasp_framework` currently has no return type annotation. Given the rest of this module is typed, adding something like `-> object | None` or a forward type (e.g. `"OWASPTop10" | None` under `TYPE_CHECKING`) will keep mypy/flake8 happier and document expectations.

rogue/server/services/scenario_evaluation_service.py (1)
83-96: Use `EvaluationResults.combine` to preserve red-teaming metadata
When processing the `"results"` update, you currently only merge `results.results` via `self._results.add_result(res)` and ignore `red_teaming_results` and `owasp_summary` that are now part of `EvaluationResults`. This means red-team-specific data is dropped at this layer.
Given `EvaluationResults` already exposes `combine`, you can simplify the aggregation and keep all fields:

```diff
-            if update_type == "results":
-                results = data
-                if results and results.results:
-                    logger.info(
-                        f"📊 Processing {len(results.results)} evaluation results",
-                    )
-                    for res in results.results:
-                        self._results.add_result(res)
-                else:
-                    logger.warning("⚠️ Received results update but no results data")
+            if update_type == "results":
+                results = data
+                if results:
+                    logger.info(
+                        f"📊 Processing {len(results.results) if results.results else 0} evaluation results",
+                    )
+                    self._results.combine(results)
+                else:
+                    logger.warning("⚠️ Received results update but no results data")
```

This keeps prior behavior for standard results while also propagating red-team fields.
Also applies to: 127-137, 181-187
rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (3)
161-170: Same OWASP default behavior consideration as MCP variant
As with the MCP agent, `_get_owasp_framework` bails out when `self._owasp_categories` is empty, so you never instantiate the OWASP framework or enhance messages if callers don't explicitly pass categories. If the intent is to fall back to OWASP's default categories when none are provided, consider letting `_get_owasp_framework` call `OWASPTop10` with `categories=None` and seeding `self._owasp_categories` from the resulting categories.
Also applies to: 261-270
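If that fallback is the desired behavior, a rough sketch of the method is shown below. The names `OWASPTop10`, `get_categories()`, and `category.id` come from this review; the assumption that passing `categories=None` selects the framework's default category set is exactly that, an assumption:

```python
def _get_owasp_framework(self) -> "OWASPTop10 | None":
    """Lazily build the OWASP framework, falling back to default categories."""
    if self._owasp_framework is None:
        # Assumed: categories=None makes OWASPTop10 load its default categories.
        self._owasp_framework = OWASPTop10(categories=self._owasp_categories or None)
        if not self._owasp_categories:
            # Seed the category list from whatever the framework actually loaded,
            # so _select_and_enhance_attack keeps working.
            self._owasp_categories = [
                category.id for category in self._owasp_framework.get_categories()
            ]
    return self._owasp_framework
```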
261-272: Consider adding a return type hint to `_get_owasp_framework`
For consistency with the rest of the module's typed methods, adding a return annotation (e.g. `"OWASPTop10" | None`) will help mypy and future readers.
113-168: Shared red-team logic between A2A and MCP could be factored
`RedTeamA2AEvaluatorAgent` and `RedTeamMCPEvaluatorAgent` share nearly identical OWASP helpers and attack-selection logic. To reduce drift and keep future changes (e.g., adding new OWASP categories or attack selection tweaks) in one place, consider extracting the common pieces into a small mixin or helper class that both agents can reuse, as sketched below.
Also applies to: 283-373
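One possible shape for that refactor, sketched with hypothetical names. `_get_owasp_framework`, `_get_attacks_for_category`, `OWASPTop10`, and the two agent classes exist per this review; the mixin itself and the `category.attacks` attribute are assumptions:

```python
class RedTeamOWASPMixin:
    """Shared OWASP/attack-selection helpers for red-team evaluator agents (sketch)."""

    _owasp_categories: list[str]
    _owasp_framework: "OWASPTop10 | None" = None

    def _get_owasp_framework(self) -> "OWASPTop10 | None":
        if self._owasp_framework is None and self._owasp_categories:
            self._owasp_framework = OWASPTop10(categories=self._owasp_categories)
        return self._owasp_framework

    def _get_attacks_for_category(self, category_id: str) -> list:
        framework = self._get_owasp_framework()
        if framework is None:
            return []
        # Assumes each OWASPCategory exposes its attack instances as `.attacks`.
        return [
            attack
            for category in framework.get_categories()
            if category.id == category_id
            for attack in category.attacks
        ]


class RedTeamA2AEvaluatorAgent(RedTeamOWASPMixin, A2AEvaluatorAgent):
    ...


class RedTeamMCPEvaluatorAgent(RedTeamOWASPMixin, MCPEvaluatorAgent):
    ...
```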
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (40)
- TESTING_WITH_CURL.md (1 hunks)
- rogue/__main__.py (1 hunks)
- rogue/evaluator_agent/__init__.py (2 hunks)
- rogue/evaluator_agent/base_evaluator_agent.py (4 hunks)
- rogue/evaluator_agent/evaluator_agent_factory.py (2 hunks)
- rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (1 hunks)
- rogue/evaluator_agent/red_team_mcp_evaluator_agent.py (1 hunks)
- rogue/evaluator_agent/run_evaluator_agent.py (3 hunks)
- rogue/server/core/evaluation_orchestrator.py (5 hunks)
- rogue/server/red_teaming/__init__.py (1 hunks)
- rogue/server/red_teaming/attacks/__init__.py (1 hunks)
- rogue/server/red_teaming/attacks/base_attack.py (1 hunks)
- rogue/server/red_teaming/attacks/single_turn/__init__.py (1 hunks)
- rogue/server/red_teaming/attacks/single_turn/base64.py (1 hunks)
- rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1 hunks)
- rogue/server/red_teaming/attacks/single_turn/leetspeak.py (1 hunks)
- rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (1 hunks)
- rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (1 hunks)
- rogue/server/red_teaming/attacks/single_turn/roleplay.py (1 hunks)
- rogue/server/red_teaming/attacks/single_turn/rot13.py (1 hunks)
- rogue/server/red_teaming/frameworks/__init__.py (1 hunks)
- rogue/server/red_teaming/frameworks/owasp/__init__.py (1 hunks)
- rogue/server/red_teaming/frameworks/owasp/owasp.py (1 hunks)
- rogue/server/red_teaming/frameworks/owasp/risk_categories.py (1 hunks)
- rogue/server/red_teaming/metrics/__init__.py (1 hunks)
- rogue/server/red_teaming/metrics/base_red_teaming_metric.py (1 hunks)
- rogue/server/red_teaming/vulnerabilities/__init__.py (1 hunks)
- rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1 hunks)
- rogue/server/red_teaming/vulnerabilities/excessive_agency.py (1 hunks)
- rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (1 hunks)
- rogue/server/red_teaming/vulnerabilities/robustness.py (1 hunks)
- rogue/server/services/evaluation_library.py (3 hunks)
- rogue/server/services/evaluation_service.py (1 hunks)
- rogue/server/services/red_team_scenario_generator.py (1 hunks)
- rogue/server/services/scenario_evaluation_service.py (5 hunks)
- sdks/python/rogue_sdk/types.py (7 hunks)
- test_red_team_curl.sh (1 hunks)
- test_red_teaming_foundation.py (1 hunks)
- test_red_teaming_simple.py (1 hunks)
- test_server_red_team.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format Python code with Black
Ensure code passes flake8 linting
Run mypy with the repository configuration for static typing
Run Bandit security checks using .bandit.yaml configuration
Use isort import conventions for import ordering
Add type hints to all function signatures
Follow PEP 8 naming (snake_case for variables/functions, PascalCase for classes)
Use try/except around code that may raise exceptions
Files:
rogue/server/red_teaming/__init__.pyrogue/server/red_teaming/attacks/single_turn/base64.pyrogue/__main__.pytest_red_teaming_simple.pyrogue/server/red_teaming/metrics/__init__.pyrogue/server/services/evaluation_library.pyrogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.pyrogue/server/red_teaming/metrics/base_red_teaming_metric.pyrogue/server/red_teaming/vulnerabilities/base_vulnerability.pyrogue/server/red_teaming/attacks/single_turn/leetspeak.pyrogue/server/red_teaming/attacks/base_attack.pytest_red_teaming_foundation.pyrogue/server/red_teaming/attacks/single_turn/rot13.pyrogue/server/red_teaming/frameworks/owasp/__init__.pyrogue/server/red_teaming/attacks/__init__.pyrogue/server/red_teaming/attacks/single_turn/__init__.pyrogue/evaluator_agent/run_evaluator_agent.pyrogue/server/services/red_team_scenario_generator.pyrogue/server/red_teaming/vulnerabilities/excessive_agency.pyrogue/server/red_teaming/attacks/single_turn/prompt_injection.pyrogue/server/red_teaming/frameworks/owasp/owasp.pyrogue/server/red_teaming/frameworks/owasp/risk_categories.pysdks/python/rogue_sdk/types.pyrogue/evaluator_agent/base_evaluator_agent.pyrogue/server/red_teaming/frameworks/__init__.pyrogue/server/red_teaming/attacks/single_turn/prompt_probing.pyrogue/server/red_teaming/vulnerabilities/prompt_leakage.pyrogue/evaluator_agent/__init__.pyrogue/server/services/evaluation_service.pyrogue/evaluator_agent/evaluator_agent_factory.pyrogue/server/red_teaming/vulnerabilities/__init__.pyrogue/server/red_teaming/vulnerabilities/robustness.pyrogue/server/red_teaming/attacks/single_turn/roleplay.pyrogue/server/services/scenario_evaluation_service.pyrogue/server/core/evaluation_orchestrator.pyrogue/evaluator_agent/red_team_a2a_evaluator_agent.pyrogue/evaluator_agent/red_team_mcp_evaluator_agent.py
🧬 Code graph analysis (31)
rogue/server/red_teaming/attacks/single_turn/base64.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
BaseSingleTurnAttack(10-13)rogue/server/red_teaming/attacks/base_attack.py (3)
enhance(24-36)a_enhance(38-50)get_name(53-55)
test_red_teaming_simple.py (10)
rogue/server/red_teaming/attacks/single_turn/rot13.py (3)
ROT13(10-32)enhance(18-25)get_name(31-32)rogue/server/red_teaming/attacks/single_turn/base64.py (3)
Base64(12-29)enhance(20-22)get_name(28-29)rogue/server/red_teaming/attacks/single_turn/leetspeak.py (3)
Leetspeak(10-43)enhance(18-36)get_name(42-43)rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (3)
PromptInjection(12-60)enhance(20-27)get_name(59-60)rogue/server/red_teaming/vulnerabilities/excessive_agency.py (1)
ExcessiveAgency(21-54)rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (1)
PromptLeakage(22-56)rogue/server/red_teaming/vulnerabilities/robustness.py (1)
Robustness(20-52)rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (2)
get_types(30-37)get_name(48-50)rogue/server/red_teaming/frameworks/owasp/owasp.py (3)
OWASPTop10(14-82)get_name(76-78)get_categories(80-82)sdks/python/rogue_sdk/types.py (3)
EvaluationMode(58-62)AgentConfig(111-161)RedTeamingResult(262-279)
rogue/server/red_teaming/metrics/__init__.py (1)
rogue/server/red_teaming/metrics/base_red_teaming_metric.py (1)
BaseRedTeamingMetric(11-39)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
rogue/server/red_teaming/attacks/base_attack.py (1)
BaseAttack(11-55)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (3)
rogue/server/red_teaming/attacks/base_attack.py (1)
get_name(53-55)rogue/server/red_teaming/attacks/single_turn/base64.py (1)
get_name(28-29)rogue/server/red_teaming/frameworks/owasp/owasp.py (1)
get_name(76-78)
rogue/server/red_teaming/attacks/single_turn/leetspeak.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
BaseSingleTurnAttack(10-13)rogue/server/red_teaming/attacks/base_attack.py (3)
enhance(24-36)a_enhance(38-50)get_name(53-55)
rogue/server/red_teaming/attacks/base_attack.py (8)
rogue/server/red_teaming/attacks/single_turn/base64.py (3)
enhance(20-22)a_enhance(24-26)get_name(28-29)rogue/server/red_teaming/attacks/single_turn/leetspeak.py (3)
enhance(18-36)a_enhance(38-40)get_name(42-43)rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (3)
enhance(20-27)a_enhance(29-31)get_name(59-60)rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (3)
enhance(23-32)a_enhance(34-36)get_name(38-39)rogue/server/red_teaming/attacks/single_turn/roleplay.py (3)
enhance(33-42)a_enhance(44-46)get_name(48-49)rogue/server/red_teaming/attacks/single_turn/rot13.py (3)
enhance(18-25)a_enhance(27-29)get_name(31-32)rogue/server/red_teaming/frameworks/owasp/owasp.py (1)
get_name(76-78)rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
get_name(48-50)
test_red_teaming_foundation.py (11)
rogue/server/red_teaming/attacks/single_turn/base64.py (3)
Base64(12-29)enhance(20-22)get_name(28-29)rogue/server/red_teaming/attacks/single_turn/leetspeak.py (3)
Leetspeak(10-43)enhance(18-36)get_name(42-43)rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (3)
PromptInjection(12-60)enhance(20-27)get_name(59-60)rogue/server/red_teaming/attacks/single_turn/rot13.py (3)
ROT13(10-32)enhance(18-25)get_name(31-32)rogue/server/red_teaming/attacks/base_attack.py (2)
enhance(24-36)get_name(53-55)rogue/server/red_teaming/vulnerabilities/excessive_agency.py (1)
ExcessiveAgency(21-54)rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (1)
PromptLeakage(22-56)rogue/server/red_teaming/vulnerabilities/robustness.py (1)
Robustness(20-52)rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (2)
get_types(30-37)get_name(48-50)rogue/server/red_teaming/frameworks/owasp/owasp.py (3)
OWASPTop10(14-82)get_name(76-78)get_categories(80-82)sdks/python/rogue_sdk/types.py (4)
EvaluationMode(58-62)AgentConfig(111-161)RedTeamingResult(262-279)EvaluationResults(282-310)
rogue/server/red_teaming/attacks/single_turn/rot13.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
BaseSingleTurnAttack(10-13)rogue/server/red_teaming/attacks/base_attack.py (3)
enhance(24-36)a_enhance(38-50)get_name(53-55)
rogue/server/red_teaming/frameworks/owasp/__init__.py (2)
rogue/server/red_teaming/frameworks/owasp/owasp.py (1)
OWASPTop10(14-82)rogue/server/red_teaming/frameworks/owasp/risk_categories.py (1)
OWASPCategory(23-37)
rogue/server/red_teaming/attacks/__init__.py (1)
rogue/server/red_teaming/attacks/base_attack.py (1)
BaseAttack(11-55)
rogue/server/red_teaming/attacks/single_turn/__init__.py (7)
rogue/server/red_teaming/attacks/single_turn/base64.py (1)
Base64(12-29)rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
BaseSingleTurnAttack(10-13)rogue/server/red_teaming/attacks/single_turn/leetspeak.py (1)
Leetspeak(10-43)rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (1)
PromptInjection(12-60)rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (1)
PromptProbing(10-39)rogue/server/red_teaming/attacks/single_turn/roleplay.py (1)
Roleplay(12-49)rogue/server/red_teaming/attacks/single_turn/rot13.py (1)
ROT13(10-32)
rogue/evaluator_agent/run_evaluator_agent.py (1)
sdks/python/rogue_sdk/types.py (3)
EvaluationMode(58-62)EvaluationResults(282-310)Scenarios(206-224)
rogue/server/services/red_team_scenario_generator.py (3)
sdks/python/rogue_sdk/types.py (3)
Scenario(164-203)Scenarios(206-224)ScenarioType(51-55)rogue/server/red_teaming/frameworks/owasp/owasp.py (2)
OWASPTop10(14-82)get_categories(80-82)rogue/server/red_teaming/frameworks/owasp/risk_categories.py (1)
OWASPCategory(23-37)
rogue/server/red_teaming/vulnerabilities/excessive_agency.py (1)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
BaseVulnerability(12-54)
rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
BaseSingleTurnAttack(10-13)rogue/server/red_teaming/attacks/base_attack.py (3)
enhance(24-36)a_enhance(38-50)get_name(53-55)
rogue/server/red_teaming/frameworks/owasp/owasp.py (1)
rogue/server/red_teaming/frameworks/owasp/risk_categories.py (1)
OWASPCategory(23-37)
rogue/server/red_teaming/frameworks/owasp/risk_categories.py (11)
rogue/server/red_teaming/attacks/base_attack.py (1)
BaseAttack(11-55)rogue/server/red_teaming/attacks/single_turn/rot13.py (1)
ROT13(10-32)rogue/server/red_teaming/attacks/single_turn/base64.py (1)
Base64(12-29)rogue/server/red_teaming/attacks/single_turn/leetspeak.py (1)
Leetspeak(10-43)rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (1)
PromptInjection(12-60)rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (1)
PromptProbing(10-39)rogue/server/red_teaming/attacks/single_turn/roleplay.py (1)
Roleplay(12-49)rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
BaseVulnerability(12-54)rogue/server/red_teaming/vulnerabilities/excessive_agency.py (1)
ExcessiveAgency(21-54)rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (1)
PromptLeakage(22-56)rogue/server/red_teaming/vulnerabilities/robustness.py (1)
Robustness(20-52)
sdks/python/rogue_sdk/types.py (1)
packages/sdk/src/types.ts (2)
ChatMessage(49-53)EvaluationResult(61-65)
rogue/evaluator_agent/base_evaluator_agent.py (1)
sdks/python/rogue_sdk/types.py (2)
ScenarioType(51-55)Scenario(164-203)
rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
BaseSingleTurnAttack(10-13)rogue/server/red_teaming/attacks/base_attack.py (3)
enhance(24-36)a_enhance(38-50)get_name(53-55)
rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (1)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
BaseVulnerability(12-54)
rogue/evaluator_agent/__init__.py (2)
rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (1)
RedTeamA2AEvaluatorAgent(113-421)rogue/evaluator_agent/red_team_mcp_evaluator_agent.py (1)
RedTeamMCPEvaluatorAgent(113-421)
rogue/evaluator_agent/evaluator_agent_factory.py (6)
sdks/python/rogue_sdk/types.py (4)
EvaluationMode(58-62)Protocol(75-86)Scenarios(206-224)Transport(89-100)rogue/evaluator_agent/a2a/a2a_evaluator_agent.py (1)
A2AEvaluatorAgent(19-213)rogue/evaluator_agent/base_evaluator_agent.py (1)
BaseEvaluatorAgent(157-605)rogue/evaluator_agent/mcp/mcp_evaluator_agent.py (1)
MCPEvaluatorAgent(14-181)rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (1)
RedTeamA2AEvaluatorAgent(113-421)rogue/evaluator_agent/red_team_mcp_evaluator_agent.py (1)
RedTeamMCPEvaluatorAgent(113-421)
rogue/server/red_teaming/vulnerabilities/__init__.py (4)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
BaseVulnerability(12-54)rogue/server/red_teaming/vulnerabilities/excessive_agency.py (2)
ExcessiveAgency(21-54)ExcessiveAgencyType(13-18)rogue/server/red_teaming/vulnerabilities/prompt_leakage.py (2)
PromptLeakage(22-56)PromptLeakageType(13-19)rogue/server/red_teaming/vulnerabilities/robustness.py (2)
Robustness(20-52)RobustnessType(13-17)
rogue/server/red_teaming/vulnerabilities/robustness.py (1)
rogue/server/red_teaming/vulnerabilities/base_vulnerability.py (1)
BaseVulnerability(12-54)
rogue/server/red_teaming/attacks/single_turn/roleplay.py (2)
rogue/server/red_teaming/attacks/single_turn/base_single_turn_attack.py (1)
BaseSingleTurnAttack(10-13)rogue/server/red_teaming/attacks/base_attack.py (3)
enhance(24-36)a_enhance(38-50)get_name(53-55)
rogue/server/services/scenario_evaluation_service.py (3)
sdks/python/rogue_sdk/types.py (6)
AuthType(23-48)EvaluationMode(58-62)EvaluationResults(282-310)Protocol(75-86)Scenarios(206-224)Transport(89-100)rogue/evaluator_agent/run_evaluator_agent.py (2)
run_evaluator_agent(237-265)arun_evaluator_agent(82-234)rogue/server/services/red_team_scenario_generator.py (2)
RedTeamScenarioGenerator(18-187)generate_scenarios(30-87)
rogue/server/core/evaluation_orchestrator.py (3)
sdks/python/rogue_sdk/types.py (6)
AuthType(23-48)EvaluationMode(58-62)EvaluationResults(282-310)Protocol(75-86)Scenarios(206-224)Transport(89-100)rogue/evaluator_agent/run_evaluator_agent.py (2)
run_evaluator_agent(237-265)arun_evaluator_agent(82-234)rogue/server/services/red_team_scenario_generator.py (2)
RedTeamScenarioGenerator(18-187)generate_scenarios(30-87)
rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (12)
sdks/python/rogue_sdk/types.py (2)
Scenarios(206-224)Transport(89-100)rogue/common/agent_model_wrapper.py (1)
get_llm_from_model(11-35)rogue/evaluator_agent/a2a/a2a_evaluator_agent.py (1)
A2AEvaluatorAgent(19-213)rogue/evaluator_agent/base_evaluator_agent.py (8)
get_underlying_agent(197-253)_get_conversation_context_id(560-566)_send_message_to_evaluated_agent(540-557)_log_evaluation(372-534)_before_tool_callback(255-273)_after_tool_callback(275-295)_before_model_callback(297-310)_after_model_callback(312-332)rogue/server/red_teaming/frameworks/owasp/owasp.py (3)
OWASPTop10(14-82)get_categories(80-82)get_name(76-78)rogue/server/red_teaming/attacks/base_attack.py (2)
enhance(24-36)get_name(53-55)rogue/server/red_teaming/attacks/single_turn/base64.py (2)
enhance(20-22)get_name(28-29)rogue/server/red_teaming/attacks/single_turn/leetspeak.py (2)
enhance(18-36)get_name(42-43)rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (2)
enhance(20-27)get_name(59-60)rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (2)
enhance(23-32)get_name(38-39)rogue/server/red_teaming/attacks/single_turn/roleplay.py (2)
enhance(33-42)get_name(48-49)rogue/server/red_teaming/attacks/single_turn/rot13.py (2)
enhance(18-25)get_name(31-32)
rogue/evaluator_agent/red_team_mcp_evaluator_agent.py (13)
sdks/python/rogue_sdk/types.py (2)
Scenarios(206-224)Transport(89-100)rogue/common/agent_model_wrapper.py (1)
get_llm_from_model(11-35)rogue/evaluator_agent/mcp/mcp_evaluator_agent.py (1)
MCPEvaluatorAgent(14-181)rogue/evaluator_agent/red_team_a2a_evaluator_agent.py (8)
get_underlying_agent(169-227)_get_owasp_category_description(229-243)_send_message_to_evaluated_agent(375-393)_log_evaluation(395-421)_extract_owasp_category_from_scenario(245-259)_get_owasp_framework(261-270)_get_attacks_for_category(272-281)_select_and_enhance_attack(283-373)rogue/evaluator_agent/base_evaluator_agent.py (8)
get_underlying_agent(197-253)_get_conversation_context_id(560-566)_send_message_to_evaluated_agent(540-557)_log_evaluation(372-534)_before_tool_callback(255-273)_after_tool_callback(275-295)_before_model_callback(297-310)_after_model_callback(312-332)rogue/server/red_teaming/frameworks/owasp/owasp.py (3)
OWASPTop10(14-82)get_categories(80-82)get_name(76-78)rogue/server/red_teaming/attacks/base_attack.py (2)
enhance(24-36)get_name(53-55)rogue/server/red_teaming/attacks/single_turn/base64.py (2)
enhance(20-22)get_name(28-29)rogue/server/red_teaming/attacks/single_turn/leetspeak.py (2)
enhance(18-36)get_name(42-43)rogue/server/red_teaming/attacks/single_turn/prompt_injection.py (2)
enhance(20-27)get_name(59-60)rogue/server/red_teaming/attacks/single_turn/prompt_probing.py (2)
enhance(23-32)get_name(38-39)rogue/server/red_teaming/attacks/single_turn/roleplay.py (2)
enhance(33-42)get_name(48-49)rogue/server/red_teaming/attacks/single_turn/rot13.py (2)
enhance(18-25)get_name(31-32)
🪛 markdownlint-cli2 (0.18.1)
TESTING_WITH_CURL.md
181-181: Bare URL used
(MD034, no-bare-urls)
182-182: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: rogue_sanity
- GitHub Check: codestyle
Referenced code (rogue/evaluator_agent/red_team_a2a_evaluator_agent.py, lines 395-421):

```python
    def _log_evaluation(
        self,
        scenario: dict[str, str],
        context_id: str,
        evaluation_passed: bool,
        reason: str,
        **kwargs,
    ) -> None:
        """
        Log evaluation and track scenario for attack selection.

        Overrides parent to track which scenario is being tested
        for better attack selection.
        """
        # Track scenario for this context
        scenario_text = scenario.get("scenario", "")
        if scenario_text:
            self._context_to_scenario[context_id] = scenario_text

        # Call parent method
        return super()._log_evaluation(
            scenario,
            context_id,
            evaluation_passed,
            reason,
            **kwargs,
        )
```
Align _log_evaluation scenario handling with base class to avoid crashes
Same issue as in the MCP variant: this override assumes scenario is a dict and calls .get, but base_evaluator_agent._log_evaluation is explicitly written to tolerate strings and other shapes from the LLM. If a string comes through here, you’ll get an AttributeError before the base class can normalize it.
You can fix this by normalizing scenario before using it:
def _log_evaluation(
self,
- scenario: dict[str, str],
+ scenario: dict[str, str], # Base implementation also accepts str at runtime
context_id: str,
evaluation_passed: bool,
reason: str,
**kwargs,
) -> None:
@@
- # Track scenario for this context
- scenario_text = scenario.get("scenario", "")
- if scenario_text:
- self._context_to_scenario[context_id] = scenario_text
+ # Track scenario for this context (handle both dict and str inputs safely)
+ scenario_text = ""
+ if isinstance(scenario, dict):
+ scenario_text = scenario.get("scenario", "")
+ elif isinstance(scenario, str):
+ scenario_text = scenario
+ if scenario_text:
+ self._context_to_scenario[context_id] = scenario_text

Then delegate to super()._log_evaluation(...) as you already do.
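For illustration only, here is a self-contained sketch of the same dict/str normalization in isolation; the helper name and sample values are hypothetical and not part of the PR:

from typing import Any


def extract_scenario_text(scenario: Any) -> str:
    # Return the scenario text whether the input arrives as a dict, a str, or something else.
    if isinstance(scenario, dict):
        return scenario.get("scenario", "")
    if isinstance(scenario, str):
        return scenario
    return ""


# Both shapes the LLM may produce resolve to the same text:
assert extract_scenario_text({"scenario": "probe the refund tool"}) == "probe the refund tool"
assert extract_scenario_text("probe the refund tool") == "probe the refund tool"
assert extract_scenario_text(None) == ""  # anything else degrades to an empty string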
🤖 Prompt for AI Agents
In rogue/evaluator_agent/red_team_a2a_evaluator_agent.py around lines 395 to
421, the override of _log_evaluation assumes scenario is a dict and calls .get,
which will raise AttributeError if scenario is a string or other shape;
normalize scenario first (e.g., if isinstance(scenario, dict) use
scenario.get("scenario", "") else treat scenario as the scenario text or wrap it
into a dict with key "scenario") to extract scenario_text safely, set
self._context_to_scenario[context_id] only when scenario_text is non-empty, then
delegate to super()._log_evaluation(...) unchanged.
if types is None:
    enum_types = list(ExcessiveAgencyType)
else:
    # Convert string values to enum types
    enum_types = [
        ExcessiveAgencyType(t) if isinstance(t, str) else t
        for t in types
        if isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]
    ]

super().__init__(types=enum_types)  # type: ignore[arg-type]
Fix filtering logic to handle both string and enum inputs correctly.
The list comprehension on lines 48-52 has a subtle bug: it converts strings to enums (or passes through non-strings) on line 49, but then filters on line 51 to only keep items where isinstance(t, str). This means if a caller passes ExcessiveAgencyType.PERMISSIONS directly (an enum, not a string), it will be filtered out incorrectly.
Apply this diff to handle both string and enum inputs:
- else:
- # Convert string values to enum types
- enum_types = [
- ExcessiveAgencyType(t) if isinstance(t, str) else t
- for t in types
- if isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]
- ]
+ else:
+ # Convert string values to enum types and filter valid enums
+ enum_types = []
+ for t in types:
+ if isinstance(t, ExcessiveAgencyType):
+ enum_types.append(t)
+ elif isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]:
+ enum_types.append(ExcessiveAgencyType(t))

Alternatively, use a more concise approach with error handling:
- else:
- # Convert string values to enum types
- enum_types = [
- ExcessiveAgencyType(t) if isinstance(t, str) else t
- for t in types
- if isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]
- ]
+ else:
+ # Convert string values to enum types, keep valid enums, skip invalid
+ enum_types = []
+ for t in types:
+ if isinstance(t, ExcessiveAgencyType):
+ enum_types.append(t)
+ elif isinstance(t, str):
+ try:
+ enum_types.append(ExcessiveAgencyType(t))
+ except ValueError:
+ pass  # Skip invalid string values

Note: This same pattern appears in prompt_leakage.py (lines 48-54) and likely other vulnerability files; apply the same fix consistently across all vulnerability classes.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

if types is None:
    enum_types = list(ExcessiveAgencyType)
else:
    # Convert string values to enum types
    enum_types = [
        ExcessiveAgencyType(t) if isinstance(t, str) else t
        for t in types
        if isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]
    ]
super().__init__(types=enum_types)  # type: ignore[arg-type]

Suggested:

if types is None:
    enum_types = list(ExcessiveAgencyType)
else:
    # Convert string values to enum types, keep valid enums, skip invalid
    enum_types = []
    for t in types:
        if isinstance(t, ExcessiveAgencyType):
            enum_types.append(t)
        elif isinstance(t, str):
            try:
                enum_types.append(ExcessiveAgencyType(t))
            except ValueError:
                pass  # Skip invalid string values
super().__init__(types=enum_types)  # type: ignore[arg-type]
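To make the failure mode concrete, here is a runnable sketch using a stand-in enum (the class below is a simplified hypothetical, not the real ExcessiveAgencyType):

from enum import Enum


class ExcessiveAgencyType(Enum):  # stand-in for the real enum
    PERMISSIONS = "permissions"
    FUNCTIONALITY = "functionality"


types = [ExcessiveAgencyType.PERMISSIONS, "functionality", "bogus"]

# Original comprehension: the isinstance(t, str) filter silently drops the enum member.
buggy = [
    ExcessiveAgencyType(t) if isinstance(t, str) else t
    for t in types
    if isinstance(t, str) and t in [e.value for e in ExcessiveAgencyType]
]
assert buggy == [ExcessiveAgencyType.FUNCTIONALITY]  # PERMISSIONS was lost

# Suggested loop: keep enum members as-is, coerce valid strings, skip invalid ones.
fixed = []
for t in types:
    if isinstance(t, ExcessiveAgencyType):
        fixed.append(t)
    elif isinstance(t, str):
        try:
            fixed.append(ExcessiveAgencyType(t))
        except ValueError:
            pass
assert fixed == [ExcessiveAgencyType.PERMISSIONS, ExcessiveAgencyType.FUNCTIONALITY]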
🤖 Prompt for AI Agents
In rogue/server/red_teaming/vulnerabilities/excessive_agency.py around lines 44
to 54, the list comprehension currently converts strings to enums but then
filters to only keep strings, which incorrectly drops inputs already provided as
ExcessiveAgencyType enums; replace the comprehension with logic that keeps enum
instances as-is and converts valid string values to enums (or use a try/except
around ExcessiveAgencyType(t) to coerce strings and skip invalid values), then
pass the resulting enum_types to super().__init__; apply the same fix to the
analogous code in prompt_leakage.py (lines ~48-54) and any other vulnerability
files using this pattern.
    - Over-relies on user input without validation
    """

    name = "Robustness"
Add type annotation to class attribute.
Per coding guidelines, all class attributes should have type annotations.
Apply this diff:
- name = "Robustness"
+ name: str = "Robustness"

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

name = "Robustness"

Suggested:

name: str = "Robustness"
🤖 Prompt for AI Agents
In rogue/server/red_teaming/vulnerabilities/robustness.py around line 29, the
class attribute "name = \"Robustness\"" lacks a type annotation; add an explicit
type annotation (e.g., change to name: str = "Robustness") so the class
attribute is typed per coding guidelines.
7ea29b5 to 2d58185 (Compare)
yuval-qf
left a comment
Left a few comments for your consideration
// Bedrock-specific fields
AWSAccessKeyInput  string
AWSAccessKeyCursor int
AWSSecretKeyInput  string
AWSSecretKeyCursor int
AWSRegionInput     string
AWSRegionCursor    int
ActiveInputField   int // 0=APIKey, 1=AWSAccessKey, 2=AWSSecretKey, 3=AWSRegion (for Bedrock)
Should we also add an AWS profile option for the case where the user connects to AWS using aws sso login?
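For context, a profile configured through aws sso login resolves to temporary credentials that boto3 can surface programmatically; a minimal sketch, with a hypothetical profile name, not code from this PR:

import boto3

# Resolve credentials from a named profile, e.g. one set up via `aws configure sso`.
session = boto3.Session(profile_name="my-sso-profile")  # hypothetical profile name
creds = session.get_credentials().get_frozen_credentials()

# SSO/STS profiles yield a temporary triple: access key, secret key, and session token.
print(creds.access_key, creds.secret_key, bool(creds.token))
print(session.region_name)  # the region can come from the profile as well

Supporting a profile would mean resolving this triple (plus region) at runtime instead of asking the user to paste static keys into the dialog.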
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/tui/internal/components/llm_config_dialog.go (1)
140-147: Bedrock credentials still omit the AWS session token, breaking common STS/SSO flows.

Bedrock wiring adds access key, secret, and region fields to LLMConfigDialog and LLMConfigResultMsg, and forwards them in the Bedrock branches of handleEnter, but there is still no AWS_SESSION_TOKEN support. For temporary credentials (the default with STS/SSO), calls will fail without the session token, so Bedrock remains unusable for a large class of setups. This was flagged in a previous review and is still outstanding.

At minimum, extend the dialog state and result message and thread the field through the Bedrock paths in this file; downstream storage/clients can then forward the full set:
 type LLMConfigDialog struct {
 @@
-    AWSRegionInput   string
-    AWSRegionCursor  int
-    ActiveInputField int // 0=APIKey, 1=AWSAccessKey, 2=AWSSecretKey, 3=AWSRegion (for Bedrock)
+    AWSRegionInput        string
+    AWSRegionCursor       int
+    AWSSessionTokenInput  string
+    AWSSessionTokenCursor int
+    ActiveInputField      int // 0=AWSAccessKey, 1=AWSSecretKey, 2=AWSRegion, 3=AWSSessionToken (for Bedrock)

 type LLMConfigResultMsg struct {
     Provider           string
     APIKey             string
     AWSAccessKeyID     string
     AWSSecretAccessKey string
     AWSRegion          string
+    AWSSessionToken    string
     Model              string
     Action             string
 }

-    if provider.Name == "bedrock" {
-        msg.APIKey = d.AWSAccessKeyInput
-        msg.AWSAccessKeyID = d.AWSAccessKeyInput
-        msg.AWSSecretAccessKey = d.AWSSecretKeyInput
-        msg.AWSRegion = d.AWSRegionInput
-    }
+    if provider.Name == "bedrock" {
+        msg.APIKey = d.AWSAccessKeyInput
+        msg.AWSAccessKeyID = d.AWSAccessKeyInput
+        msg.AWSSecretAccessKey = d.AWSSecretKeyInput
+        msg.AWSRegion = d.AWSRegionInput
+        msg.AWSSessionToken = d.AWSSessionTokenInput
+    }

and mirror the same addition in the ModelSelectionStep Bedrock block. The new session-token field should also be integrated into the Bedrock input navigation/backspace/delete/paste/render logic alongside the existing three Bedrock fields.
Also applies to: 162-167, 692-697, 728-733
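As a rough sketch of why the token matters downstream, assuming litellm's documented Bedrock auth parameters (the model ID and environment variables below are illustrative, not code from this PR):

import os

from litellm import completion

# Temporary STS/SSO credentials come as a triple; Bedrock auth fails if the token is dropped.
response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": "ping"}],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    aws_session_token=os.environ.get("AWS_SESSION_TOKEN"),  # required for temporary credentials
    aws_region_name=os.environ.get("AWS_REGION", "us-east-1"),
)
print(response.choices[0].message.content)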
🧹 Nitpick comments (1)
packages/tui/internal/components/llm_config_dialog.go (1)
207-237: Bedrock provider definition and multi-field input UX look solid overall.

The Bedrock provider entry, three-field AWS input layout, keyboard navigation (↑/↓ between fields, ←/→ cursor movement, focus handoff to buttons), and per-field backspace/delete/paste handling are coherent and match the behavior of the single-field providers. The generic renderInputField and renderAPIKeyInput split keeps the rendering reasonably clean.

If you touch this again, consider factoring the repeated if provider.Name == "bedrock" { ... } else { ... } branches into small helpers (e.g., handleBedrockChar, handleBedrockDelete, renderBedrockInputs) to reduce duplication and make it easier to add the session-token field without editing multiple switch statements.

Also applies to: 350-379, 385-415, 433-445, 465-480, 488-515, 520-543, 548-589, 746-775, 1001-1069, 1071-1163
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/tui/internal/components/llm_config_dialog.go (17 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: rogue_sanity
- GitHub Check: codestyle
// Validate inputs based on provider
provider := d.Providers[d.SelectedProvider]
if provider.Name != "bedrock" || provider.Name != "lm_studio" && d.APIKeyInput == "" {
    d.ErrorMessage = "API key cannot be empty"
    return d, nil
}
Fix API key validation condition; it currently always trips and blocks configuration.
The condition:
if provider.Name != "bedrock" || provider.Name != "lm_studio" && d.APIKeyInput == "" {

is effectively always true for non-Bedrock providers and still validates against APIKeyInput for Bedrock, so the "Validate" path can never succeed for any provider and Bedrock can't use its dedicated fields.
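To see why it always trips, the same boolean logic can be transcribed into Python (Go's && binds tighter than ||, just as Python's and binds tighter than or; the function below is only an illustration, not code from the PR):

def should_reject(provider: str, api_key: str) -> bool:
    # Transcription of: provider.Name != "bedrock" || provider.Name != "lm_studio" && d.APIKeyInput == ""
    return provider != "bedrock" or (provider != "lm_studio" and api_key == "")


# Any non-Bedrock provider is rejected even when a key is supplied:
assert should_reject("openai", "sk-valid-key") is True
# lm_studio, which needs no key, is rejected as well:
assert should_reject("lm_studio", "") is True
# Bedrock is still judged on the unused APIKeyInput field:
assert should_reject("bedrock", "") is True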
Recommend rewriting to validate per provider explicitly:
- // Validate inputs based on provider
- provider := d.Providers[d.SelectedProvider]
- if provider.Name != "bedrock" || provider.Name != "lm_studio" && d.APIKeyInput == "" {
- d.ErrorMessage = "API key cannot be empty"
- return d, nil
- }
+ // Validate inputs based on provider
+ provider := d.Providers[d.SelectedProvider]
+ switch provider.Name {
+ case "bedrock":
+ if d.AWSAccessKeyInput == "" || d.AWSSecretKeyInput == "" || d.AWSRegionInput == "" {
+ d.ErrorMessage = "AWS access key, secret key, and region cannot be empty"
+ return d, nil
+ }
+ case "lm_studio":
+ // lm_studio does not require an API key
+ default:
+ if d.APIKeyInput == "" {
+ d.ErrorMessage = "API key cannot be empty"
+ return d, nil
+ }
+ }🤖 Prompt for AI Agents
In packages/tui/internal/components/llm_config_dialog.go around lines 668 to
673, the combined boolean expression incorrectly uses || and && so it always
evaluates true and blocks validation; replace it with explicit per-provider
checks: if provider.Name == "bedrock" validate bedrock-specific fields (not
APIKeyInput), else if provider.Name == "lm_studio" validate lm_studio-specific
requirements, otherwise ensure d.APIKeyInput is not empty. Implement clear
separate branches for each provider (or use proper parentheses) so only the
relevant fields are validated for each provider.
Description
Motivation and Context
Type of Change
Changes Made
Screenshots/Examples (if applicable)
Checklist
uv run black . to format my code
uv run flake8 . and fixed all issues
uv run mypy --config-file .mypy.ini . and addressed type checking issues
uv run bandit -c .bandit.yaml -r . for security checks
uv run pytest and all tests pass
Test Configuration:
Test Steps:
1.
2.
3.
Additional Notes
Related Issues/PRs