feat: add tests for guardrails orchestrator autoconfig#611
dbasunag merged 2 commits into opendatahub-io:main
Conversation
📝 Walkthrough

Updated a dependency and refactored test code: changed an OpenShift dependency version; changed the guardrails orchestrator fixture API and parameter handling; updated several guardrails-related test constants, prompts, and helpers; added AutoConfig/gateway tests; and adjusted a llama_stack test to pass an orchestrator_config flag.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
/verified
/verified
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/model_explainability/guardrails/constants.py (2)
3-6: Clarify test-only intent for unsafe prompts; AUTOCONFIG_DETECTOR_LABEL verified

- Add an inline comment above the constants in tests/model_explainability/guardrails/constants.py, e.g. `# Test-only prompts to exercise Guardrails detectors (PII/prompt-injection/HAP). Intentionally contain unsafe content.`
- AUTOCONFIG_DETECTOR_LABEL is imported/used in tests/model_explainability/guardrails/conftest.py and tests/model_explainability/guardrails/test_guardrails.py; a single canonical label was found, with no inconsistencies detected.
31-37: Prefer None over empty string for unknown detection_text

GuardrailsDetectionPrompt.detection_text is Optional[str], and a repo search found no code comparing against ""; use None to signal "not available" and avoid ambiguity.
File: tests/model_explainability/guardrails/constants.py (PII_OUTPUT_DETECTION_PROMPT)

```diff
 PII_OUTPUT_DETECTION_PROMPT: GuardrailsDetectionPrompt = GuardrailsDetectionPrompt(
     content="Output example email address, nothing else.",
     detector_id="regex",
     detection_name="email_address",
     detection_type="pii",
-    detection_text="",
+    detection_text=None,
 )
```
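A minimal sketch of the distinction the comment draws; the dataclass below is a hypothetical stand-in for the real GuardrailsDetectionPrompt, not the repo's definition:

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the real GuardrailsDetectionPrompt; field names
# mirror the constants shown in the diff.
@dataclass
class GuardrailsDetectionPrompt:
    content: str
    detector_id: str
    detection_name: str
    detection_type: str
    detection_text: Optional[str] = None


PII_OUTPUT_DETECTION_PROMPT = GuardrailsDetectionPrompt(
    content="Output example email address, nothing else.",
    detector_id="regex",
    detection_name="email_address",
    detection_type="pii",
    detection_text=None,  # None reads as "not available"
)

# None cleanly distinguishes "no expected text" from an expected empty span ("").
assert PII_OUTPUT_DETECTION_PROMPT.detection_text is None
assert PII_OUTPUT_DETECTION_PROMPT.detection_text != ""
```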
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/model_explainability/guardrails/constants.py (1 hunks)
🔇 Additional comments (4)
tests/model_explainability/guardrails/constants.py (4)
11-14: Docstring addition LGTM

Clear purpose and scope for the container object.
23-29: PII input detection prompt is correct; keep detection_type = "pii"

Tests assert detection_type == "pii" (tests/llama_stack/safety/test_trustyai_fms_provider.py:100), so the lowercase value matches existing tests; no change needed.
16-20: Make the prompt container immutable and slotted

pyproject.toml requires-python = "==3.13.*", so dataclass(frozen=True, slots=True) and PEP 604 '|' unions are supported; apply this diff.

```diff
-@dataclass
+@dataclass(frozen=True, slots=True)
 class GuardrailsDetectionPrompt:
```
47-53: HAP classification: set detection_text to the expected label or None (do not mirror the full prompt)

Tests call verify_builtin_detector_unsuitable_input_response(prompt.detection_text); verify_detection asserts exact equality when detection_text is truthy, otherwise it only checks for a non-empty detection text. Choose one:

- If the HAP detector returns a specific label, set detection_text to that label (e.g., "LABEL_1").
- If you only want to assert presence (not exact match), set detection_text=None (this relaxes the test to require non-empty text).

Location: tests/model_explainability/guardrails/constants.py (HAP_INPUT_DETECTION_PROMPT); verify behavior in tests/model_explainability/guardrails/utils.py -> verify_builtin_detector_unsuitable_input_response / verify_detection.

Suggested diff (if you want the relaxed check):

```diff
-    detection_text=HAP_PROMPT,
+    detection_text=None,
```
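The asymmetry described above can be sketched as follows; the function name and logic are assumptions about the behavior attributed to tests/model_explainability/guardrails/utils.py, not its actual code:

```python
from typing import Optional


def verify_detection(actual_text: str, expected_text: Optional[str]) -> None:
    """Sketch: exact match when an expected detection_text is provided,
    presence-only check when it is None (or empty)."""
    if expected_text:  # truthy -> strict equality
        assert actual_text == expected_text
    else:  # falsy -> only require some non-empty detection text
        assert actual_text


# Exact-match mode: pinning "LABEL_1" fails if the detector emits anything else.
verify_detection("LABEL_1", "LABEL_1")

# Relaxed mode: None only asserts that *something* was detected.
verify_detection("any non-empty text", None)
```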
```python
PROMPT_INJECTION_INPUT_DETECTION_PROMPT: GuardrailsDetectionPrompt = GuardrailsDetectionPrompt(
    content=PROMPT_INJECTION_PROMPT,
    detector_id="prompt-injection-detector",
    detection_name="detection",
    detection_type="INJECTION",
    detection_text=PROMPT_INJECTION_PROMPT,
)
```
🧩 Analysis chain
Confirm expected field values for prompt-injection detector
Double-check detector_id="prompt-injection-detector", detection_name="detection", and detection_type="INJECTION" match the GuardrailsOrchestrator outputs for the wrapper version pinned in this PR.
🌐 Web query:
GuardrailsOrchestrator prompt-injection detector: what are the expected values for detector_id, detection_name, and detection_type returned by the latest openshift-python-wrapper?
💡 Result:
Expected values (from the OpenShift AI / Guardrails Orchestrator docs):
- detector_id: "hap" [1].
- detection_name: "sequence_classifier" (appears as the "detection" field in examples). [1][2]
- detection_type: "sequence_classification". [1][2]
Sources: Red Hat OpenShift AI Guardrails Orchestrator documentation. [1] [2]
Update Guardrails detector constants to documented Orchestrator values
Replace detector_id="prompt-injection-detector" → "hap", detection_name="detection" → "sequence_classifier", and detection_type="INJECTION" → "sequence_classification" in tests/model_explainability/guardrails/constants.py (lines 39–45) so they match the OpenShift AI Guardrails Orchestrator docs.
🤖 Prompt for AI Agents
In tests/model_explainability/guardrails/constants.py around lines 39 to 45, the
GuardrailsDetectionPrompt constants use outdated values; update detector_id from
"prompt-injection-detector" to "hap", detection_name from "detection" to
"sequence_classifier", and detection_type from "INJECTION" to
"sequence_classification" so the test constants match the OpenShift AI
Guardrails Orchestrator documented values.
dbasunag left a comment:
Let's take it. I will update the packages after it is merged.
Actionable comments posted: 0
🧹 Nitpick comments (6)
tests/model_explainability/guardrails/constants.py (6)
3-6: Confirm selector label and guard sensitive prompts from logs.
- Verify AUTOCONFIG_DETECTOR_LABEL exactly matches the label the Orchestrator watches.
- Good switch to [email protected] (RFC 2606-safe).
- Consider ensuring PROMPT_INJECTION_PROMPT/HAP_PROMPT are not emitted to CI logs at info level.
11-15: Clarify detection_text semantics in the docstring.

Spell out that detection_text should be the detected snippet (e.g., a PII span) and should be None for pure classifications, to avoid ambiguity.
31-37: Use None instead of empty string for unknown detection_text.

An empty string can be misread as "detected empty span." Prefer None to mean "not applicable/unknown."
Apply this diff:

```diff
-    detection_text="",
+    detection_text=None,
```
47-53: Classification prompt: detection_text should likely be None; verify detection_type.

For HAP, detection_text should not mirror the input text; set it to None unless the detector returns a span. Confirm whether detection_type truly returns "LABEL_1" or a semantic label.

Apply this diff:

```diff
-    detection_text=HAP_PROMPT,
+    detection_text=None,
```
16-20: Make the dataclass immutable and tighten types.

- Consider @dataclass(frozen=True, slots=True) to keep these "constants" immutable and lean.
- If values are constrained, prefer Literal/Enum for detection_type and detection_name to prevent drift.
- Confirm the repo runs on Python ≥ 3.10 (PEP 604 unions).

Example (outside this hunk):

```python
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class GuardrailsDetectionPrompt: ...
```

```shell
#!/bin/bash
# Verify Python version constraints and usages of detection_type/name.
fd -a 'pyproject.toml' || true
fd -a 'tox.ini' || true
rg -nC2 -g '!**/venv/**' -e 'python-version' -e '^requires-python' || true
rg -nC2 -e 'detection_type\s*=' tests || true
rg -nC2 -e 'detector_id\s*=' tests || true
```
39-45: Normalize detection_type and detection_name to expected values.

Casing and labels differ across prompts ("pii", "INJECTION"). Align with the detector's actual enum/strings, or define shared constants, to avoid brittle tests. Also, "detection" as a detection_name is vague; consider a more specific value if that is what the detector emits.

```shell
#!/bin/bash
# Find assertions/expectations that rely on specific detection_type/name values.
rg -nC2 -e 'INJECTION|pii|LABEL_1|prompt-injection-detector' tests || true
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/model_explainability/guardrails/constants.py (1 hunks)
🔇 Additional comments (1)
tests/model_explainability/guardrails/constants.py (1)
23-29: PII input prompt looks good; confirm detector_id.

LGTM for content and expected detection_text. Please confirm "regex" exactly matches the autoconfigured detector_id.
Status of building tag latest: success.
…o#611)

* feat: add tests for guardrails orchestrator autoconfig

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Adds new tests for the GuardrailsOrchestrator autoconfig feature, which radically simplifies the manual configuration needed for the GuardrailsOrchestrator.
Description
How Has This Been Tested?
Running all the relevant tests on PSI
Merge criteria:
Summary by CodeRabbit
Chores
Tests