
feat: add remote garak integration tests#1132

Open
kpunwatk wants to merge 6 commits into opendatahub-io:main from kpunwatk:garak_inline

Conversation

@kpunwatk
Contributor

@kpunwatk kpunwatk commented Feb 19, 2026

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

Summary by CodeRabbit

  • New Features

    • Added Garak as a selectable remote evaluation provider with provider-aware defaults and conditional Kubeflow activation.
    • Allow in-cluster requests to reach LlamaStack deployments.
  • Tests

    • Added end-to-end tests for the Garak remote evaluation provider (quick, custom benchmark, and shield scan flows).
    • Improved test helpers: more robust job polling, clearer failure reporting, and result/metadata validation utilities.
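The "more robust job polling" mentioned above can be sketched as a generic helper. The callable-based interface and the terminal status strings below are assumptions for illustration, not the repository's actual `utils.py` API:

```python
import time


def wait_for_eval_job_completion(get_status, timeout=300, interval=5):
    """Poll a status callable until the job reaches a terminal state.

    `get_status` is any zero-argument callable returning a status string;
    the terminal states below are assumptions, not the project's real enum.
    Raises RuntimeError on a failed/cancelled job so the test report names
    the final state, and TimeoutError if no terminal state is reached.
    """
    deadline = time.monotonic() + timeout
    last_status = None
    while time.monotonic() < deadline:
        last_status = get_status()
        if last_status in ("completed", "failed", "cancelled"):
            if last_status != "completed":
                raise RuntimeError(f"Eval job ended in state {last_status!r}")
            return last_status
        time.sleep(interval)
    raise TimeoutError(f"Eval job still {last_status!r} after {timeout}s")
```

In a test, `get_status` would wrap the eval client's job-status call, keeping the polling logic independent of the client API.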

@kpunwatk kpunwatk requested a review from a team as a code owner February 19, 2026 11:09
@github-actions

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove it, comment /wip cancel on the PR.
  • To block merging of a PR, add /hold in a comment. To unblock merging, comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified on the PR; to un-verify, comment /verified cancel.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name> on the PR. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR will be created and linked to the current PR.
  • To build and push an image to quay, add /build-push-pr-image in a comment. This creates an image tagged
    pr-<pr_number> in the quay repository; the tag is deleted when the PR is merged or closed.
Supported labels

{'/verified', '/lgtm', '/cherry-pick', '/wip', '/hold', '/build-push-pr-image'}

@kpunwatk
Contributor Author

/wip

@coderabbitai
Contributor

coderabbitai bot commented Feb 19, 2026

📝 Walkthrough

Walkthrough

Adds Garak-specific eval support and provider-aware Kubeflow test wiring: new Garak tests and constants, expanded Kubeflow fixture logic and env var selection, HTTP service URL fixtures, result/metadata validators, minor distribution/network tweaks, and an HTTPX dependency addition. Report is factual only.

Changes

Cohort / File(s) Summary
Kubeflow / test fixtures
tests/llama_stack/conftest.py
Reworks Kubeflow gating: derives enable_ragas_remote, enable_garak_remote, enable_kubeflow_eval; sets ENABLE_KUBEFLOW_GARAK when applicable; uses generated CR name (cr_name) for KUBEFLOW_LLAMA_STACK_URL; selects provider-aware base image env var (KUBEFLOW_GARAK_BASE_IMAGE or KUBEFLOW_BASE_IMAGE); provider-specific KUBEFLOW_RESULTS_S3_PREFIX; passes _cr_name through server_config and pops it later. Review env handling for secrets/tokens (see Security notes).
Eval service URL fixtures
tests/llama_stack/eval/conftest.py
Adds qwen_isvc_service_url and guardrails_orchestrator_service_url fixtures that query Kubernetes resources, poll for services, and return in-cluster HTTP URLs; includes port discovery and timeout sampling logic. Validate robust error paths and resource permission assumptions.
Garak provider tests
tests/llama_stack/eval/test_garak_provider.py
New test module with three parametrized test classes exercising Garak quick benchmark, custom benchmark, and shield scan flows: benchmark/shield register (idempotent handling), run evals, poll completion, retrieve and validate results and metadata. These are end-to-end/integration tests relying on MinIO and cluster services.
Eval constants
tests/llama_stack/eval/constants.py
Adds LLAMA_STACK_DISTRIBUTION_IMAGE, Garak benchmark IDs, timeouts, provider-qualified model ID, and GARAK_SHIELD_ID constants.
Global providers constants
tests/llama_stack/constants.py
Adds new enum member TRUSTYAI_GARAK_REMOTE = "trustyai_garak_remote" to LlamaStackProviders.Eval.
Eval utils & validators
tests/llama_stack/eval/utils.py
Adds logging, refactors wait_for_eval_job_completion to log status and improve failure messages; adds validate_eval_result_structure and validate_job_metadata helpers; tightens type hints. Review raised RuntimeError messages for sensitive data leakage.
Distribution/network tweak
tests/llama_stack/utils.py
Expands network.allowedFrom.namespaces to include deployment namespace in LlamaStack distribution context manager.
Project deps
pyproject.toml
Adds dependency httpx[http2]>=0.28.1. Review for transitive dependency impacts and CVE status. Also confirm HTTP/2 usage is intentional for in-cluster calls.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Security notes (actionable only, no praise):

  • Environment variables and tokens are set and passed through test fixtures (e.g., KUBEFLOW_PIPELINES_TOKEN, S3 creds). Ensure secrets are not logged or embedded in raised exceptions; avoid including tokens in RuntimeError texts — CWE-532 / CWE-523.
  • Tests construct HTTP URLs to in-cluster services and may include service tokens; validate TLS usage where appropriate and limit exposure of sensitive endpoints (CWE-295).
  • New dependency httpx[http2] should be scanned for known CVEs and pinned per policy; confirm minimal required feature set to reduce attack surface.
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The PR description is largely incomplete; it contains only template placeholders, with no substantive content in the Description, How Has This Been Tested, or Merge Criteria sections. Resolution: add a detailed description of the changes, the testing methodology, and environment details, and verify all merge-criteria checklist items before merging; the commit messages indicate multiple testing tiers and fixture changes that require documentation.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): The title 'feat: add remote garak integration tests' clearly and specifically describes the main change: adding integration tests for the remote Garak provider.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
tests/llama_stack/conftest.py (1)

286-287: Garak image uses unpinned :latest tag; Ragas image is pinned by SHA

default_garak_image (Line 286) uses quay.io/trustyai/garak-remote-provider:latest, while default_ragas_image (Line 287) is pinned with a SHA digest. Using :latest makes test runs non-reproducible and can silently pick up breaking upstream changes.

♻️ Suggested fix — pin Garak image digest
-        default_garak_image = "quay.io/trustyai/garak-remote-provider:latest"
+        default_garak_image = "quay.io/trustyai/garak-remote-provider@sha256:<digest>"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/conftest.py` around lines 286 - 287, default_garak_image
currently uses the unpinned tag "quay.io/trustyai/garak-remote-provider:latest",
which makes tests non-reproducible; change the value of default_garak_image to a
specific digest-pinned image (mirror the approach used by default_ragas_image)
by replacing the :latest tag with the corresponding @sha256:<digest> for the
desired Garak image so the tests always pull the exact same image; update the
string assigned to default_garak_image (keeping default_ragas_image as the
example/pattern) and ensure the chosen digest is the tested/approved release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/llama_stack/conftest.py`:
- Around line 275-279: The llama_stack_distribution fixture is missing a local
distribution_name assignment causing URL/CRD name mismatches; add
distribution_name = generate_random_name(prefix="llama-stack-distribution") at
the start of the llama_stack_distribution fixture (before the with block) so the
fixture uses its own random name (matching
unprivileged_llama_stack_distribution) and aligns with how
llama_stack_server_config builds KUBEFLOW_LLAMA_STACK_URL and any created
CRD/Service names.

In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 46-48: The test is assuming the first item in the list is the
newly registered benchmark; instead call
llama_stack_client.alpha.benchmarks.list(), then find/filter the returned list
for an item whose identifier equals GARAK_REMOTE_BENCHMARK_ID (e.g., using a
filter or next comprehension) and assert on that item's provider_id against
LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE; ensure you handle the case where
no matching item is found by asserting the filtered result is not empty before
checking provider_id.
- Around line 32-48: The test function test_garak_remote_register_benchmark is
missing a Given-When-Then docstring and a return type annotation; update the
function signature to include "-> None" and replace the single-sentence
docstring with a three-part Given-When-Then docstring that describes the initial
state (Given), the action performed (When calling
llama_stack_client.alpha.benchmarks.register with GARAK_REMOTE_BENCHMARK_ID and
provider TRUSTYAI_GARAK_REMOTE), and the expected outcome (Then the benchmark
appears in llama_stack_client.alpha.benchmarks.list with matching identifier and
provider_id).
- Around line 50-68: The test body for test_garak_remote_run_eval is
mis-indented so the eval code runs at class scope using the imported
llama_stack_client module instead of the pytest fixture; indent the block
starting with job = llama_stack_client.alpha.eval.run_eval(...) and the
subsequent wait_for_eval_job_completion(...) call so they are inside the
test_garak_remote_run_eval method body, referencing GARAK_REMOTE_BENCHMARK_ID,
QWEN_MODEL_NAME, and wait_for_eval_job_completion to locate the code, and then
remove the now-unused top-level import llama_stack_client (the fixture should be
used instead).
- Around line 27-28: The test uses an unregistered pytest marker
"security_remote" which will trigger PytestUnknownMarkWarning under
--strict-markers; either register "security_remote" in your pytest.ini under the
[pytest] markers list (alongside "rawdeployment") or remove the
`@pytest.mark.security_remote` decorator from
tests/llama_stack/eval/test_garak_provider.py (or replace it with an
already-registered marker) so the marker usage and pytest.ini stay consistent.
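The marker registration this comment asks for would look roughly like this in pytest.ini; the description strings are placeholders, and the repository's existing markers list may differ:

```ini
[pytest]
markers =
    rawdeployment: tests that run against raw deployment mode
    security_remote: remote Garak security-scan tests
```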

---

Nitpick comments:
In `@tests/llama_stack/conftest.py`:
- Around line 286-287: default_garak_image currently uses the unpinned tag
"quay.io/trustyai/garak-remote-provider:latest", which makes tests
non-reproducible; change the value of default_garak_image to a specific
digest-pinned image (mirror the approach used by default_ragas_image) by
replacing the :latest tag with the corresponding @sha256:<digest> for the
desired Garak image so the tests always pull the exact same image; update the
string assigned to default_garak_image (keeping default_ragas_image as the
example/pattern) and ensure the chosen digest is the tested/approved release.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/llama_stack/conftest.py (1)

274-279: ⚠️ Potential issue | 🟠 Major

KUBEFLOW_LLAMA_STACK_URL can point to the wrong Service.

Line 275 defaults to "rh-dev", but the admin distribution is created with a random distribution_name (module-level), so the callback URL can target a non‑existent Service. Align the default with the actual distribution name (or pass it explicitly via params).

🛠️ Suggested fix
-        distribution_name = params.get("distribution_name", "rh-dev")
+        dist_name = params.get("distribution_name", distribution_name)
         env_vars.append({
             "name": "KUBEFLOW_LLAMA_STACK_URL",
-            "value": f"http://{distribution_name}-service.{model_namespace.name}.svc.cluster.local:8321",
+            "value": f"http://{dist_name}-service.{model_namespace.name}.svc.cluster.local:8321",
         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/conftest.py` around lines 274 - 279, The
KUBEFLOW_LLAMA_STACK_URL env var default uses literal "rh-dev" which can point
to the wrong Service; change the logic that sets distribution_name =
params.get("distribution_name", "rh-dev") to use the module-level
distribution_name (the one used when creating the admin distribution) as the
default or require params to provide distribution_name so the URL built for
KUBEFLOW_LLAMA_STACK_URL (the dict with "name" and "value") will reference the
actual Service (use the same distribution_name symbol used elsewhere in the
module and the model_namespace.name to construct the
http://{distribution_name}-service... URL).
🧹 Nitpick comments (1)
tests/llama_stack/conftest.py (1)

285-293: Avoid :latest for the Garak base image to keep CI reproducible.

quay.io/trustyai/garak-remote-provider:latest is mutable and can introduce test flakiness. Consider pinning a digest or promoting the default to a configurable env/constant with a pinned image.
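One way to implement the suggested configurable pin, sketched with a hypothetical env var name (GARAK_PROVIDER_IMAGE) and a placeholder digest, neither of which comes from the repository:

```python
import os

# Placeholder digest for illustration; a real pin would use the digest of
# the tested/approved Garak release.
PINNED_GARAK_IMAGE = (
    "quay.io/trustyai/garak-remote-provider"
    "@sha256:0000000000000000000000000000000000000000000000000000000000000000"
)


def resolve_garak_image() -> str:
    """Prefer an operator-supplied image, fall back to a digest-pinned default."""
    return os.environ.get("GARAK_PROVIDER_IMAGE", PINNED_GARAK_IMAGE)
```

The fixture would then pass `resolve_garak_image()` into the KUBEFLOW_BASE_IMAGE env var instead of a hard-coded :latest string.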

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/conftest.py` around lines 285 - 293, The Garak base image
is using the mutable tag default_garak_image =
"quay.io/trustyai/garak-remote-provider:latest", which can cause flaky CI;
update the logic that sets selected_image (using
params.get("kubeflow_base_image") and enable_garak_remote) to use a pinned
digest or configurable constant/env var instead of :latest (e.g., replace
default_garak_image with a digest-pinned string or read from an
environment/config constant) and ensure env_vars.append({"name":
"KUBEFLOW_BASE_IMAGE", "value": selected_image}) continues to receive the pinned
value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 41-46: The test function test_garak_remote_run_and_retrieve_eval
(and the other test at lines 140-146) declares minio_pod and
minio_data_connection as parameters but only relies on their fixture side
effects; remove those two parameters from the function signature and add a
pytest.mark.usefixtures("minio_pod", "minio_data_connection") decorator above
the test function (and the other affected test) so Ruff no longer reports
unused-argument warnings while preserving the fixtures' side effects.

---

Duplicate comments:
In `@tests/llama_stack/conftest.py`:
- Around line 274-279: The KUBEFLOW_LLAMA_STACK_URL env var default uses literal
"rh-dev" which can point to the wrong Service; change the logic that sets
distribution_name = params.get("distribution_name", "rh-dev") to use the
module-level distribution_name (the one used when creating the admin
distribution) as the default or require params to provide distribution_name so
the URL built for KUBEFLOW_LLAMA_STACK_URL (the dict with "name" and "value")
will reference the actual Service (use the same distribution_name symbol used
elsewhere in the module and the model_namespace.name to construct the
http://{distribution_name}-service... URL).

---

Nitpick comments:
In `@tests/llama_stack/conftest.py`:
- Around line 285-293: The Garak base image is using the mutable tag
default_garak_image = "quay.io/trustyai/garak-remote-provider:latest", which can
cause flaky CI; update the logic that sets selected_image (using
params.get("kubeflow_base_image") and enable_garak_remote) to use a pinned
digest or configurable constant/env var instead of :latest (e.g., replace
default_garak_image with a digest-pinned string or read from an
environment/config constant) and ensure env_vars.append({"name":
"KUBEFLOW_BASE_IMAGE", "value": selected_image}) continues to receive the pinned
value.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 3dad24b and 93201b2.

📒 Files selected for processing (3)
  • tests/llama_stack/conftest.py
  • tests/llama_stack/constants.py
  • tests/llama_stack/eval/test_garak_provider.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/llama_stack/constants.py

Comment on lines +41 to +46
def test_garak_remote_run_and_retrieve_eval(self, minio_pod, minio_data_connection, llama_stack_client):
"""
Given a remote Garak provider with Kubeflow enabled.
When a quick Garak benchmark is registered and executed.
Then the eval job completes and results are retrievable.
"""
Contributor


⚠️ Potential issue | 🟡 Minor

Use @pytest.mark.usefixtures to avoid unused‑argument lint warnings.

Ruff flags minio_pod/minio_data_connection as unused. If they’re only needed for fixture side effects, move them to usefixtures and remove from the signature.

🛠️ Suggested fix
@@
 class TestLlamaStackGarakRemoteProvider:
@@
-    def test_garak_remote_run_and_retrieve_eval(self, minio_pod, minio_data_connection, llama_stack_client):
+    @pytest.mark.usefixtures("minio_pod", "minio_data_connection")
+    def test_garak_remote_run_and_retrieve_eval(self, llama_stack_client):
@@
 class TestLlamaStackGarakRemoteShieldScan:
@@
-    def test_garak_remote_run_with_shield_scan(
-        self,
-        current_client_token,
-        minio_pod,
-        minio_data_connection,
-        llama_stack_client,
-    ):
+    @pytest.mark.usefixtures("minio_pod", "minio_data_connection")
+    def test_garak_remote_run_with_shield_scan(
+        self,
+        current_client_token,
+        llama_stack_client,
+    ):

Also applies to: 140-146

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 41-41: Unused method argument: minio_pod

(ARG002)


[warning] 41-41: Unused method argument: minio_data_connection

(ARG002)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/eval/test_garak_provider.py` around lines 41 - 46, The test
function test_garak_remote_run_and_retrieve_eval (and the other test at lines
140-146) declares minio_pod and minio_data_connection as parameters but only
relies on their fixture side effects; remove those two parameters from the
function signature and add a pytest.mark.usefixtures("minio_pod",
"minio_data_connection") decorator above the test function (and the other
affected test) so Ruff no longer reports unused-argument warnings while
preserving the fixtures' side effects.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
tests/llama_stack/eval/test_garak_provider.py (1)

45-47: ⚠️ Potential issue | 🟡 Minor

Fragile response[0] assertion — not guaranteed to be the newly registered benchmark.

If any benchmarks are pre-registered, response[0] may not be GARAK_REMOTE_BENCHMARK_ID. Filter by identifier instead:

🛡️ Proposed fix
         response = llama_stack_client.alpha.benchmarks.list()
-        assert response[0].identifier == GARAK_REMOTE_BENCHMARK_ID
-        assert response[0].provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE
+        registered = next((b for b in response if b.identifier == GARAK_REMOTE_BENCHMARK_ID), None)
+        assert registered is not None, f"Benchmark {GARAK_REMOTE_BENCHMARK_ID} not found"
+        assert registered.provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/eval/test_garak_provider.py` around lines 45 - 47, The test
currently assumes response[0] is the newly registered benchmark which is
fragile; update the assertion to locate the benchmark by identifier instead:
call llama_stack_client.alpha.benchmarks.list(), find the item where
item.identifier == GARAK_REMOTE_BENCHMARK_ID (or filter the list), then assert
that that item's provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE;
ensure the test fails with a clear message if no matching benchmark is found.
tests/llama_stack/conftest.py (1)

272-277: ⚠️ Potential issue | 🟠 Major

distribution_name mismatch persists between llama_stack_server_config and llama_stack_distribution fixtures.

Line 273 defaults distribution_name to "rh-dev" when not in params, but llama_stack_distribution (line 447) generates a random name. The KUBEFLOW_LLAMA_STACK_URL will point to rh-dev-service while the actual service will be llama-stack-distribution-{random}-service. Kubeflow pipelines will fail to reach the LlamaStack service.

Tests must explicitly pass distribution_name in params matching the value used by llama_stack_distribution, or the fixture architecture needs refactoring to share the generated name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/conftest.py` around lines 272 - 277, The
KUBEFLOW_LLAMA_STACK_URL in the llama_stack_server_config fixture uses a default
distribution_name "rh-dev" that can diverge from the random name produced by the
llama_stack_distribution fixture; update llama_stack_server_config so it derives
distribution_name from the actual llama_stack_distribution fixture (e.g., use
the generated name from the llama_stack_distribution fixture or ensure
params["distribution_name"] is set to that generated name) and then build
KUBEFLOW_LLAMA_STACK_URL using that shared value; reference distribution_name,
llama_stack_server_config, llama_stack_distribution, and
KUBEFLOW_LLAMA_STACK_URL when locating and changing the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 63-67: The test calls wait_for_eval_job_completion but never
asserts the job outcome; update the test to fetch the finished evaluation via
the client (e.g., use llama_stack_client.get_eval_job or
llama_stack_client.get_eval_results for the job_id returned by job.job_id) and
assert the job status is successful and that results contain expected fields
(presence of per-metric scores and an overall score) and non-empty values;
reference the existing wait_for_eval_job_completion and job.job_id to locate
where to add these assertions and fail the test if status != "COMPLETED" or
scores are missing.
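A result-structure check along the lines this prompt describes might look like the helper below; the dict keys ("status", "scores") and the "completed" value are assumptions for illustration, since the provider's real result schema is not shown here:

```python
def validate_eval_result_structure(result: dict) -> None:
    """Assert an eval result dict has a terminal status and non-empty scores.

    Key names and the "completed" status value are illustrative assumptions,
    not the provider's documented schema.
    """
    status = result.get("status")
    if status != "completed":
        raise AssertionError(f"Expected completed job, got {status!r}")
    scores = result.get("scores") or {}
    if not scores:
        raise AssertionError("Eval result contains no scores")
    for metric, value in scores.items():
        if value is None:
            raise AssertionError(f"Metric {metric!r} has no value")
```

The test would call this on the payload returned after `wait_for_eval_job_completion`, so a missing or empty score set fails loudly rather than passing silently.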
- Around line 10-25: The parametrized test enabling enable_garak_remote (which
sets KUBEFLOW_LLAMA_STACK_URL using the default "rh-dev" distribution) will fail
because llama_stack_distribution creates a CRD with a random name; fix by adding
a deterministic "distribution_name": "llama-stack-distribution-garak-test" to
the pytest.param for
model_namespace/minio_pod/minio_data_connection/llama_stack_server_config so the
Kubeflow pipeline callback URL matches the created CRD, and ensure the
llama_stack_distribution fixture reads and uses that distribution_name parameter
when creating the CRD.

---

Duplicate comments:
In `@tests/llama_stack/conftest.py`:
- Around line 272-277: The KUBEFLOW_LLAMA_STACK_URL in the
llama_stack_server_config fixture uses a default distribution_name "rh-dev" that
can diverge from the random name produced by the llama_stack_distribution
fixture; update llama_stack_server_config so it derives distribution_name from
the actual llama_stack_distribution fixture (e.g., use the generated name from
the llama_stack_distribution fixture or ensure params["distribution_name"] is
set to that generated name) and then build KUBEFLOW_LLAMA_STACK_URL using that
shared value; reference distribution_name, llama_stack_server_config,
llama_stack_distribution, and KUBEFLOW_LLAMA_STACK_URL when locating and
changing the code.
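One refactoring that resolves this recurring mismatch is generating the distribution name once at module scope and letting both fixtures read it. `generate_random_name` below is a stand-in for the repo's helper of the same name, and the URL pattern mirrors the one quoted in the review comment:

```python
import secrets


def generate_random_name(prefix: str) -> str:
    # Stand-in for the repository's generate_random_name helper.
    return f"{prefix}-{secrets.token_hex(4)}"


# Generated once at module scope so every fixture sees the same value.
DISTRIBUTION_NAME = generate_random_name(prefix="llama-stack-distribution")


def kubeflow_llama_stack_url(namespace: str, name: str = DISTRIBUTION_NAME) -> str:
    # Service-name pattern taken from the URL shown in the review comment.
    return f"http://{name}-service.{namespace}.svc.cluster.local:8321"
```

With a shared module-level name, llama_stack_server_config and llama_stack_distribution cannot drift apart, and no per-test `distribution_name` param is needed.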

In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 45-47: The test currently assumes response[0] is the newly
registered benchmark which is fragile; update the assertion to locate the
benchmark by identifier instead: call
llama_stack_client.alpha.benchmarks.list(), find the item where item.identifier
== GARAK_REMOTE_BENCHMARK_ID (or filter the list), then assert that that item's
provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE; ensure the test
fails with a clear message if no matching benchmark is found.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: bbfc6a1a-92f5-4964-b102-98e8cf58d0f2

📥 Commits

Reviewing files that changed from the base of the PR and between 93201b2 and 174a2a0.

📒 Files selected for processing (3)
  • tests/llama_stack/conftest.py
  • tests/llama_stack/constants.py
  • tests/llama_stack/eval/test_garak_provider.py

Comment on lines +10 to +25
@pytest.mark.parametrize(
"model_namespace, minio_pod, minio_data_connection, llama_stack_server_config",
[
pytest.param(
{"name": "test-garak-remote-security"},
MinIo.PodConfig.QWEN_HAP_BPIV2_MINIO_CONFIG,
{"bucket": "llms"},
{
"vllm_url_fixture": "qwen_isvc_url",
"inference_model": QWEN_MODEL_NAME,
"enable_garak_remote": True, # Injects ENABLE_KUBEFLOW_GARAK=true
},
)
],
indirect=True,
)
Contributor


⚠️ Potential issue | 🟠 Major

Missing distribution_name in parametrization will cause URL mismatch at runtime.

The test enables enable_garak_remote: True (line 20) which injects KUBEFLOW_LLAMA_STACK_URL with the default "rh-dev" distribution name. However, llama_stack_distribution fixture creates the CRD with a random name. The Kubeflow pipeline callback URL will be unreachable.

Add "distribution_name": "llama-stack-distribution-garak-test" to the params and ensure llama_stack_distribution uses this name, or refactor the fixture architecture.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/eval/test_garak_provider.py` around lines 10 - 25, The
parametrized test enabling enable_garak_remote (which sets
KUBEFLOW_LLAMA_STACK_URL using the default "rh-dev" distribution) will fail
because llama_stack_distribution creates a CRD with a random name; fix by adding
a deterministic "distribution_name": "llama-stack-distribution-garak-test" to
the pytest.param for
model_namespace/minio_pod/minio_data_connection/llama_stack_server_config so the
Kubeflow pipeline callback URL matches the created CRD, and ensure the
llama_stack_distribution fixture reads and uses that distribution_name parameter
when creating the CRD.

saichandrapandraju and others added 4 commits March 7, 2026 21:12
Implement comprehensive integration tests for the remote mode of the
llama_stack_garak_provider across three tiers:

- smoke (TestGarakRemoteQuickScan): predefined quick benchmark registration,
  eval job submission, status polling, and result retrieval
- tier1 (TestGarakRemoteCustomBenchmark): custom benchmark with explicit
  garak_config metadata, probe selection, and result validation
- tier2 (TestGarakRemoteShieldScan): shield registration with FMS guardrails
  orchestrator, benchmark with shield_ids, and shielded eval execution

Key changes:
- Support distribution_image override in llama_stack_server_config fixture
  to use specific LlamaStack 0.5.x images
- Pre-generate CR names for consistent LlamaStack service URL construction
- Add deployment namespace to NetworkPolicy allowedFrom for KFP pod access
- Add guardrails orchestrator service URL fixture for in-cluster communication
- Use provider-qualified model IDs (vllm-inference/<model>) for LlamaStack 0.5.x
- Add eval job utilities with enhanced status logging and result validation

Made-with: Cursor
The guardrails_orchestrator_ssl_cert, guardrails_orchestrator_ssl_cert_secret,
and patched_llamastack_deployment_tls_certs fixtures are no longer needed since
the shield tests now use verify_ssl=False with the HTTPS route. This resolves
the unused-code CI failure.

Made-with: Cursor
feat(garak): Add integration tests for Garak remote provider
@github-actions github-actions bot added size/xl and removed size/l labels Apr 1, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/llama_stack/conftest.py`:
- Around line 258-260: The code computes enable_kubeflow_eval from
enable_ragas_remote/enable_garak_remote but still uses the old
enable_kubeflow_ragas flag when determining _cr_name/cr_name, which can leave
cr_name unassigned; update the logic so the Ragas-related flags are normalized
before any use (e.g. set enable_kubeflow_ragas =
params.get("enable_kubeflow_ragas", enable_ragas_remote) or derive _cr_name
using enable_ragas_remote) and ensure _cr_name/cr_name assignment in the same
branching that now uses enable_ragas_remote/enable_garak_remote so the
distribution fixture later (lines ~361-362) always sees a defined cr_name.

In `@tests/llama_stack/eval/conftest.py`:
- Around line 252-266: The code selects an orchestrator service and always
returns a URL prefixed with "http://" even when the chosen port is 443; update
the return logic in the TimeoutSampler block that handles the result of
_find_orchestrator_service (and the _HTTP_PORTS tuple) to choose the scheme
based on http_port (use "https://" when http_port == 443, "http://" otherwise)
and return the appropriately-prefixed URL for svc.name and ns; keep the
service/port discovery in _find_orchestrator_service unchanged, only change how
the final URL string is constructed before returning.
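The scheme selection this comment asks for is a one-line conditional; the in-cluster URL pattern below is an assumption modeled on the other service URLs quoted in this PR:

```python
def service_url(name: str, namespace: str, port: int) -> str:
    """Build an in-cluster URL, picking https when the discovered port is 443."""
    scheme = "https" if port == 443 else "http"
    return f"{scheme}://{name}.{namespace}.svc.cluster.local:{port}"
```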

In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 324-329: The test disables TLS verification for the guardrails
orchestrator; update the fixture to use the in-cluster service and re-enable
verification: change the fms_orchestrator_url_fixture value from the external
HTTPS fixture (currently "guardrails_orchestrator_url") to the in-cluster
fixture (e.g. "guardrails_orchestrator_incluster_url" or the new in-cluster
fixture name available in tests) and remove or set any verify_ssl=False flag to
True so TLS verification is enforced; apply the same change to the other
occurrence mentioned (the block around lines 362-368) and keep references to
vllm_url_fixture and enable_garak_remote unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 922ca93e-ebb9-4616-bb5e-ed91393928a4

📥 Commits

Reviewing files that changed from the base of the PR and between 174a2a0 and 58d6c04.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • pyproject.toml
  • tests/llama_stack/conftest.py
  • tests/llama_stack/eval/conftest.py
  • tests/llama_stack/eval/constants.py
  • tests/llama_stack/eval/test_garak_provider.py
  • tests/llama_stack/eval/utils.py
  • tests/llama_stack/utils.py
✅ Files skipped from review due to trivial changes (3)
  • pyproject.toml
  • tests/llama_stack/utils.py
  • tests/llama_stack/eval/constants.py

Comment on lines +258 to +260
enable_ragas_remote = params.get("enable_ragas_remote", False)
enable_garak_remote = params.get("enable_garak_remote", False)
enable_kubeflow_eval = enable_ragas_remote or enable_garak_remote

⚠️ Potential issue | 🔴 Critical

Normalize the Ragas flags before using cr_name.

This now computes Kubeflow setup from enable_ragas_remote, but _cr_name is still keyed off enable_kubeflow_ragas. With the new flag, Ragas gets a callback URL for cr_name but the distribution fixture later invents a different CR name; with the old flag, Line 362 can raise UnboundLocalError because cr_name was never assigned.

Proposed fix
-    enable_ragas_remote = params.get("enable_ragas_remote", False)
+    enable_ragas_remote = bool(
+        params.get("enable_ragas_remote", params.get("enable_kubeflow_ragas", False))
+    )
     enable_garak_remote = params.get("enable_garak_remote", False)
     enable_kubeflow_eval = enable_ragas_remote or enable_garak_remote
@@
-    if enable_garak_remote or params.get("enable_kubeflow_ragas"):
+    if enable_kubeflow_eval:
         server_config["_cr_name"] = cr_name

Also applies to: 361-362
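The normalization this comment asks for can be sketched as a standalone helper (the `resolve_eval_flags` name is hypothetical; the real fixture inlines this logic in `tests/llama_stack/conftest.py`):

```python
def resolve_eval_flags(params: dict) -> dict:
    # Fall back to the legacy enable_kubeflow_ragas flag so older callers
    # keep working after the rename to enable_ragas_remote.
    enable_ragas_remote = bool(
        params.get("enable_ragas_remote", params.get("enable_kubeflow_ragas", False))
    )
    enable_garak_remote = bool(params.get("enable_garak_remote", False))
    return {
        "enable_ragas_remote": enable_ragas_remote,
        "enable_garak_remote": enable_garak_remote,
        # Either remote provider needs the Kubeflow eval setup (and a cr_name).
        "enable_kubeflow_eval": enable_ragas_remote or enable_garak_remote,
    }
```

With the flags normalized up front, `cr_name` can be assigned whenever `enable_kubeflow_eval` is true, so the distribution fixture never sees it unbound.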


Comment on lines +252 to +266
_HTTP_PORTS = (8033, 8034, 80, 443)

def _find_orchestrator_service() -> tuple[Service, int] | None:
for svc in Service.get(client=admin_client, namespace=ns):
svc_name = svc.name
if svc_name == orch_name or svc_name.startswith(f"{orch_name}-"):
for p in svc.instance.spec.ports:
if p.port in _HTTP_PORTS:
return svc, p.port
return None

for result in TimeoutSampler(wait_timeout=120, sleep=5, func=_find_orchestrator_service):
if result is not None:
svc, http_port = result
return f"http://{svc.name}.{ns}.svc.cluster.local:{http_port}"

⚠️ Potential issue | 🟠 Major

Don’t return http:// for port 443.

Line 252 treats 443 as a valid candidate, but Line 266 always formats an http:// URL. If the orchestrator service exposes only 443, callers will speak plain HTTP to a TLS port and fail.

Proposed fix
-    _HTTP_PORTS = (8033, 8034, 80, 443)
+    _HTTP_PORTS = {8033, 8034, 80}
+    _HTTPS_PORTS = {443}
 
-    def _find_orchestrator_service() -> tuple[Service, int] | None:
+    def _find_orchestrator_service() -> tuple[str, Service, int] | None:
         for svc in Service.get(client=admin_client, namespace=ns):
             svc_name = svc.name
             if svc_name == orch_name or svc_name.startswith(f"{orch_name}-"):
                 for p in svc.instance.spec.ports:
                     if p.port in _HTTP_PORTS:
-                        return svc, p.port
+                        return "http", svc, p.port
+                    if p.port in _HTTPS_PORTS:
+                        return "https", svc, p.port
         return None
 
     for result in TimeoutSampler(wait_timeout=120, sleep=5, func=_find_orchestrator_service):
         if result is not None:
-            svc, http_port = result
-            return f"http://{svc.name}.{ns}.svc.cluster.local:{http_port}"
+            scheme, svc, port = result
+            return f"{scheme}://{svc.name}.{ns}.svc.cluster.local:{port}"
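The scheme selection in the proposed fix boils down to a small helper like this (the `orchestrator_url` name is hypothetical; service discovery via `Service.get`/`TimeoutSampler` is unchanged and omitted):

```python
_HTTP_PORTS = {8033, 8034, 80}
_HTTPS_PORTS = {443}

def orchestrator_url(svc_name: str, namespace: str, port: int) -> str:
    # Use https only for the TLS port; plain http for the others.
    scheme = "https" if port in _HTTPS_PORTS else "http"
    return f"{scheme}://{svc_name}.{namespace}.svc.cluster.local:{port}"
```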

Comment on lines +324 to +329
{
"vllm_url_fixture": "qwen_isvc_service_url",
"inference_model": QWEN_MODEL_NAME,
"fms_orchestrator_url_fixture": "guardrails_orchestrator_url",
"enable_garak_remote": True,
"distribution_image": LLAMA_STACK_DISTRIBUTION_IMAGE,

⚠️ Potential issue | 🟠 Major

Stop disabling TLS verification for the guardrails path (CWE-295).

This scenario still points FMS_ORCHESTRATOR_URL at the HTTPS route and then sets verify_ssl=False. That lets any in-cluster MITM or poisoned route cert impersonate the orchestrator and makes the test miss trust-chain regressions. Use the new in-cluster service fixture here and keep verification enabled.

Proposed fix
-                "fms_orchestrator_url_fixture": "guardrails_orchestrator_url",
+                "fms_orchestrator_url_fixture": "guardrails_orchestrator_service_url",
@@
-            "verify_ssl": False,
+            "verify_ssl": True,

As per coding guidelines, "REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code)".

Also applies to: 362-368
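Under the proposed fix, the scenario params would look roughly like this (the constant values and the `guardrails_orchestrator_service_url` fixture name are assumptions taken from the diff above, not confirmed against the final code):

```python
# Hypothetical stand-ins for constants defined in the test module.
QWEN_MODEL_NAME = "qwen2.5-7b-instruct"
LLAMA_STACK_DISTRIBUTION_IMAGE = "quay.io/example/llama-stack:latest"

scenario = {
    "vllm_url_fixture": "qwen_isvc_service_url",
    "inference_model": QWEN_MODEL_NAME,
    # In-cluster service fixture instead of the external HTTPS route.
    "fms_orchestrator_url_fixture": "guardrails_orchestrator_service_url",
    "enable_garak_remote": True,
    "distribution_image": LLAMA_STACK_DISTRIBUTION_IMAGE,
    "verify_ssl": True,  # keep certificate verification enabled
}
```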

