
feat: add remote garak integration tests#1132

Open
kpunwatk wants to merge 6 commits into opendatahub-io:main from kpunwatk:garak_inline

Conversation

@kpunwatk
Contributor

@kpunwatk kpunwatk commented Feb 19, 2026

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

Summary by CodeRabbit

  • New Features

    • Added Garak as a selectable remote evaluation provider with provider-aware defaults and conditional Kubeflow activation.
    • Allow in-cluster requests to reach LlamaStack deployments.
  • Tests

    • Added end-to-end tests for the Garak remote evaluation provider (quick, custom benchmark, and shield scan flows).
    • Improved test helpers: more robust job polling, clearer failure reporting, and result/metadata validation utilities.
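The "more robust job polling" mentioned above can be sketched as a generic helper. The callable-based interface and the terminal status strings below are assumptions for illustration, not the repository's actual `utils.py` API:

```python
import time


def wait_for_eval_job_completion(get_status, timeout=300, interval=5):
    """Poll a status callable until the job reaches a terminal state.

    `get_status` is any zero-argument callable returning a status string;
    the terminal states below are assumptions, not the project's real enum.
    Raises RuntimeError on a failed/cancelled job so the test report names
    the final state, and TimeoutError if no terminal state is reached.
    """
    deadline = time.monotonic() + timeout
    last_status = None
    while time.monotonic() < deadline:
        last_status = get_status()
        if last_status in ("completed", "failed", "cancelled"):
            if last_status != "completed":
                raise RuntimeError(f"Eval job ended in state {last_status!r}")
            return last_status
        time.sleep(interval)
    raise TimeoutError(f"Eval job still {last_status!r} after {timeout}s")
```

In a test, `get_status` would wrap the eval client's job-status call, keeping the polling logic independent of the client API.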

@kpunwatk kpunwatk requested a review from a team as a code owner February 19, 2026 11:09
@github-actions

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove it, comment /wip cancel on the PR.
  • To block merging of a PR, add /hold in a comment. To unblock merging, comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
    The lgtm label is removed on each new commit push.
  • To mark a PR as verified, comment /verified on the PR; to un-verify, comment /verified cancel.
    The verified label is removed on each new commit push.
  • To cherry-pick a merged PR, comment /cherry-pick <target_branch_name> on the PR. If <target_branch_name> is valid
    and the current PR is merged, a cherry-picked PR will be created and linked to the current PR.
  • To build and push an image to quay, add /build-push-pr-image in a comment. This creates an image tagged
    pr-<pr_number> in the quay repository; the tag is deleted when the PR is merged or closed.
Supported labels

{'/verified', '/lgtm', '/cherry-pick', '/wip', '/hold', '/build-push-pr-image'}

@kpunwatk
Contributor Author

/wip

@coderabbitai
Contributor

coderabbitai bot commented Feb 19, 2026

📝 Walkthrough

Walkthrough

Adds Garak-specific eval support and provider-aware Kubeflow test wiring: new Garak tests and constants, expanded Kubeflow fixture logic and env var selection, HTTP service URL fixtures, result/metadata validators, minor distribution/network tweaks, and an HTTPX dependency addition. Report is factual only.

Changes

Cohort / File(s) Summary
Kubeflow / test fixtures
tests/llama_stack/conftest.py
Reworks Kubeflow gating: derives enable_ragas_remote, enable_garak_remote, enable_kubeflow_eval; sets ENABLE_KUBEFLOW_GARAK when applicable; uses generated CR name (cr_name) for KUBEFLOW_LLAMA_STACK_URL; selects provider-aware base image env var (KUBEFLOW_GARAK_BASE_IMAGE or KUBEFLOW_BASE_IMAGE); provider-specific KUBEFLOW_RESULTS_S3_PREFIX; passes _cr_name through server_config and pops it later. Review env handling for secrets/tokens (see Security notes).
Eval service URL fixtures
tests/llama_stack/eval/conftest.py
Adds qwen_isvc_service_url and guardrails_orchestrator_service_url fixtures that query Kubernetes resources, poll for services, and return in-cluster HTTP URLs; includes port discovery and timeout sampling logic. Validate robust error paths and resource permission assumptions.
Garak provider tests
tests/llama_stack/eval/test_garak_provider.py
New test module with three parametrized test classes exercising Garak quick benchmark, custom benchmark, and shield scan flows: benchmark/shield register (idempotent handling), run evals, poll completion, retrieve and validate results and metadata. These are end-to-end/integration tests relying on MinIO and cluster services.
Eval constants
tests/llama_stack/eval/constants.py
Adds LLAMA_STACK_DISTRIBUTION_IMAGE, Garak benchmark IDs, timeouts, provider-qualified model ID, and GARAK_SHIELD_ID constants.
Global providers constants
tests/llama_stack/constants.py
Adds new enum member TRUSTYAI_GARAK_REMOTE = "trustyai_garak_remote" to LlamaStackProviders.Eval.
Eval utils & validators
tests/llama_stack/eval/utils.py
Adds logging, refactors wait_for_eval_job_completion to log status and improve failure messages; adds validate_eval_result_structure and validate_job_metadata helpers; tightens type hints. Review raised RuntimeError messages for sensitive data leakage.
Distribution/network tweak
tests/llama_stack/utils.py
Expands network.allowedFrom.namespaces to include deployment namespace in LlamaStack distribution context manager.
Project deps
pyproject.toml
Adds dependency httpx[http2]>=0.28.1. Review for transitive dependency impacts and CVE status. Also confirm HTTP/2 usage is intentional for in-cluster calls.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Security notes (actionable only, no praise):

  • Environment variables and tokens are set and passed through test fixtures (e.g., KUBEFLOW_PIPELINES_TOKEN, S3 creds). Ensure secrets are not logged or embedded in raised exceptions; avoid including tokens in RuntimeError texts — CWE-532 / CWE-523.
  • Tests construct HTTP URLs to in-cluster services and may include service tokens; validate TLS usage where appropriate and limit exposure of sensitive endpoints (CWE-295).
  • New dependency httpx[http2] should be scanned for known CVEs and pinned per policy; confirm minimal required feature set to reduce attack surface.
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

  • Description check (⚠️ Warning): The PR description is largely incomplete; it contains only template placeholders, with no substantive content in the Description, How Has This Been Tested, or Merge Criteria sections. Resolution: add a detailed description of the changes, the testing methodology, and environment details, and verify all merge-criteria checklist items before merging; the commit messages indicate multiple testing tiers and fixture changes that require documentation.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): The title 'feat: add remote garak integration tests' clearly and specifically describes the main change: adding integration tests for the remote Garak provider.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
tests/llama_stack/conftest.py (1)

286-287: Garak image uses unpinned :latest tag; Ragas image is pinned by SHA

default_garak_image (Line 286) uses quay.io/trustyai/garak-remote-provider:latest, while default_ragas_image (Line 287) is pinned with a SHA digest. Using :latest makes test runs non-reproducible and can silently pick up breaking upstream changes.

♻️ Suggested fix — pin Garak image digest
-        default_garak_image = "quay.io/trustyai/garak-remote-provider:latest"
+        default_garak_image = "quay.io/trustyai/garak-remote-provider@sha256:<digest>"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/conftest.py` around lines 286 - 287, default_garak_image
currently uses the unpinned tag "quay.io/trustyai/garak-remote-provider:latest",
which makes tests non-reproducible; change the value of default_garak_image to a
specific digest-pinned image (mirror the approach used by default_ragas_image)
by replacing the :latest tag with the corresponding @sha256:<digest> for the
desired Garak image so the tests always pull the exact same image; update the
string assigned to default_garak_image (keeping default_ragas_image as the
example/pattern) and ensure the chosen digest is the tested/approved release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/llama_stack/conftest.py`:
- Around line 275-279: The llama_stack_distribution fixture is missing a local
distribution_name assignment causing URL/CRD name mismatches; add
distribution_name = generate_random_name(prefix="llama-stack-distribution") at
the start of the llama_stack_distribution fixture (before the with block) so the
fixture uses its own random name (matching
unprivileged_llama_stack_distribution) and aligns with how
llama_stack_server_config builds KUBEFLOW_LLAMA_STACK_URL and any created
CRD/Service names.

In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 46-48: The test is assuming the first item in the list is the
newly registered benchmark; instead call
llama_stack_client.alpha.benchmarks.list(), then find/filter the returned list
for an item whose identifier equals GARAK_REMOTE_BENCHMARK_ID (e.g., using a
filter or next comprehension) and assert on that item's provider_id against
LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE; ensure you handle the case where
no matching item is found by asserting the filtered result is not empty before
checking provider_id.
- Around line 32-48: The test function test_garak_remote_register_benchmark is
missing a Given-When-Then docstring and a return type annotation; update the
function signature to include "-> None" and replace the single-sentence
docstring with a three-part Given-When-Then docstring that describes the initial
state (Given), the action performed (When calling
llama_stack_client.alpha.benchmarks.register with GARAK_REMOTE_BENCHMARK_ID and
provider TRUSTYAI_GARAK_REMOTE), and the expected outcome (Then the benchmark
appears in llama_stack_client.alpha.benchmarks.list with matching identifier and
provider_id).
- Around line 50-68: The test body for test_garak_remote_run_eval is
mis-indented so the eval code runs at class scope using the imported
llama_stack_client module instead of the pytest fixture; indent the block
starting with job = llama_stack_client.alpha.eval.run_eval(...) and the
subsequent wait_for_eval_job_completion(...) call so they are inside the
test_garak_remote_run_eval method body, referencing GARAK_REMOTE_BENCHMARK_ID,
QWEN_MODEL_NAME, and wait_for_eval_job_completion to locate the code, and then
remove the now-unused top-level import llama_stack_client (the fixture should be
used instead).
- Around line 27-28: The test uses an unregistered pytest marker
"security_remote" which will trigger PytestUnknownMarkWarning under
--strict-markers; either register "security_remote" in your pytest.ini under the
[pytest] markers list (alongside "rawdeployment") or remove the
`@pytest.mark.security_remote` decorator from
tests/llama_stack/eval/test_garak_provider.py (or replace it with an
already-registered marker) so the marker usage and pytest.ini stay consistent.
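The marker registration this comment asks for would look roughly like this in pytest.ini; the description strings are placeholders, and the repository's existing markers list may differ:

```ini
[pytest]
markers =
    rawdeployment: tests that run against raw deployment mode
    security_remote: remote Garak security-scan tests
```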

---

Nitpick comments:
In `@tests/llama_stack/conftest.py`:
- Around line 286-287: default_garak_image currently uses the unpinned tag
"quay.io/trustyai/garak-remote-provider:latest", which makes tests
non-reproducible; change the value of default_garak_image to a specific
digest-pinned image (mirror the approach used by default_ragas_image) by
replacing the :latest tag with the corresponding @sha256:<digest> for the
desired Garak image so the tests always pull the exact same image; update the
string assigned to default_garak_image (keeping default_ragas_image as the
example/pattern) and ensure the chosen digest is the tested/approved release.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/llama_stack/conftest.py (1)

274-279: ⚠️ Potential issue | 🟠 Major

KUBEFLOW_LLAMA_STACK_URL can point to the wrong Service.

Line 275 defaults to "rh-dev", but the admin distribution is created with a random distribution_name (module-level), so the callback URL can target a non‑existent Service. Align the default with the actual distribution name (or pass it explicitly via params).

🛠️ Suggested fix
-        distribution_name = params.get("distribution_name", "rh-dev")
+        dist_name = params.get("distribution_name", distribution_name)
         env_vars.append({
             "name": "KUBEFLOW_LLAMA_STACK_URL",
-            "value": f"http://{distribution_name}-service.{model_namespace.name}.svc.cluster.local:8321",
+            "value": f"http://{dist_name}-service.{model_namespace.name}.svc.cluster.local:8321",
         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/conftest.py` around lines 274 - 279, The
KUBEFLOW_LLAMA_STACK_URL env var default uses literal "rh-dev" which can point
to the wrong Service; change the logic that sets distribution_name =
params.get("distribution_name", "rh-dev") to use the module-level
distribution_name (the one used when creating the admin distribution) as the
default or require params to provide distribution_name so the URL built for
KUBEFLOW_LLAMA_STACK_URL (the dict with "name" and "value") will reference the
actual Service (use the same distribution_name symbol used elsewhere in the
module and the model_namespace.name to construct the
http://{distribution_name}-service... URL).
🧹 Nitpick comments (1)
tests/llama_stack/conftest.py (1)

285-293: Avoid :latest for the Garak base image to keep CI reproducible.

quay.io/trustyai/garak-remote-provider:latest is mutable and can introduce test flakiness. Consider pinning a digest or promoting the default to a configurable env/constant with a pinned image.
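One way to implement the suggested configurable pin, sketched with a hypothetical env var name (GARAK_PROVIDER_IMAGE) and a placeholder digest, neither of which comes from the repository:

```python
import os

# Placeholder digest for illustration; a real pin would use the digest of
# the tested/approved Garak release.
PINNED_GARAK_IMAGE = (
    "quay.io/trustyai/garak-remote-provider"
    "@sha256:0000000000000000000000000000000000000000000000000000000000000000"
)


def resolve_garak_image() -> str:
    """Prefer an operator-supplied image, fall back to a digest-pinned default."""
    return os.environ.get("GARAK_PROVIDER_IMAGE", PINNED_GARAK_IMAGE)
```

The fixture would then pass `resolve_garak_image()` into the KUBEFLOW_BASE_IMAGE env var instead of a hard-coded :latest string.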

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/conftest.py` around lines 285 - 293, The Garak base image
is using the mutable tag default_garak_image =
"quay.io/trustyai/garak-remote-provider:latest", which can cause flaky CI;
update the logic that sets selected_image (using
params.get("kubeflow_base_image") and enable_garak_remote) to use a pinned
digest or configurable constant/env var instead of :latest (e.g., replace
default_garak_image with a digest-pinned string or read from an
environment/config constant) and ensure env_vars.append({"name":
"KUBEFLOW_BASE_IMAGE", "value": selected_image}) continues to receive the pinned
value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 41-46: The test function test_garak_remote_run_and_retrieve_eval
(and the other test at lines 140-146) declares minio_pod and
minio_data_connection as parameters but only relies on their fixture side
effects; remove those two parameters from the function signature and add a
pytest.mark.usefixtures("minio_pod", "minio_data_connection") decorator above
the test function (and the other affected test) so Ruff no longer reports
unused-argument warnings while preserving the fixtures' side effects.

---

Duplicate comments:
In `@tests/llama_stack/conftest.py`:
- Around line 274-279: The KUBEFLOW_LLAMA_STACK_URL env var default uses literal
"rh-dev" which can point to the wrong Service; change the logic that sets
distribution_name = params.get("distribution_name", "rh-dev") to use the
module-level distribution_name (the one used when creating the admin
distribution) as the default or require params to provide distribution_name so
the URL built for KUBEFLOW_LLAMA_STACK_URL (the dict with "name" and "value")
will reference the actual Service (use the same distribution_name symbol used
elsewhere in the module and the model_namespace.name to construct the
http://{distribution_name}-service... URL).

---

Nitpick comments:
In `@tests/llama_stack/conftest.py`:
- Around line 285-293: The Garak base image is using the mutable tag
default_garak_image = "quay.io/trustyai/garak-remote-provider:latest", which can
cause flaky CI; update the logic that sets selected_image (using
params.get("kubeflow_base_image") and enable_garak_remote) to use a pinned
digest or configurable constant/env var instead of :latest (e.g., replace
default_garak_image with a digest-pinned string or read from an
environment/config constant) and ensure env_vars.append({"name":
"KUBEFLOW_BASE_IMAGE", "value": selected_image}) continues to receive the pinned
value.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 3dad24b and 93201b2.

📒 Files selected for processing (3)
  • tests/llama_stack/conftest.py
  • tests/llama_stack/constants.py
  • tests/llama_stack/eval/test_garak_provider.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/llama_stack/constants.py

Comment on lines +41 to +46
def test_garak_remote_run_and_retrieve_eval(self, minio_pod, minio_data_connection, llama_stack_client):
"""
Given a remote Garak provider with Kubeflow enabled.
When a quick Garak benchmark is registered and executed.
Then the eval job completes and results are retrievable.
"""
Contributor


⚠️ Potential issue | 🟡 Minor

Use @pytest.mark.usefixtures to avoid unused‑argument lint warnings.

Ruff flags minio_pod/minio_data_connection as unused. If they’re only needed for fixture side effects, move them to usefixtures and remove from the signature.

🛠️ Suggested fix
@@
 class TestLlamaStackGarakRemoteProvider:
@@
-    def test_garak_remote_run_and_retrieve_eval(self, minio_pod, minio_data_connection, llama_stack_client):
+    @pytest.mark.usefixtures("minio_pod", "minio_data_connection")
+    def test_garak_remote_run_and_retrieve_eval(self, llama_stack_client):
@@
 class TestLlamaStackGarakRemoteShieldScan:
@@
-    def test_garak_remote_run_with_shield_scan(
-        self,
-        current_client_token,
-        minio_pod,
-        minio_data_connection,
-        llama_stack_client,
-    ):
+    @pytest.mark.usefixtures("minio_pod", "minio_data_connection")
+    def test_garak_remote_run_with_shield_scan(
+        self,
+        current_client_token,
+        llama_stack_client,
+    ):

Also applies to: 140-146

🧰 Tools
🪛 Ruff (0.15.2)

[warning] 41-41: Unused method argument: minio_pod

(ARG002)


[warning] 41-41: Unused method argument: minio_data_connection

(ARG002)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/eval/test_garak_provider.py` around lines 41 - 46, The test
function test_garak_remote_run_and_retrieve_eval (and the other test at lines
140-146) declares minio_pod and minio_data_connection as parameters but only
relies on their fixture side effects; remove those two parameters from the
function signature and add a pytest.mark.usefixtures("minio_pod",
"minio_data_connection") decorator above the test function (and the other
affected test) so Ruff no longer reports unused-argument warnings while
preserving the fixtures' side effects.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
tests/llama_stack/eval/test_garak_provider.py (1)

45-47: ⚠️ Potential issue | 🟡 Minor

Fragile response[0] assertion — not guaranteed to be the newly registered benchmark.

If any benchmarks are pre-registered, response[0] may not be GARAK_REMOTE_BENCHMARK_ID. Filter by identifier instead:

🛡️ Proposed fix
         response = llama_stack_client.alpha.benchmarks.list()
-        assert response[0].identifier == GARAK_REMOTE_BENCHMARK_ID
-        assert response[0].provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE
+        registered = next((b for b in response if b.identifier == GARAK_REMOTE_BENCHMARK_ID), None)
+        assert registered is not None, f"Benchmark {GARAK_REMOTE_BENCHMARK_ID} not found"
+        assert registered.provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/eval/test_garak_provider.py` around lines 45 - 47, The test
currently assumes response[0] is the newly registered benchmark which is
fragile; update the assertion to locate the benchmark by identifier instead:
call llama_stack_client.alpha.benchmarks.list(), find the item where
item.identifier == GARAK_REMOTE_BENCHMARK_ID (or filter the list), then assert
that that item's provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE;
ensure the test fails with a clear message if no matching benchmark is found.
tests/llama_stack/conftest.py (1)

272-277: ⚠️ Potential issue | 🟠 Major

distribution_name mismatch persists between llama_stack_server_config and llama_stack_distribution fixtures.

Line 273 defaults distribution_name to "rh-dev" when not in params, but llama_stack_distribution (line 447) generates a random name. The KUBEFLOW_LLAMA_STACK_URL will point to rh-dev-service while the actual service will be llama-stack-distribution-{random}-service. Kubeflow pipelines will fail to reach the LlamaStack service.

Tests must explicitly pass distribution_name in params matching the value used by llama_stack_distribution, or the fixture architecture needs refactoring to share the generated name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/conftest.py` around lines 272 - 277, The
KUBEFLOW_LLAMA_STACK_URL in the llama_stack_server_config fixture uses a default
distribution_name "rh-dev" that can diverge from the random name produced by the
llama_stack_distribution fixture; update llama_stack_server_config so it derives
distribution_name from the actual llama_stack_distribution fixture (e.g., use
the generated name from the llama_stack_distribution fixture or ensure
params["distribution_name"] is set to that generated name) and then build
KUBEFLOW_LLAMA_STACK_URL using that shared value; reference distribution_name,
llama_stack_server_config, llama_stack_distribution, and
KUBEFLOW_LLAMA_STACK_URL when locating and changing the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 63-67: The test calls wait_for_eval_job_completion but never
asserts the job outcome; update the test to fetch the finished evaluation via
the client (e.g., use llama_stack_client.get_eval_job or
llama_stack_client.get_eval_results for the job_id returned by job.job_id) and
assert the job status is successful and that results contain expected fields
(presence of per-metric scores and an overall score) and non-empty values;
reference the existing wait_for_eval_job_completion and job.job_id to locate
where to add these assertions and fail the test if status != "COMPLETED" or
scores are missing.
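A result-structure check along the lines this prompt describes might look like the helper below; the dict keys ("status", "scores") and the "completed" value are assumptions for illustration, since the provider's real result schema is not shown here:

```python
def validate_eval_result_structure(result: dict) -> None:
    """Assert an eval result dict has a terminal status and non-empty scores.

    Key names and the "completed" status value are illustrative assumptions,
    not the provider's documented schema.
    """
    status = result.get("status")
    if status != "completed":
        raise AssertionError(f"Expected completed job, got {status!r}")
    scores = result.get("scores") or {}
    if not scores:
        raise AssertionError("Eval result contains no scores")
    for metric, value in scores.items():
        if value is None:
            raise AssertionError(f"Metric {metric!r} has no value")
```

The test would call this on the payload returned after `wait_for_eval_job_completion`, so a missing or empty score set fails loudly rather than passing silently.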
- Around line 10-25: The parametrized test enabling enable_garak_remote (which
sets KUBEFLOW_LLAMA_STACK_URL using the default "rh-dev" distribution) will fail
because llama_stack_distribution creates a CRD with a random name; fix by adding
a deterministic "distribution_name": "llama-stack-distribution-garak-test" to
the pytest.param for
model_namespace/minio_pod/minio_data_connection/llama_stack_server_config so the
Kubeflow pipeline callback URL matches the created CRD, and ensure the
llama_stack_distribution fixture reads and uses that distribution_name parameter
when creating the CRD.

---

Duplicate comments:
In `@tests/llama_stack/conftest.py`:
- Around line 272-277: The KUBEFLOW_LLAMA_STACK_URL in the
llama_stack_server_config fixture uses a default distribution_name "rh-dev" that
can diverge from the random name produced by the llama_stack_distribution
fixture; update llama_stack_server_config so it derives distribution_name from
the actual llama_stack_distribution fixture (e.g., use the generated name from
the llama_stack_distribution fixture or ensure params["distribution_name"] is
set to that generated name) and then build KUBEFLOW_LLAMA_STACK_URL using that
shared value; reference distribution_name, llama_stack_server_config,
llama_stack_distribution, and KUBEFLOW_LLAMA_STACK_URL when locating and
changing the code.
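One refactoring that resolves this recurring mismatch is generating the distribution name once at module scope and letting both fixtures read it. `generate_random_name` below is a stand-in for the repo's helper of the same name, and the URL pattern mirrors the one quoted in the review comment:

```python
import secrets


def generate_random_name(prefix: str) -> str:
    # Stand-in for the repository's generate_random_name helper.
    return f"{prefix}-{secrets.token_hex(4)}"


# Generated once at module scope so every fixture sees the same value.
DISTRIBUTION_NAME = generate_random_name(prefix="llama-stack-distribution")


def kubeflow_llama_stack_url(namespace: str, name: str = DISTRIBUTION_NAME) -> str:
    # Service-name pattern taken from the URL shown in the review comment.
    return f"http://{name}-service.{namespace}.svc.cluster.local:8321"
```

With a shared module-level name, llama_stack_server_config and llama_stack_distribution cannot drift apart, and no per-test `distribution_name` param is needed.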

In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 45-47: The test currently assumes response[0] is the newly
registered benchmark which is fragile; update the assertion to locate the
benchmark by identifier instead: call
llama_stack_client.alpha.benchmarks.list(), find the item where item.identifier
== GARAK_REMOTE_BENCHMARK_ID (or filter the list), then assert that that item's
provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE; ensure the test
fails with a clear message if no matching benchmark is found.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: bbfc6a1a-92f5-4964-b102-98e8cf58d0f2

📥 Commits

Reviewing files that changed from the base of the PR and between 93201b2 and 174a2a0.

📒 Files selected for processing (3)
  • tests/llama_stack/conftest.py
  • tests/llama_stack/constants.py
  • tests/llama_stack/eval/test_garak_provider.py

Comment on lines +10 to +25
@pytest.mark.parametrize(
"model_namespace, minio_pod, minio_data_connection, llama_stack_server_config",
[
pytest.param(
{"name": "test-garak-remote-security"},
MinIo.PodConfig.QWEN_HAP_BPIV2_MINIO_CONFIG,
{"bucket": "llms"},
{
"vllm_url_fixture": "qwen_isvc_url",
"inference_model": QWEN_MODEL_NAME,
"enable_garak_remote": True, # Injects ENABLE_KUBEFLOW_GARAK=true
},
)
],
indirect=True,
)
Contributor


⚠️ Potential issue | 🟠 Major

Missing distribution_name in parametrization will cause URL mismatch at runtime.

The test enables enable_garak_remote: True (line 20) which injects KUBEFLOW_LLAMA_STACK_URL with the default "rh-dev" distribution name. However, llama_stack_distribution fixture creates the CRD with a random name. The Kubeflow pipeline callback URL will be unreachable.

Add "distribution_name": "llama-stack-distribution-garak-test" to the params and ensure llama_stack_distribution uses this name, or refactor the fixture architecture.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/llama_stack/eval/test_garak_provider.py` around lines 10 - 25, The
parametrized test enabling enable_garak_remote (which sets
KUBEFLOW_LLAMA_STACK_URL using the default "rh-dev" distribution) will fail
because llama_stack_distribution creates a CRD with a random name; fix by adding
a deterministic "distribution_name": "llama-stack-distribution-garak-test" to
the pytest.param for
model_namespace/minio_pod/minio_data_connection/llama_stack_server_config so the
Kubeflow pipeline callback URL matches the created CRD, and ensure the
llama_stack_distribution fixture reads and uses that distribution_name parameter
when creating the CRD.

saichandrapandraju and others added 4 commits March 7, 2026 21:12
Implement comprehensive integration tests for the remote mode of the
llama_stack_garak_provider across three tiers:

- smoke (TestGarakRemoteQuickScan): predefined quick benchmark registration,
  eval job submission, status polling, and result retrieval
- tier1 (TestGarakRemoteCustomBenchmark): custom benchmark with explicit
  garak_config metadata, probe selection, and result validation
- tier2 (TestGarakRemoteShieldScan): shield registration with FMS guardrails
  orchestrator, benchmark with shield_ids, and shielded eval execution

Key changes:
- Support distribution_image override in llama_stack_server_config fixture
  to use specific LlamaStack 0.5.x images
- Pre-generate CR names for consistent LlamaStack service URL construction
- Add deployment namespace to NetworkPolicy allowedFrom for KFP pod access
- Add guardrails orchestrator service URL fixture for in-cluster communication
- Use provider-qualified model IDs (vllm-inference/<model>) for LlamaStack 0.5.x
- Add eval job utilities with enhanced status logging and result validation

Made-with: Cursor
The guardrails_orchestrator_ssl_cert, guardrails_orchestrator_ssl_cert_secret,
and patched_llamastack_deployment_tls_certs fixtures are no longer needed since
the shield tests now use verify_ssl=False with the HTTPS route. This resolves
the unused-code CI failure.

Made-with: Cursor
feat(garak): Add integration tests for Garak remote provider
@github-actions github-actions bot added size/xl and removed size/l labels Apr 1, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/llama_stack/conftest.py`:
- Around line 258-260: The code computes enable_kubeflow_eval from
enable_ragas_remote/enable_garak_remote but still uses the old
enable_kubeflow_ragas flag when determining _cr_name/cr_name, which can leave
cr_name unassigned; update the logic so the Ragas-related flags are normalized
before any use (e.g. set enable_kubeflow_ragas =
params.get("enable_kubeflow_ragas", enable_ragas_remote) or derive _cr_name
using enable_ragas_remote) and ensure _cr_name/cr_name assignment in the same
branching that now uses enable_ragas_remote/enable_garak_remote so the
distribution fixture later (lines ~361-362) always sees a defined cr_name.

In `@tests/llama_stack/eval/conftest.py`:
- Around line 252-266: The code selects an orchestrator service and always
returns a URL prefixed with "http://" even when the chosen port is 443; update
the return logic in the TimeoutSampler block that handles the result of
_find_orchestrator_service (and the _HTTP_PORTS tuple) to choose the scheme
based on http_port (use "https://" when http_port == 443, "http://" otherwise)
and return the appropriately-prefixed URL for svc.name and ns; keep the
service/port discovery in _find_orchestrator_service unchanged, only change how
the final URL string is constructed before returning.
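The scheme selection this comment asks for is a one-line conditional; the in-cluster URL pattern below is an assumption modeled on the other service URLs quoted in this PR:

```python
def service_url(name: str, namespace: str, port: int) -> str:
    """Build an in-cluster URL, picking https when the discovered port is 443."""
    scheme = "https" if port == 443 else "http"
    return f"{scheme}://{name}.{namespace}.svc.cluster.local:{port}"
```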

In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 324-329: The test disables TLS verification for the guardrails
orchestrator; update the fixture to use the in-cluster service and re-enable
verification: change the fms_orchestrator_url_fixture value from the external
HTTPS fixture (currently "guardrails_orchestrator_url") to the in-cluster
fixture (e.g. "guardrails_orchestrator_incluster_url" or the new in-cluster
fixture name available in tests) and remove or set any verify_ssl=False flag to
True so TLS verification is enforced; apply the same change to the other
occurrence mentioned (the block around lines 362-368) and keep references to
vllm_url_fixture and enable_garak_remote unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 922ca93e-ebb9-4616-bb5e-ed91393928a4

📥 Commits

Reviewing files that changed from the base of the PR and between 174a2a0 and 58d6c04.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • pyproject.toml
  • tests/llama_stack/conftest.py
  • tests/llama_stack/eval/conftest.py
  • tests/llama_stack/eval/constants.py
  • tests/llama_stack/eval/test_garak_provider.py
  • tests/llama_stack/eval/utils.py
  • tests/llama_stack/utils.py
✅ Files skipped from review due to trivial changes (3)
  • pyproject.toml
  • tests/llama_stack/utils.py
  • tests/llama_stack/eval/constants.py

Comment on lines +258 to +260
enable_ragas_remote = params.get("enable_ragas_remote", False)
enable_garak_remote = params.get("enable_garak_remote", False)
enable_kubeflow_eval = enable_ragas_remote or enable_garak_remote

⚠️ Potential issue | 🔴 Critical

Normalize the Ragas flags before using cr_name.

This now computes Kubeflow setup from enable_ragas_remote, but _cr_name is still keyed off enable_kubeflow_ragas. With the new flag, Ragas gets a callback URL for cr_name but the distribution fixture later invents a different CR name; with the old flag, Line 362 can raise UnboundLocalError because cr_name was never assigned.

Proposed fix
-    enable_ragas_remote = params.get("enable_ragas_remote", False)
+    enable_ragas_remote = bool(
+        params.get("enable_ragas_remote", params.get("enable_kubeflow_ragas", False))
+    )
     enable_garak_remote = params.get("enable_garak_remote", False)
     enable_kubeflow_eval = enable_ragas_remote or enable_garak_remote
@@
-    if enable_garak_remote or params.get("enable_kubeflow_ragas"):
+    if enable_kubeflow_eval:
         server_config["_cr_name"] = cr_name

Also applies to: 361-362
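The normalization this comment asks for can be sketched as a standalone helper (the `resolve_eval_flags` name is hypothetical; the real fixture inlines this logic in `tests/llama_stack/conftest.py`):

```python
def resolve_eval_flags(params: dict) -> dict:
    # Fall back to the legacy enable_kubeflow_ragas flag so older callers
    # keep working after the rename to enable_ragas_remote.
    enable_ragas_remote = bool(
        params.get("enable_ragas_remote", params.get("enable_kubeflow_ragas", False))
    )
    enable_garak_remote = bool(params.get("enable_garak_remote", False))
    return {
        "enable_ragas_remote": enable_ragas_remote,
        "enable_garak_remote": enable_garak_remote,
        # Either remote provider needs the Kubeflow eval setup (and a cr_name).
        "enable_kubeflow_eval": enable_ragas_remote or enable_garak_remote,
    }
```

With the flags normalized up front, `cr_name` can be assigned whenever `enable_kubeflow_eval` is true, so the distribution fixture never sees it unbound.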


Comment on lines +252 to +266
_HTTP_PORTS = (8033, 8034, 80, 443)

def _find_orchestrator_service() -> tuple[Service, int] | None:
for svc in Service.get(client=admin_client, namespace=ns):
svc_name = svc.name
if svc_name == orch_name or svc_name.startswith(f"{orch_name}-"):
for p in svc.instance.spec.ports:
if p.port in _HTTP_PORTS:
return svc, p.port
return None

for result in TimeoutSampler(wait_timeout=120, sleep=5, func=_find_orchestrator_service):
if result is not None:
svc, http_port = result
return f"http://{svc.name}.{ns}.svc.cluster.local:{http_port}"

⚠️ Potential issue | 🟠 Major

Don’t return http:// for port 443.

Line 252 treats 443 as a valid candidate, but Line 266 always formats an http:// URL. If the orchestrator service exposes only 443, callers will speak plain HTTP to a TLS port and fail.

Proposed fix
-    _HTTP_PORTS = (8033, 8034, 80, 443)
+    _HTTP_PORTS = {8033, 8034, 80}
+    _HTTPS_PORTS = {443}
 
-    def _find_orchestrator_service() -> tuple[Service, int] | None:
+    def _find_orchestrator_service() -> tuple[str, Service, int] | None:
         for svc in Service.get(client=admin_client, namespace=ns):
             svc_name = svc.name
             if svc_name == orch_name or svc_name.startswith(f"{orch_name}-"):
                 for p in svc.instance.spec.ports:
                     if p.port in _HTTP_PORTS:
-                        return svc, p.port
+                        return "http", svc, p.port
+                    if p.port in _HTTPS_PORTS:
+                        return "https", svc, p.port
         return None
 
     for result in TimeoutSampler(wait_timeout=120, sleep=5, func=_find_orchestrator_service):
         if result is not None:
-            svc, http_port = result
-            return f"http://{svc.name}.{ns}.svc.cluster.local:{http_port}"
+            scheme, svc, port = result
+            return f"{scheme}://{svc.name}.{ns}.svc.cluster.local:{port}"
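The scheme selection in the proposed fix boils down to a small helper like this (the `orchestrator_url` name is hypothetical; service discovery via `Service.get`/`TimeoutSampler` is unchanged and omitted):

```python
_HTTP_PORTS = {8033, 8034, 80}
_HTTPS_PORTS = {443}

def orchestrator_url(svc_name: str, namespace: str, port: int) -> str:
    # Use https only for the TLS port; plain http for the others.
    scheme = "https" if port in _HTTPS_PORTS else "http"
    return f"{scheme}://{svc_name}.{namespace}.svc.cluster.local:{port}"
```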

Comment on lines +324 to +329
{
"vllm_url_fixture": "qwen_isvc_service_url",
"inference_model": QWEN_MODEL_NAME,
"fms_orchestrator_url_fixture": "guardrails_orchestrator_url",
"enable_garak_remote": True,
"distribution_image": LLAMA_STACK_DISTRIBUTION_IMAGE,

⚠️ Potential issue | 🟠 Major

Stop disabling TLS verification for the guardrails path (CWE-295).

This scenario still points FMS_ORCHESTRATOR_URL at the HTTPS route and then sets verify_ssl=False. That lets any in-cluster MITM or poisoned route cert impersonate the orchestrator and makes the test miss trust-chain regressions. Use the new in-cluster service fixture here and keep verification enabled.

Proposed fix
-                "fms_orchestrator_url_fixture": "guardrails_orchestrator_url",
+                "fms_orchestrator_url_fixture": "guardrails_orchestrator_service_url",
@@
-            "verify_ssl": False,
+            "verify_ssl": True,

As per coding guidelines, "REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code)".

Also applies to: 362-368
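Under the proposed fix, the scenario params would look roughly like this (the constant values and the `guardrails_orchestrator_service_url` fixture name are assumptions taken from the diff above, not confirmed against the final code):

```python
# Hypothetical stand-ins for constants defined in the test module.
QWEN_MODEL_NAME = "qwen2.5-7b-instruct"
LLAMA_STACK_DISTRIBUTION_IMAGE = "quay.io/example/llama-stack:latest"

scenario = {
    "vllm_url_fixture": "qwen_isvc_service_url",
    "inference_model": QWEN_MODEL_NAME,
    # In-cluster service fixture instead of the external HTTPS route.
    "fms_orchestrator_url_fixture": "guardrails_orchestrator_service_url",
    "enable_garak_remote": True,
    "distribution_image": LLAMA_STACK_DISTRIBUTION_IMAGE,
    "verify_ssl": True,  # keep certificate verification enabled
}
```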

