feat: add remote garak integration tests #1132
kpunwatk wants to merge 6 commits into opendatahub-io:main
Conversation
The following are automatically added/executed:
Available user actions:
Supported labels: /verified, /lgtm, /cherry-pick, /wip, /hold, /build-push-pr-image
/wip
📝 Walkthrough
Adds Garak-specific eval support and provider-aware Kubeflow test wiring: new Garak tests and constants, expanded Kubeflow fixture logic and env var selection, HTTP service URL fixtures, result/metadata validators, minor distribution/network tweaks, and an HTTPX dependency addition. Report is factual only.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Security notes (actionable only, no praise):
🚥 Pre-merge checks: ✅ 1 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (1 passed)
Actionable comments posted: 5
🧹 Nitpick comments (1)
tests/llama_stack/conftest.py (1)
286-287: Garak image uses unpinned `:latest` tag; Ragas image is pinned by SHA
`default_garak_image` (line 286) uses `quay.io/trustyai/garak-remote-provider:latest`, while `default_ragas_image` (line 287) is pinned with a SHA digest. Using `:latest` makes test runs non-reproducible and can silently pick up breaking upstream changes.
♻️ Suggested fix — pin Garak image digest
- default_garak_image = "quay.io/trustyai/garak-remote-provider:latest"
+ default_garak_image = "quay.io/trustyai/garak-remote-provider@sha256:<digest>"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/llama_stack/conftest.py` around lines 286 - 287, default_garak_image currently uses the unpinned tag "quay.io/trustyai/garak-remote-provider:latest", which makes tests non-reproducible; change the value of default_garak_image to a specific digest-pinned image (mirror the approach used by default_ragas_image) by replacing the :latest tag with the corresponding `@sha256`:<digest> for the desired Garak image so the tests always pull the exact same image; update the string assigned to default_garak_image (keeping default_ragas_image as the example/pattern) and ensure the chosen digest is the tested/approved release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/conftest.py`:
- Around line 275-279: The llama_stack_distribution fixture is missing a local
distribution_name assignment causing URL/CRD name mismatches; add
distribution_name = generate_random_name(prefix="llama-stack-distribution") at
the start of the llama_stack_distribution fixture (before the with block) so the
fixture uses its own random name (matching
unprivileged_llama_stack_distribution) and aligns with how
llama_stack_server_config builds KUBEFLOW_LLAMA_STACK_URL and any created
CRD/Service names.
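The missing assignment the prompt describes can be sketched standalone. `generate_random_name` is stubbed here because only its assumed behavior (prefix plus random suffix) matters; the namespace in the URL is a placeholder, not the real fixture value:

```python
import random
import string


def generate_random_name(prefix: str) -> str:
    """Stand-in for the project's helper; the real signature is assumed."""
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=5))
    return f"{prefix}-{suffix}"


# At the start of the llama_stack_distribution fixture, before the `with` block:
distribution_name = generate_random_name(prefix="llama-stack-distribution")

# The callback URL then stays consistent with the CRD/Service name
# ("test-ns" is a placeholder namespace for illustration):
service_url = f"http://{distribution_name}-service.test-ns.svc.cluster.local:8321"
```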
In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 46-48: The test is assuming the first item in the list is the
newly registered benchmark; instead call
llama_stack_client.alpha.benchmarks.list(), then find/filter the returned list
for an item whose identifier equals GARAK_REMOTE_BENCHMARK_ID (e.g., using a
filter or next comprehension) and assert on that item's provider_id against
LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE; ensure you handle the case where
no matching item is found by asserting the filtered result is not empty before
checking provider_id.
- Around line 32-48: The test function test_garak_remote_register_benchmark is
missing a Given-When-Then docstring and a return type annotation; update the
function signature to include "-> None" and replace the single-sentence
docstring with a three-part Given-When-Then docstring that describes the initial
state (Given), the action performed (When calling
llama_stack_client.alpha.benchmarks.register with GARAK_REMOTE_BENCHMARK_ID and
provider TRUSTYAI_GARAK_REMOTE), and the expected outcome (Then the benchmark
appears in llama_stack_client.alpha.benchmarks.list with matching identifier and
provider_id).
- Around line 50-68: The test body for test_garak_remote_run_eval is
mis-indented so the eval code runs at class scope using the imported
llama_stack_client module instead of the pytest fixture; indent the block
starting with job = llama_stack_client.alpha.eval.run_eval(...) and the
subsequent wait_for_eval_job_completion(...) call so they are inside the
test_garak_remote_run_eval method body, referencing GARAK_REMOTE_BENCHMARK_ID,
QWEN_MODEL_NAME, and wait_for_eval_job_completion to locate the code, and then
remove the now-unused top-level import llama_stack_client (the fixture should be
used instead).
- Around line 27-28: The test uses an unregistered pytest marker
"security_remote" which will trigger PytestUnknownMarkWarning under
--strict-markers; either register "security_remote" in your pytest.ini under the
[pytest] markers list (alongside "rawdeployment") or remove the
`@pytest.mark.security_remote` decorator from
tests/llama_stack/eval/test_garak_provider.py (or replace it with an
already-registered marker) so the marker usage and pytest.ini stay consistent.
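If the marker is kept, registering it could look like this pytest.ini fragment; the description text and the exact shape of the existing `rawdeployment` entry are assumptions about the current file:

```ini
[pytest]
markers =
    rawdeployment: tests for raw deployment mode
    security_remote: remote Garak security-scan tests
```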
---
Nitpick comments:
In `@tests/llama_stack/conftest.py`:
- Around line 286-287: default_garak_image currently uses the unpinned tag
"quay.io/trustyai/garak-remote-provider:latest", which makes tests
non-reproducible; change the value of default_garak_image to a specific
digest-pinned image (mirror the approach used by default_ragas_image) by
replacing the :latest tag with the corresponding `@sha256`:<digest> for the
desired Garak image so the tests always pull the exact same image; update the
string assigned to default_garak_image (keeping default_ragas_image as the
example/pattern) and ensure the chosen digest is the tested/approved release.
Force-pushed from 97098c8 to c04e770
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/llama_stack/conftest.py (1)
274-279: ⚠️ Potential issue | 🟠 Major — KUBEFLOW_LLAMA_STACK_URL can point to the wrong Service.
Line 275 defaults to "rh-dev", but the admin distribution is created with a random distribution_name (module-level), so the callback URL can target a non-existent Service. Align the default with the actual distribution name (or pass it explicitly via params).
🛠️ Suggested fix
- distribution_name = params.get("distribution_name", "rh-dev")
+ dist_name = params.get("distribution_name", distribution_name)
  env_vars.append({
      "name": "KUBEFLOW_LLAMA_STACK_URL",
-     "value": f"http://{distribution_name}-service.{model_namespace.name}.svc.cluster.local:8321",
+     "value": f"http://{dist_name}-service.{model_namespace.name}.svc.cluster.local:8321",
  })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/llama_stack/conftest.py` around lines 274 - 279, The KUBEFLOW_LLAMA_STACK_URL env var default uses literal "rh-dev" which can point to the wrong Service; change the logic that sets distribution_name = params.get("distribution_name", "rh-dev") to use the module-level distribution_name (the one used when creating the admin distribution) as the default or require params to provide distribution_name so the URL built for KUBEFLOW_LLAMA_STACK_URL (the dict with "name" and "value") will reference the actual Service (use the same distribution_name symbol used elsewhere in the module and the model_namespace.name to construct the http://{distribution_name}-service... URL).
🧹 Nitpick comments (1)
tests/llama_stack/conftest.py (1)
285-293: Avoid `:latest` for the Garak base image to keep CI reproducible.
`quay.io/trustyai/garak-remote-provider:latest` is mutable and can introduce test flakiness. Consider pinning a digest or promoting the default to a configurable env/constant with a pinned image.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/llama_stack/conftest.py` around lines 285 - 293, The Garak base image is using the mutable tag default_garak_image = "quay.io/trustyai/garak-remote-provider:latest", which can cause flaky CI; update the logic that sets selected_image (using params.get("kubeflow_base_image") and enable_garak_remote) to use a pinned digest or configurable constant/env var instead of :latest (e.g., replace default_garak_image with a digest-pinned string or read from an environment/config constant) and ensure env_vars.append({"name": "KUBEFLOW_BASE_IMAGE", "value": selected_image}) continues to receive the pinned value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 41-46: The test function test_garak_remote_run_and_retrieve_eval
(and the other test at lines 140-146) declares minio_pod and
minio_data_connection as parameters but only relies on their fixture side
effects; remove those two parameters from the function signature and add a
pytest.mark.usefixtures("minio_pod", "minio_data_connection") decorator above
the test function (and the other affected test) so Ruff no longer reports
unused-argument warnings while preserving the fixtures' side effects.
---
Duplicate comments:
In `@tests/llama_stack/conftest.py`:
- Around line 274-279: The KUBEFLOW_LLAMA_STACK_URL env var default uses literal
"rh-dev" which can point to the wrong Service; change the logic that sets
distribution_name = params.get("distribution_name", "rh-dev") to use the
module-level distribution_name (the one used when creating the admin
distribution) as the default or require params to provide distribution_name so
the URL built for KUBEFLOW_LLAMA_STACK_URL (the dict with "name" and "value")
will reference the actual Service (use the same distribution_name symbol used
elsewhere in the module and the model_namespace.name to construct the
http://{distribution_name}-service... URL).
---
Nitpick comments:
In `@tests/llama_stack/conftest.py`:
- Around line 285-293: The Garak base image is using the mutable tag
default_garak_image = "quay.io/trustyai/garak-remote-provider:latest", which can
cause flaky CI; update the logic that sets selected_image (using
params.get("kubeflow_base_image") and enable_garak_remote) to use a pinned
digest or configurable constant/env var instead of :latest (e.g., replace
default_garak_image with a digest-pinned string or read from an
environment/config constant) and ensure env_vars.append({"name":
"KUBEFLOW_BASE_IMAGE", "value": selected_image}) continues to receive the pinned
value.
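The "configurable env/constant with a pinned image" idea above can be sketched as follows; the env var name `GARAK_REMOTE_IMAGE` and the `<digest>` placeholder are illustrative assumptions, not values from the PR:

```python
import os

# Placeholder digest — substitute the tested/approved release digest.
DEFAULT_GARAK_IMAGE = "quay.io/trustyai/garak-remote-provider@sha256:<digest>"


def resolve_garak_image(params: dict) -> str:
    """Resolve the Kubeflow base image: explicit param > env override > pinned default."""
    return params.get("kubeflow_base_image") or os.environ.get(
        "GARAK_REMOTE_IMAGE", DEFAULT_GARAK_IMAGE
    )
```

The result would then feed the existing `env_vars.append({"name": "KUBEFLOW_BASE_IMAGE", "value": selected_image})` call unchanged.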
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (3)
tests/llama_stack/conftest.py
tests/llama_stack/constants.py
tests/llama_stack/eval/test_garak_provider.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/llama_stack/constants.py
def test_garak_remote_run_and_retrieve_eval(self, minio_pod, minio_data_connection, llama_stack_client):
    """
    Given a remote Garak provider with Kubeflow enabled.
    When a quick Garak benchmark is registered and executed.
    Then the eval job completes and results are retrievable.
    """
Use @pytest.mark.usefixtures to avoid unused‑argument lint warnings.
Ruff flags minio_pod/minio_data_connection as unused. If they’re only needed for fixture side effects, move them to usefixtures and remove from the signature.
🛠️ Suggested fix
@@
class TestLlamaStackGarakRemoteProvider:
@@
- def test_garak_remote_run_and_retrieve_eval(self, minio_pod, minio_data_connection, llama_stack_client):
+ @pytest.mark.usefixtures("minio_pod", "minio_data_connection")
+ def test_garak_remote_run_and_retrieve_eval(self, llama_stack_client):
@@
class TestLlamaStackGarakRemoteShieldScan:
@@
- def test_garak_remote_run_with_shield_scan(
- self,
- current_client_token,
- minio_pod,
- minio_data_connection,
- llama_stack_client,
- ):
+ @pytest.mark.usefixtures("minio_pod", "minio_data_connection")
+ def test_garak_remote_run_with_shield_scan(
+ self,
+ current_client_token,
+ llama_stack_client,
    ):
Also applies to: 140-146
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 41-41: Unused method argument: minio_pod
(ARG002)
[warning] 41-41: Unused method argument: minio_data_connection
(ARG002)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/llama_stack/eval/test_garak_provider.py` around lines 41 - 46, The test
function test_garak_remote_run_and_retrieve_eval (and the other test at lines
140-146) declares minio_pod and minio_data_connection as parameters but only
relies on their fixture side effects; remove those two parameters from the
function signature and add a pytest.mark.usefixtures("minio_pod",
"minio_data_connection") decorator above the test function (and the other
affected test) so Ruff no longer reports unused-argument warnings while
preserving the fixtures' side effects.
modified: ../conftest.py
for more information, see https://pre-commit.ci
Actionable comments posted: 2
♻️ Duplicate comments (2)
tests/llama_stack/eval/test_garak_provider.py (1)
45-47: ⚠️ Potential issue | 🟡 Minor — Fragile response[0] assertion; not guaranteed to be the newly registered benchmark.
If any benchmarks are pre-registered, response[0] may not be GARAK_REMOTE_BENCHMARK_ID. Filter by identifier instead:
🛡️ Proposed fix
  response = llama_stack_client.alpha.benchmarks.list()
- assert response[0].identifier == GARAK_REMOTE_BENCHMARK_ID
- assert response[0].provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE
+ registered = next((b for b in response if b.identifier == GARAK_REMOTE_BENCHMARK_ID), None)
+ assert registered is not None, f"Benchmark {GARAK_REMOTE_BENCHMARK_ID} not found"
+ assert registered.provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/llama_stack/eval/test_garak_provider.py` around lines 45 - 47, The test currently assumes response[0] is the newly registered benchmark which is fragile; update the assertion to locate the benchmark by identifier instead: call llama_stack_client.alpha.benchmarks.list(), find the item where item.identifier == GARAK_REMOTE_BENCHMARK_ID (or filter the list), then assert that that item's provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE; ensure the test fails with a clear message if no matching benchmark is found.
tests/llama_stack/conftest.py (1)
272-277: ⚠️ Potential issue | 🟠 Major — distribution_name mismatch persists between llama_stack_server_config and llama_stack_distribution fixtures.
Line 273 defaults distribution_name to "rh-dev" when not in params, but llama_stack_distribution (line 447) generates a random name. The KUBEFLOW_LLAMA_STACK_URL will point to rh-dev-service while the actual service will be llama-stack-distribution-{random}-service. Kubeflow pipelines will fail to reach the LlamaStack service.
Tests must explicitly pass distribution_name in params matching the value used by llama_stack_distribution, or the fixture architecture needs refactoring to share the generated name.
Verify each finding against the current code and only fix it if needed. In `@tests/llama_stack/conftest.py` around lines 272 - 277, The KUBEFLOW_LLAMA_STACK_URL in the llama_stack_server_config fixture uses a default distribution_name "rh-dev" that can diverge from the random name produced by the llama_stack_distribution fixture; update llama_stack_server_config so it derives distribution_name from the actual llama_stack_distribution fixture (e.g., use the generated name from the llama_stack_distribution fixture or ensure params["distribution_name"] is set to that generated name) and then build KUBEFLOW_LLAMA_STACK_URL using that shared value; reference distribution_name, llama_stack_server_config, llama_stack_distribution, and KUBEFLOW_LLAMA_STACK_URL when locating and changing the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 63-67: The test calls wait_for_eval_job_completion but never
asserts the job outcome; update the test to fetch the finished evaluation via
the client (e.g., use llama_stack_client.get_eval_job or
llama_stack_client.get_eval_results for the job_id returned by job.job_id) and
assert the job status is successful and that results contain expected fields
(presence of per-metric scores and an overall score) and non-empty values;
reference the existing wait_for_eval_job_completion and job.job_id to locate
where to add these assertions and fail the test if status != "COMPLETED" or
scores are missing.
- Around line 10-25: The parametrized test enabling enable_garak_remote (which
sets KUBEFLOW_LLAMA_STACK_URL using the default "rh-dev" distribution) will fail
because llama_stack_distribution creates a CRD with a random name; fix by adding
a deterministic "distribution_name": "llama-stack-distribution-garak-test" to
the pytest.param for
model_namespace/minio_pod/minio_data_connection/llama_stack_server_config so the
Kubeflow pipeline callback URL matches the created CRD, and ensure the
llama_stack_distribution fixture reads and uses that distribution_name parameter
when creating the CRD.
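The missing job-outcome assertions from the first comment above can be sketched with a stand-in client; the method name, status string, and results payload shape are assumptions, not the real llama_stack_client API:

```python
class _FakeEvalClient:
    """Stand-in for llama_stack_client; method name and payload shape are assumed."""

    def get_eval_results(self, job_id: str) -> dict:
        return {
            "status": "COMPLETED",
            "scores": {"garak::quick": {"aggregated_results": {"overall": 0.97}}},
        }


def assert_eval_succeeded(client, job_id: str) -> dict:
    """Fetch finished eval results and assert completion plus non-empty scores."""
    results = client.get_eval_results(job_id)
    assert results["status"] == "COMPLETED", f"eval job {job_id} did not complete"
    assert results["scores"], "eval results contain no scores"
    for metric, payload in results["scores"].items():
        assert payload, f"empty score payload for metric {metric}"
    return results


checked = assert_eval_succeeded(_FakeEvalClient(), "job-123")
```

In the real test this would run right after `wait_for_eval_job_completion(...)`, using the `job.job_id` returned by `run_eval`.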
---
Duplicate comments:
In `@tests/llama_stack/conftest.py`:
- Around line 272-277: The KUBEFLOW_LLAMA_STACK_URL in the
llama_stack_server_config fixture uses a default distribution_name "rh-dev" that
can diverge from the random name produced by the llama_stack_distribution
fixture; update llama_stack_server_config so it derives distribution_name from
the actual llama_stack_distribution fixture (e.g., use the generated name from
the llama_stack_distribution fixture or ensure params["distribution_name"] is
set to that generated name) and then build KUBEFLOW_LLAMA_STACK_URL using that
shared value; reference distribution_name, llama_stack_server_config,
llama_stack_distribution, and KUBEFLOW_LLAMA_STACK_URL when locating and
changing the code.
In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 45-47: The test currently assumes response[0] is the newly
registered benchmark which is fragile; update the assertion to locate the
benchmark by identifier instead: call
llama_stack_client.alpha.benchmarks.list(), find the item where item.identifier
== GARAK_REMOTE_BENCHMARK_ID (or filter the list), then assert that that item's
provider_id == LlamaStackProviders.Eval.TRUSTYAI_GARAK_REMOTE; ensure the test
fails with a clear message if no matching benchmark is found.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: bbfc6a1a-92f5-4964-b102-98e8cf58d0f2
📒 Files selected for processing (3)
tests/llama_stack/conftest.py
tests/llama_stack/constants.py
tests/llama_stack/eval/test_garak_provider.py
@pytest.mark.parametrize(
    "model_namespace, minio_pod, minio_data_connection, llama_stack_server_config",
    [
        pytest.param(
            {"name": "test-garak-remote-security"},
            MinIo.PodConfig.QWEN_HAP_BPIV2_MINIO_CONFIG,
            {"bucket": "llms"},
            {
                "vllm_url_fixture": "qwen_isvc_url",
                "inference_model": QWEN_MODEL_NAME,
                "enable_garak_remote": True,  # Injects ENABLE_KUBEFLOW_GARAK=true
            },
        )
    ],
    indirect=True,
)
Missing distribution_name in parametrization will cause URL mismatch at runtime.
The test enables enable_garak_remote: True (line 20) which injects KUBEFLOW_LLAMA_STACK_URL with the default "rh-dev" distribution name. However, llama_stack_distribution fixture creates the CRD with a random name. The Kubeflow pipeline callback URL will be unreachable.
Add "distribution_name": "llama-stack-distribution-garak-test" to the params and ensure llama_stack_distribution uses this name, or refactor the fixture architecture.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/llama_stack/eval/test_garak_provider.py` around lines 10 - 25, The
parametrized test enabling enable_garak_remote (which sets
KUBEFLOW_LLAMA_STACK_URL using the default "rh-dev" distribution) will fail
because llama_stack_distribution creates a CRD with a random name; fix by adding
a deterministic "distribution_name": "llama-stack-distribution-garak-test" to
the pytest.param for
model_namespace/minio_pod/minio_data_connection/llama_stack_server_config so the
Kubeflow pipeline callback URL matches the created CRD, and ensure the
llama_stack_distribution fixture reads and uses that distribution_name parameter
when creating the CRD.
Implement comprehensive integration tests for the remote mode of the llama_stack_garak_provider across three tiers:
- smoke (TestGarakRemoteQuickScan): predefined quick benchmark registration, eval job submission, status polling, and result retrieval
- tier1 (TestGarakRemoteCustomBenchmark): custom benchmark with explicit garak_config metadata, probe selection, and result validation
- tier2 (TestGarakRemoteShieldScan): shield registration with FMS guardrails orchestrator, benchmark with shield_ids, and shielded eval execution

Key changes:
- Support distribution_image override in llama_stack_server_config fixture to use specific LlamaStack 0.5.x images
- Pre-generate CR names for consistent LlamaStack service URL construction
- Add deployment namespace to NetworkPolicy allowedFrom for KFP pod access
- Add guardrails orchestrator service URL fixture for in-cluster communication
- Use provider-qualified model IDs (vllm-inference/<model>) for LlamaStack 0.5.x
- Add eval job utilities with enhanced status logging and result validation

Made-with: Cursor
The guardrails_orchestrator_ssl_cert, guardrails_orchestrator_ssl_cert_secret, and patched_llamastack_deployment_tls_certs fixtures are no longer needed since the shield tests now use verify_ssl=False with the HTTPS route. This resolves the unused-code CI failure. Made-with: Cursor
feat(garak): Add integration tests for Garak remote provider
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/conftest.py`:
- Around line 258-260: The code computes enable_kubeflow_eval from
enable_ragas_remote/enable_garak_remote but still uses the old
enable_kubeflow_ragas flag when determining _cr_name/cr_name, which can leave
cr_name unassigned; update the logic so the Ragas-related flags are normalized
before any use (e.g. set enable_kubeflow_ragas =
params.get("enable_kubeflow_ragas", enable_ragas_remote) or derive _cr_name
using enable_ragas_remote) and ensure _cr_name/cr_name assignment in the same
branching that now uses enable_ragas_remote/enable_garak_remote so the
distribution fixture later (lines ~361-362) always sees a defined cr_name.
In `@tests/llama_stack/eval/conftest.py`:
- Around line 252-266: The code selects an orchestrator service and always
returns a URL prefixed with "http://" even when the chosen port is 443; update
the return logic in the TimeoutSampler block that handles the result of
_find_orchestrator_service (and the _HTTP_PORTS tuple) to choose the scheme
based on http_port (use "https://" when http_port == 443, "http://" otherwise)
and return the appropriately-prefixed URL for svc.name and ns; keep the
service/port discovery in _find_orchestrator_service unchanged, only change how
the final URL string is constructed before returning.
In `@tests/llama_stack/eval/test_garak_provider.py`:
- Around line 324-329: The test disables TLS verification for the guardrails
orchestrator; update the fixture to use the in-cluster service and re-enable
verification: change the fms_orchestrator_url_fixture value from the external
HTTPS fixture (currently "guardrails_orchestrator_url") to the in-cluster
fixture (e.g. "guardrails_orchestrator_incluster_url" or the new in-cluster
fixture name available in tests) and remove or set any verify_ssl=False flag to
True so TLS verification is enforced; apply the same change to the other
occurrence mentioned (the block around lines 362-368) and keep references to
vllm_url_fixture and enable_garak_remote unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 922ca93e-ebb9-4616-bb5e-ed91393928a4
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
pyproject.toml
tests/llama_stack/conftest.py
tests/llama_stack/eval/conftest.py
tests/llama_stack/eval/constants.py
tests/llama_stack/eval/test_garak_provider.py
tests/llama_stack/eval/utils.py
tests/llama_stack/utils.py
✅ Files skipped from review due to trivial changes (3)
- pyproject.toml
- tests/llama_stack/utils.py
- tests/llama_stack/eval/constants.py
enable_ragas_remote = params.get("enable_ragas_remote", False)
enable_garak_remote = params.get("enable_garak_remote", False)
enable_kubeflow_eval = enable_ragas_remote or enable_garak_remote
Normalize the Ragas flags before using cr_name.
This now computes Kubeflow setup from enable_ragas_remote, but _cr_name is still keyed off enable_kubeflow_ragas. With the new flag, Ragas gets a callback URL for cr_name but the distribution fixture later invents a different CR name; with the old flag, Line 362 can raise UnboundLocalError because cr_name was never assigned.
Proposed fix
- enable_ragas_remote = params.get("enable_ragas_remote", False)
+ enable_ragas_remote = bool(
+ params.get("enable_ragas_remote", params.get("enable_kubeflow_ragas", False))
+ )
enable_garak_remote = params.get("enable_garak_remote", False)
enable_kubeflow_eval = enable_ragas_remote or enable_garak_remote
@@
- if enable_garak_remote or params.get("enable_kubeflow_ragas"):
+ if enable_kubeflow_eval:
      server_config["_cr_name"] = cr_name
Also applies to: 361-362
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/llama_stack/conftest.py` around lines 258 - 260, The code computes
enable_kubeflow_eval from enable_ragas_remote/enable_garak_remote but still uses
the old enable_kubeflow_ragas flag when determining _cr_name/cr_name, which can
leave cr_name unassigned; update the logic so the Ragas-related flags are
normalized before any use (e.g. set enable_kubeflow_ragas =
params.get("enable_kubeflow_ragas", enable_ragas_remote) or derive _cr_name
using enable_ragas_remote) and ensure _cr_name/cr_name assignment in the same
branching that now uses enable_ragas_remote/enable_garak_remote so the
distribution fixture later (lines ~361-362) always sees a defined cr_name.
_HTTP_PORTS = (8033, 8034, 80, 443)


def _find_orchestrator_service() -> tuple[Service, int] | None:
    for svc in Service.get(client=admin_client, namespace=ns):
        svc_name = svc.name
        if svc_name == orch_name or svc_name.startswith(f"{orch_name}-"):
            for p in svc.instance.spec.ports:
                if p.port in _HTTP_PORTS:
                    return svc, p.port
    return None


for result in TimeoutSampler(wait_timeout=120, sleep=5, func=_find_orchestrator_service):
    if result is not None:
        svc, http_port = result
        return f"http://{svc.name}.{ns}.svc.cluster.local:{http_port}"
Don’t return http:// for port 443.
Line 252 treats 443 as a valid candidate, but Line 266 always formats an http:// URL. If the orchestrator service exposes only 443, callers will speak plain HTTP to a TLS port and fail.
Proposed fix
- _HTTP_PORTS = (8033, 8034, 80, 443)
+ _HTTP_PORTS = {8033, 8034, 80}
+ _HTTPS_PORTS = {443}
- def _find_orchestrator_service() -> tuple[Service, int] | None:
+ def _find_orchestrator_service() -> tuple[str, Service, int] | None:
for svc in Service.get(client=admin_client, namespace=ns):
svc_name = svc.name
if svc_name == orch_name or svc_name.startswith(f"{orch_name}-"):
for p in svc.instance.spec.ports:
if p.port in _HTTP_PORTS:
- return svc, p.port
+ return "http", svc, p.port
+ if p.port in _HTTPS_PORTS:
+ return "https", svc, p.port
return None
for result in TimeoutSampler(wait_timeout=120, sleep=5, func=_find_orchestrator_service):
if result is not None:
- svc, http_port = result
- return f"http://{svc.name}.{ns}.svc.cluster.local:{http_port}"
+ scheme, svc, port = result
+ return f"{scheme}://{svc.name}.{ns}.svc.cluster.local:{port}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/llama_stack/eval/conftest.py` around lines 252 - 266, The code selects
an orchestrator service and always returns a URL prefixed with "http://" even
when the chosen port is 443; update the return logic in the TimeoutSampler block
that handles the result of _find_orchestrator_service (and the _HTTP_PORTS
tuple) to choose the scheme based on http_port (use "https://" when http_port ==
443, "http://" otherwise) and return the appropriately-prefixed URL for svc.name
and ns; keep the service/port discovery in _find_orchestrator_service unchanged,
only change how the final URL string is constructed before returning.
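The scheme-by-port rule reduces to a small helper, shown here standalone; the port sets mirror the review's suggestion, and the service/namespace names are placeholders:

```python
HTTP_PORTS = {8033, 8034, 80}
HTTPS_PORTS = {443}


def orchestrator_url(svc_name: str, namespace: str, port: int) -> str:
    """Build an in-cluster URL, picking the scheme from the matched port."""
    if port in HTTPS_PORTS:
        scheme = "https"
    elif port in HTTP_PORTS:
        scheme = "http"
    else:
        raise ValueError(f"unexpected orchestrator port: {port}")
    return f"{scheme}://{svc_name}.{namespace}.svc.cluster.local:{port}"
```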
{
    "vllm_url_fixture": "qwen_isvc_service_url",
    "inference_model": QWEN_MODEL_NAME,
    "fms_orchestrator_url_fixture": "guardrails_orchestrator_url",
    "enable_garak_remote": True,
    "distribution_image": LLAMA_STACK_DISTRIBUTION_IMAGE,
Stop disabling TLS verification for the guardrails path (CWE-295).
This scenario still points FMS_ORCHESTRATOR_URL at the HTTPS route and then sets verify_ssl=False. That lets any in-cluster MITM or poisoned route cert impersonate the orchestrator and makes the test miss trust-chain regressions. Use the new in-cluster service fixture here and keep verification enabled.
Proposed fix
- "fms_orchestrator_url_fixture": "guardrails_orchestrator_url",
+ "fms_orchestrator_url_fixture": "guardrails_orchestrator_service_url",
@@
- "verify_ssl": False,
+ "verify_ssl": True,
As per coding guidelines, "REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code)".
Also applies to: 362-368
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/llama_stack/eval/test_garak_provider.py` around lines 324 - 329, The
test disables TLS verification for the guardrails orchestrator; update the
fixture to use the in-cluster service and re-enable verification: change the
fms_orchestrator_url_fixture value from the external HTTPS fixture (currently
"guardrails_orchestrator_url") to the in-cluster fixture (e.g.
"guardrails_orchestrator_incluster_url" or the new in-cluster fixture name
available in tests) and remove or set any verify_ssl=False flag to True so TLS
verification is enforced; apply the same change to the other occurrence
mentioned (the block around lines 362-368) and keep references to
vllm_url_fixture and enable_garak_remote unchanged.
Description
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit
New Features
Tests