Support llama stack tests on disconnected using https_proxy #1336
jgarciao wants to merge 10 commits into opendatahub-io:main
Conversation
📝 Walkthrough

Adjusts the Llama Stack test harness: adds HTTPS proxy and client-verify constants, injects disconnected-cluster and TLS/proxy settings into the server config, increases client/polling timeouts, tweaks CPU defaults, skips storage on air-gapped clusters, removes a FAISS-specific env var, and normalizes vector-store helper parameters.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Force-pushed ca8b9ca to 03f426f (compare)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/conftest.py`:
- Around line 290-295: The fixture currently removes server_config["storage"]
when params.get("llama_stack_storage_size") is set on disconnected clusters;
instead, preserve the requested storage config and skip the test on disconnected
runs: when params.get("llama_stack_storage_size") is truthy and
IS_DISCONNECTED_CLUSTER is true, call pytest.skip with a clear message (e.g.,
referencing RHAIENG-1819) rather than mutating server_config; ensure pytest is
imported in tests/llama_stack/conftest.py and leave the
storage_size/server_config["storage"] assignment intact for connected runs.
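The suggested behavior can be sketched with a hypothetical helper (the `resolve_storage` name and its parameters are illustrative, not existing code); on disconnected clusters the caller would `pytest.skip` instead of mutating `server_config`:

```python
from typing import Optional


def resolve_storage(server_config: dict, params: dict, is_disconnected: bool) -> Optional[dict]:
    """Return the updated config, or None to signal the caller to pytest.skip (RHAIENG-1819)."""
    storage_size = params.get("llama_stack_storage_size")
    if storage_size and is_disconnected:
        # Preserve the requested storage; the caller skips the test instead of dropping it.
        return None
    if storage_size:
        server_config["storage"] = {"size": storage_size}
    return server_config
```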
- Around line 250-252: The current code logs and injects the raw HTTPS_PROXY
string into env_vars which may contain credentials; stop emitting the proxy
value and move the proxy into a Kubernetes Secret reference: replace the inline
env_vars.append({"name": "HTTPS_PROXY", "value": HTTPS_PROXY}) with an env var
entry that uses a secretKeyRef (e.g.,
{"name":"HTTPS_PROXY","valueFrom":{"secretKeyRef":{"name":<llama_stack_secret_name>,"key":"https-proxy"}}}),
change the LOGGER.info call to a generic "HTTPS proxy enabled" message (do not
interpolate HTTPS_PROXY), and add "https-proxy" to
LLAMA_STACK_DISTRIBUTION_SECRET_DATA so the secret contains the proxy string for
reference by the secretKeyRef; update any code that builds server_config to
reference the secretKeyRef instead of embedding the raw proxy.
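A minimal sketch of such an env var entry (the secret name is a hypothetical placeholder, not the repository's actual constant):

```python
# Hypothetical secret name; the real name would come from the test constants.
llama_stack_secret_name = "llama-stack-distribution-secret"

# Env var entry referencing the proxy value from a Secret instead of
# embedding the raw (possibly credentialed) proxy string in the pod spec.
https_proxy_env_var = {
    "name": "HTTPS_PROXY",
    "valueFrom": {
        "secretKeyRef": {
            "name": llama_stack_secret_name,
            "key": "https-proxy",
        }
    },
}
```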
- Around line 646-652: The readiness setup uses httpx.Client(verify=verifySSL,
timeout=300) which makes all requests (including the
health/version/models/vector_stores/files checks performed by
wait_for_llama_stack_client_ready) honor a 300s timeout and can exhaust the
retry decorator's 90s wait_timeout; change the default httpx.Client timeout to a
short value (e.g., 10s) used for readiness checks and reserve the long 300s
timeout only for operations that need it by configuring LlamaStackClient or
per-request timeouts (adjust where http_client is created and where
LlamaStackClient is instantiated so readiness calls use the shorter timeout
while file/vector-store calls use 300s).
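One way to express the split, sketched as a small timeout policy (the operation names and values are assumptions drawn from the comment, not the actual client wiring):

```python
READINESS_TIMEOUT_SEC = 10.0        # health/version/models/vector_stores/files checks
LONG_OPERATION_TIMEOUT_SEC = 300.0  # file uploads and vector-store indexing

_LONG_OPERATIONS = {"files.create", "vector_stores.files.create"}


def timeout_for(operation: str) -> float:
    """Pick a per-request timeout so readiness probes fail fast within the 90s retry budget."""
    return LONG_OPERATION_TIMEOUT_SEC if operation in _LONG_OPERATIONS else READINESS_TIMEOUT_SEC
```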
In `@tests/llama_stack/constants.py`:
- Line 32: The new constant HTTPS_PROXY lacks a type annotation; update the
declaration in tests/llama_stack/constants.py to include an explicit type so
mypy stays happy, for example change the line with HTTPS_PROXY to use a str
annotation (HTTPS_PROXY: str = os.getenv("SQUID_HTTPS_PROXY", "") or otherwise
ensure the result is typed as str) referencing the HTTPS_PROXY symbol.
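The annotated form the comment asks for would look like this (SQUID_HTTPS_PROXY is the env var named in the PR description):

```python
import os

# Explicit annotation keeps mypy happy; defaults to "" when the env var is unset.
HTTPS_PROXY: str = os.getenv("SQUID_HTTPS_PROXY", "")
```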
📒 Files selected for processing (4)
tests/fixtures/vector_io.py, tests/llama_stack/conftest.py, tests/llama_stack/constants.py, tests/llama_stack/utils.py
💤 Files with no reviewable changes (1)
- tests/fixtures/vector_io.py
tests/llama_stack/conftest.py (outdated)
```python
env_vars_vector_io = vector_io_provider_deployment_config_factory(provider_name=vector_io_provider)
env_vars.extend(env_vars_vector_io)

if IS_DISCONNECTED_CLUSTER and HTTPS_PROXY:
```
@jgarciao for disconnected clusters, we have utility is_disconnected_cluster in utilities/infra.py. Can that be used instead of one extra environment variable?
I was thinking of modifying the is_disconnected_cluster fixture to use IS_DISCONNECTED_CLUSTER once the new env var is set by Jenkins. It would be faster and probably more reliable than the current implementation.
I don't think that is a good idea, especially since we can learn from the cluster itself whether it is connected or not. It also works for local runs, etc.
In the latest commit I've removed IS_DISCONNECTED_CLUSTER and it now uses is_disconnected_cluster.
Note: I had to define an is_disconnected_cluster fixture because llama_stack_server_config doesn't have a client parameter (it needs to support privileged and unprivileged clients).
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/constants.py`:
- Around line 34-35: LLS_CLIENT_VERIFY_SSL currently defaults to false which
disables SSL verification for all tests; change it to default to True
(LLS_CLIENT_VERIFY_SSL: bool) and only set it to False when explicitly opted out
or when IS_DISCONNECTED_CLUSTER is true—i.e., read env var if present but
otherwise set True, and if you want cluster-based behavior, compute
LLS_CLIENT_VERIFY_SSL = False only when IS_DISCONNECTED_CLUSTER is True and no
explicit env override; update the constant name usage accordingly so
httpx.Client instantiation reads this boolean.
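A sketch of the suggested default logic (the env var parsing and the IS_DISCONNECTED_CLUSTER placeholder are illustrative):

```python
import os

IS_DISCONNECTED_CLUSTER = False  # placeholder for the real cluster check

_override = os.getenv("LLS_CLIENT_VERIFY_SSL")
if _override is not None:
    # An explicit env override wins in either direction.
    LLS_CLIENT_VERIFY_SSL: bool = _override.strip().lower() in ("1", "true", "yes")
else:
    # Default to verifying SSL; relax only on disconnected clusters.
    LLS_CLIENT_VERIFY_SSL = not IS_DISCONNECTED_CLUSTER
```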
📒 Files selected for processing (2)
tests/llama_stack/conftest.py, tests/llama_stack/constants.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/llama_stack/conftest.py
Force-pushed 52f5bca to ae05e08 (compare)
Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Increase timeout as workaround for known bug RHAIENG-1819 affecting the test execution Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Needed to avoid problems when using sentence-transformers Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
As tlsconfig is used on disconnected clusters, there is test instability due to its usage. Adding a workaround for it. Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com>
Force-pushed 9c8290f to 9bb2a64 (compare)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/llama_stack/conftest.py`:
- Around line 258-260: When is_disconnected_cluster is true and HTTPS_PROXY is
added to env_vars, also append a NO_PROXY entry to exempt local/internal hosts
so embedded/internal HTTPS endpoints (LLS_CORE_VLLM_EMBEDDING_URL,
LLS_CORE_VLLM_URL) are bypassed by the proxy; update the block that logs and
appends HTTPS_PROXY (referencing LOGGER, HTTPS_PROXY, and env_vars) to also
append {"name": "NO_PROXY", "value": "localhost,127.0.0.1"} (or configurable
exempt hosts) and include this in the log message.
- Line 20: The test module imports the package but not its infra submodule,
causing AttributeError when the llama_stack_server_config fixture calls
utilities.infra.is_disconnected_cluster; explicitly import the submodule instead
of the bare package (e.g., replace the top-level "import utilities" with an
explicit "from utilities import infra" or "import utilities.infra as infra") and
update the fixture references (llama_stack_server_config and any direct calls to
utilities.infra) to use the imported infra symbol so utilities.infra is
available at evaluation time.
In `@tests/llama_stack/utils.py`:
- Around line 105-110: Replace the hard-coded timeout=240 passed to
llama_stack_client.vector_stores.files.create with a value derived from the
function's wait_timeout budget: record a start time (time.monotonic()), compute
request_timeout = max(1, int(wait_timeout - (time.monotonic() - start))) before
calling vector_stores.files.create, and pass timeout=request_timeout (instead of
240); ensure you handle request_timeout <= 0 by raising or setting a minimal
timeout so the subsequent polling/deadline logic still enforces the overall
wait_timeout contract for the operation involving vector_store_id, file_id and
attributes.
📒 Files selected for processing (4)
tests/fixtures/vector_io.py, tests/llama_stack/conftest.py, tests/llama_stack/constants.py, tests/llama_stack/utils.py
💤 Files with no reviewable changes (1)
- tests/fixtures/vector_io.py
✅ Files skipped from review due to trivial changes (1)
- tests/llama_stack/constants.py
```python
from ocp_resources.service import Service
from semver import Version

import utilities
```
🧩 Analysis chain

🏁 Scripts executed against opendatahub-io/opendatahub-tests:

```shell
# Check whether the utilities package exposes the infra submodule
fd -a '__init__.py' utilities | while read -r file; do
  echo "== $file =="
  rg -nC2 '\binfra\b|__getattr__|__all__' "$file" || true
done

# Inspect utilities/__init__.py and the conftest.py import section
cat -n utilities/__init__.py
fd -a 'infra' utilities
head -n 50 tests/llama_stack/conftest.py | cat -n

# Look for an explicit submodule import and fixture usages
rg 'import utilities\.infra|from utilities import infra' tests/llama_stack/conftest.py
rg -n 'is_disconnected_cluster' tests/llama_stack/conftest.py -B2 -A2

# Confirm the helper exists in utilities/infra.py
rg -n 'def is_disconnected_cluster' utilities/infra.py
```
Fix broken utilities.infra import causing AttributeError at fixture evaluation.
Line 79 accesses utilities.infra.is_disconnected_cluster(...), but import utilities (line 20) does not import the infra submodule, and utilities/__init__.py is empty. This causes AttributeError: module 'utilities' has no attribute 'infra' when the fixture is evaluated by llama_stack_server_config (line 89).
Required fix

```diff
-import utilities
+from utilities.infra import is_disconnected_cluster as detect_disconnected_cluster
@@
 @pytest.fixture(scope="class")
 def is_disconnected_cluster(admin_client: DynamicClient) -> bool:
     """Whether the target cluster is disconnected (air-gapped)."""
-    return utilities.infra.is_disconnected_cluster(client=admin_client)
+    return detect_disconnected_cluster(client=admin_client)
```
```python
if is_disconnected_cluster and HTTPS_PROXY:
    LOGGER.info(f"Setting proxy and tlsconfig configuration (https_proxy:{HTTPS_PROXY})")
    env_vars.append({"name": "HTTPS_PROXY", "value": HTTPS_PROXY})
```
🧩 Analysis chain

🏁 Scripts executed against opendatahub-io/opendatahub-tests:

```shell
# Find proxy-related env vars and the vLLM URL constants
rg -nC2 '"HTTPS_PROXY"|"NO_PROXY"|"https_proxy"|"no_proxy"|LLS_CORE_VLLM_URL|LLS_CORE_VLLM_EMBEDDING_URL' tests/llama_stack

# Inspect the proxy block in conftest.py and check for NO_PROXY handling
head -n 280 tests/llama_stack/conftest.py | tail -n 40
rg -n 'NO_PROXY|no_proxy' tests/llama_stack/conftest.py

# Locate the HTTPS_PROXY constant and related defaults
head -n 50 tests/llama_stack/conftest.py | grep -n HTTPS_PROXY
grep -n "LLS_CORE_VLLM_URL" tests/llama_stack/constants.py
head -n 65 tests/llama_stack/constants.py
```
Add NO_PROXY exemption when setting HTTPS_PROXY for disconnected clusters.
When HTTPS_PROXY is set, all outbound HTTPS requests use the proxy. LLS_CORE_VLLM_EMBEDDING_URL defaults to https://nomic-embed-text-v1-5.example.com:443/v1, and LLS_CORE_VLLM_URL may also be HTTPS. Both will be routed through the proxy without a NO_PROXY exemption, causing connectivity failures in disconnected clusters where the proxy cannot reach these internal endpoints. Set NO_PROXY to "localhost,127.0.0.1" or the appropriate exempt hosts alongside HTTPS_PROXY.
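The suggested change can be sketched like this (the proxy value and exempt-host list are hypothetical; the real list should also include the in-cluster model endpoints):

```python
HTTPS_PROXY = "http://proxy.example.com:3128"  # hypothetical value
NO_PROXY_HOSTS = "localhost,127.0.0.1,.svc,.cluster.local"

env_vars = []
if HTTPS_PROXY:
    env_vars.append({"name": "HTTPS_PROXY", "value": HTTPS_PROXY})
    # Exempt local/internal hosts so in-cluster endpoints bypass the proxy.
    env_vars.append({"name": "NO_PROXY", "value": NO_PROXY_HOSTS})
```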
```diff
 vs_file = llama_stack_client.vector_stores.files.create(
-    vector_store_id=vector_store_id, file_id=file_id, attributes=attributes
+    vector_store_id=vector_store_id,
+    file_id=file_id,
+    timeout=240,  # Increased timeout for slow processing (e.g., sentence-transformers)
+    attributes=dict(attributes) if attributes else attributes,
 )
```
🧩 Analysis chain

🏁 Scripts executed against opendatahub-io/opendatahub-tests:

```shell
# Locate the helper and read the relevant sections
find . -name "utils.py" -path "*/llama_stack/*" | head -20
cat -n tests/llama_stack/utils.py | sed -n '75,130p'

# Check callers and the wait_timeout contract
rg -n "vector_store_create_and_poll" tests/
rg -B 5 -A 5 "vector_store_create_and_poll\(" tests/llama_stack/utils.py | head -40
rg -B 3 -A 3 "wait_timeout" tests/llama_stack/utils.py | head -50

# Verify the deadline/timing behavior
cat -n tests/llama_stack/utils.py | sed -n '112,122p'
```
Request timeout must respect wait_timeout contract.
The hard-coded timeout=240 in the create() call (line 108) is independent of wait_timeout, causing total execution to potentially exceed the documented budget. The deadline is set after create completes (line 112), allowing the function to run up to 240s + wait_timeout. With default wait_timeout=240, a caller gets up to 480s instead of the promised 240s.
Tie the request timeout to the polling timeout so total execution respects the wait_timeout contract.
Proposed fix

```diff
 def vector_store_create_and_poll(
     llama_stack_client: LlamaStackClient,
     vector_store_id: str,
     file_id: str,
     *,
     attributes: dict[str, str | int | float | bool] | None = None,
     poll_interval_sec: float = 5.0,
     wait_timeout: float = 240.0,
+    request_timeout: float | None = None,
 ) -> VectorStoreFile:
+    create_timeout = wait_timeout if request_timeout is None else min(request_timeout, wait_timeout)
     vs_file = llama_stack_client.vector_stores.files.create(
         vector_store_id=vector_store_id,
         file_id=file_id,
-        timeout=240,  # Increased timeout for slow processing (e.g., sentence-transformers)
+        timeout=create_timeout,
         attributes=dict(attributes) if attributes else attributes,
     )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def vector_store_create_and_poll(
    llama_stack_client: LlamaStackClient,
    vector_store_id: str,
    file_id: str,
    *,
    attributes: dict[str, str | int | float | bool] | None = None,
    poll_interval_sec: float = 5.0,
    wait_timeout: float = 240.0,
    request_timeout: float | None = None,
) -> VectorStoreFile:
    create_timeout = wait_timeout if request_timeout is None else min(request_timeout, wait_timeout)
    vs_file = llama_stack_client.vector_stores.files.create(
        vector_store_id=vector_store_id,
        file_id=file_id,
        timeout=create_timeout,
        attributes=dict(attributes) if attributes else attributes,
    )
```
If defined, use the SQUID_HTTPS_PROXY env var to run the llama-stack and RAG tests on disconnected clusters, using an https_proxy to connect to the external LLMs.
The env var will be set by Jenkins when running on disconnected clusters. The PRs adding it are currently in review, but this PR can be merged as-is: it will work on connected clusters when the env var is not defined.