
Commit ff0fea7

jgarciao, dbasunag, kpunwatk, mwaykole, and pre-commit-ci[bot] committed
Fix and deprecate agents test. Increase timeouts. Add workaround for RHAIENG-1819 (#840)
* fix: Fix agents test (Jorge Garcia Oncins)
* fix: Deprecate agents test (Jorge Garcia Oncins)
* Move to pytest 9.0.0 (#823) (Karishma Punwatkar)
* Update label to ensure we pick the right pod (#841)
* Fix ig test failing due to timeout (#844) (Milind Waykole)
* Remove empty dir which is not needed now (#846) (Milind Waykole)
* Some refactoring of model-server code (#845): refactor some tests; add kueue direct (Milind Waykole)
* Migrate tier2 test to OVMS (#847) (Milind Waykole)
* Refactor model_server tests: organize KServe tests and migrate to OVMS (#848)
  - Move KServe-specific tests into the tests/model_serving/model_server/kserve/ directory: authentication/, components/, inference_graph/, inference_service_configuration/, keda/, kueue/, metrics/, model_car/, multi_node/, private_endpoint/, raw_deployment/, routes/, stop_resume/, storage/
  - Keep upgrade/ at the model_server level (general model server tests)
  - Keep llmd/ and maas_billing/ at the model_server level (non-KServe components)
  - Migrate test_custom_resources.py from Caikit-TGIS to the OVMS runtime: use the OVMS template and model format, use the test-dir model from the ods-ci-s3 bucket, and add it to the sanity test suite
  - Fix InferenceGraph tests: add the kserve_raw_headless_service_config fixture to set DSC rawDeploymentServiceConfig to Headed; ensure proper fixture dependency ordering; all 6 InferenceGraph tests now passing
  - Update all imports to reflect the new directory structure
  - Clean up empty directories (model_mesh, ovms, runtime_configuration)
  - All pre-commit checks and tests passing
* Refactor model_server tests: organize KServe tests and migrate to OVMS (#850): fix the sign-off for the previous PR (Milind Waykole)
* Add smoke test markers for Runtime tests (#852)
* Hardcode the Triton Runtime image in the Triton test suite (#853): hardcode the Triton image, fix the namespace name, update the TRITON_IMAGE version to 24.10-py3, add a smoke marker
* Add test checking isvc status for issue RHOAIENG-38674 (#858) (Milind Waykole)
* Add support for byoidc envs in MR tests (#820) (lugi0)
* Lock file maintenance (#856) (renovate[bot])
* Adjust automation for FAISS vector store (#819)
* [pre-commit.ci] pre-commit autoupdate (#859): astral-sh/ruff-pre-commit v0.14.4 → v0.14.5
* llmd tests minor improvements (#831): increase timeout for llmd; update utilities/llmd_constants.py
* Fix the tests which are getting skipped due to scope (#865) (Milind Waykole)
* Update error pattern for negative test (#864)
* fix: Add xfail to failing catalog test (#867) (lugi0)
* Change model catalog label to pick both catalog pods (#863)
* feat: Add wait_for_unique_llama_stack_pod as a workaround for RHAIENG-1819 (Jorge Garcia Oncins)
* fix: Increase timeout when creating the llama-stack deployment (Jorge Garcia Oncins)
* feat: Add tests for ragas remote provider (#866)

Co-authored-by: Debarati Basu-Nag, Karishma Punwatkar, Milind Waykole, pre-commit-ci[bot], RAGHUL M, Luca Giorgi, renovate[bot], Jiri Petrlik, Thomas Recchiuto, Adolfo Aguirrezabal
1 parent ada7ce5 commit ff0fea7

File tree

3 files changed: +81 −18 lines changed

tests/llama_stack/agents/test_agents.py renamed to tests/llama_stack/agents/test_agents_deprecated.py

Lines changed: 33 additions & 12 deletions
@@ -21,13 +21,19 @@
 )
 @pytest.mark.rag
 @pytest.mark.skip_must_gather
-class TestLlamaStackAgents:
-    """Test class for LlamaStack Agents API
+class TestLlamaStackAgentsDeprecated:
+    """Test class for LlamaStack Agents API (Deprecated)
 
-    For more information about this API, see:
-    - https://llamastack.github.io/docs/building_applications/agent
-    - https://llamastack.github.io/docs/references/python_sdk_reference#agents
-    - https://llamastack.github.io/docs/building_applications/responses_vs_agents
+    Deprecation Notice: The LlamaStack Agents API was removed server-side in llama-stack 0.3.0.
+    It is partially implemented in llama-stack-client using the Responses API
+    (https://github.com/llamastack/llama-stack-client-python/pull/281).
+
+    Users are encouraged to use the Responses API directly.
+
+    For more information, see:
+    - https://llamastack.github.io/docs/api-deprecated/agents
+    - "Migrating from Agent objects to Responses in Llama Stack":
+      https://github.com/opendatahub-io/agents/blob/5902bef12c25281eecfcd3d25654de8b02857e33/migration/legacy-agents/responses-api-agent-migration.ipynb
     """
 
     @pytest.mark.smoke
@@ -106,11 +112,18 @@ def test_agents_simple_agent(
         )
 
     @pytest.mark.smoke
+    @pytest.mark.parametrize(
+        "enable_streaming",
+        [
+            pytest.param(False, id="streaming_disabled"),
+        ],
+    )
     def test_agents_rag_agent(
         self,
         unprivileged_llama_stack_client: LlamaStackClient,
         llama_stack_models: ModelInfo,
         vector_store_with_example_docs: VectorStore,
+        enable_streaming: bool,
     ) -> None:
         """
         Test RAG agent that can answer questions about the Torchtune project using the documents
@@ -123,7 +136,8 @@ def test_agents_rag_agent(
         Based on "Build a RAG Agent" example available at
         https://llamastack.github.io/docs/getting_started/detailed_tutorial
 
-        # TODO: update this example to use the vector_store API
+        Note: streaming is not tested (enable_streaming = False), as it seems to be broken in
+        llama-stack 0.3.0 (Agents API is only partially implemented)
         """
 
         # Create the RAG agent connected to the vector database
@@ -147,19 +161,26 @@ def test_agents_rag_agent(
             rag_agent=rag_agent,
             session_id=session_id,
             turns_with_expectations=turns_with_expectations,
-            stream=True,
+            stream=enable_streaming,
             verbose=True,
             min_keywords_required=1,
             print_events=False,
         )
 
         # Assert that validation was successful
-        assert validation_result["success"], f"RAG agent validation failed. Summary: {validation_result['summary']}"
+        assert validation_result["success"], (
+            f"RAG agent validation failed with streaming={enable_streaming}. Summary: {validation_result['summary']}"
+        )
 
         # Additional assertions for specific requirements
         for result in validation_result["results"]:
-            assert result["event_count"] > 0, f"No events generated for question: {result['question']}"
-            assert result["response_length"] > 0, f"No response content for question: {result['question']}"
+            assert result["response_length"] > 0, (
+                f"No response content for question: {result['question']} (streaming={enable_streaming})"
+            )
             assert len(result["found_keywords"]) > 0, (
-                f"No expected keywords found in response for: {result['question']}"
+                f"No expected keywords found in response for: {result['question']} (streaming={enable_streaming})"
             )
+            if enable_streaming:
+                assert result["event_count"] > 0, (
+                    f"No events generated for question: {result['question']} (streaming={enable_streaming})"
+                )
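The streaming-conditional assertions in this diff can be isolated into a small helper. The sketch below is hypothetical (the repo keeps these checks inline in the test); it assumes result dicts with the same keys the diff uses (`question`, `response_length`, `found_keywords`, `event_count`), and only enforces event counts when streaming is enabled.

```python
def validate_results(results: list[dict], enable_streaming: bool) -> None:
    """Mirror the diff's per-result checks: response content and keywords are
    always required; event counts are checked only when streaming is enabled."""
    for result in results:
        assert result["response_length"] > 0, (
            f"No response content for question: {result['question']} (streaming={enable_streaming})"
        )
        assert len(result["found_keywords"]) > 0, (
            f"No expected keywords found in response for: {result['question']} (streaming={enable_streaming})"
        )
        if enable_streaming:
            # Streamed turns should emit at least one event; non-streamed turns
            # legitimately produce zero events, so skip this check otherwise.
            assert result["event_count"] > 0, (
                f"No events generated for question: {result['question']} (streaming={enable_streaming})"
            )
```

Gating the `event_count` assertion this way is what lets the same test body run for both parametrized modes once streaming is re-enabled.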

tests/llama_stack/conftest.py

Lines changed: 11 additions & 5 deletions
@@ -19,6 +19,7 @@
     create_llama_stack_distribution,
     wait_for_llama_stack_client_ready,
     vector_store_create_file_from_url,
+    wait_for_unique_llama_stack_pod,
 )
 from utilities.constants import DscComponents, Annotations
 from utilities.data_science_cluster_utils import update_components_in_dsc
@@ -144,8 +145,8 @@ def test_with_remote_milvus(llama_stack_server_config):
     server_config: Dict[str, Any] = {
         "containerSpec": {
             "resources": {
-                "requests": {"cpu": "250m", "memory": "500Mi"},
-                "limits": {"cpu": "2", "memory": "12Gi"},
+                "requests": {"cpu": "1", "memory": "3Gi"},
+                "limits": {"cpu": "3", "memory": "6Gi"},
             },
             "env": env_vars,
             "name": "llama-stack",
@@ -208,6 +209,7 @@ def _get_llama_stack_distribution_deployment(
     """
     Returns the Deployment resource for a given LlamaStackDistribution.
     Note: The deployment is created by the operator; this function retrieves it.
+    Includes a workaround for RHAIENG-1819 to ensure exactly one pod exists.
 
     Args:
         client (DynamicClient): Kubernetes client
@@ -222,9 +224,12 @@ def _get_llama_stack_distribution_deployment(
         name=llama_stack_distribution.name,
         min_ready_seconds=10,
     )
-
+    deployment.timeout_seconds = 120
     deployment.wait(timeout=120)
     deployment.wait_for_replicas()
+    # Workaround for RHAIENG-1819 (Incorrect number of llama-stack pods deployed after
+    # creating LlamaStackDistribution after setting custom ca bundle in DSCI)
+    wait_for_unique_llama_stack_pod(client=client, namespace=llama_stack_distribution.namespace)
     yield deployment
 
 
@@ -321,6 +326,7 @@ def _create_llama_stack_test_route(
             }
         }
     ):
+        route.wait(timeout=60)
         yield route
 
 
@@ -355,11 +361,11 @@ def _create_llama_stack_client(
 ) -> Generator[LlamaStackClient, Any, Any]:
     # LLS_CLIENT_VERIFY_SSL is false by default to be able to test with Self-Signed certificates
     verifySSL = os.getenv("LLS_CLIENT_VERIFY_SSL", "false").lower() == "true"
-    http_client = httpx.Client(verify=verifySSL)
+    http_client = httpx.Client(verify=verifySSL, timeout=240)
     try:
         client = LlamaStackClient(
             base_url=f"https://{route.host}",
-            timeout=180.0,
+            max_retries=3,
             http_client=http_client,
         )
         wait_for_llama_stack_client_ready(client=client)
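The `LLS_CLIENT_VERIFY_SSL` handling in this fixture reduces to a one-line environment lookup: verification is off by default (so self-signed certificates work) and only the literal value "true", case-insensitive, enables it. A minimal stdlib-only sketch of that parsing logic (the function name here is hypothetical; the repo inlines this expression):

```python
import os


def verify_ssl_from_env(var: str = "LLS_CLIENT_VERIFY_SSL") -> bool:
    """SSL verification is disabled unless the variable is explicitly set
    to "true" (case-insensitive); any other value, or an unset variable,
    leaves verification off."""
    return os.getenv(var, "false").lower() == "true"
```

Note that with this diff the request timeout now lives on the `httpx.Client` transport (`timeout=240`) rather than on the `LlamaStackClient` constructor, which instead gains `max_retries=3`.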

tests/llama_stack/utils.py

Lines changed: 37 additions & 1 deletion
@@ -2,11 +2,14 @@
 from typing import Any, Callable, Dict, Generator, List, cast
 
 from kubernetes.dynamic import DynamicClient
+from kubernetes.dynamic.exceptions import ResourceNotFoundError
 from llama_stack_client import LlamaStackClient, APIConnectionError, InternalServerError
 from llama_stack_client.types.vector_store import VectorStore
 from ocp_resources.llama_stack_distribution import LlamaStackDistribution
+from ocp_resources.pod import Pod
 from simple_logger.logger import get_logger
 from timeout_sampler import retry
+from utilities.exceptions import UnexpectedResourceCountError
 
 
 from tests.llama_stack.constants import (
@@ -15,6 +18,7 @@
     ModelInfo,
     ValidationResult,
     TurnResult,
+    LLS_CORE_POD_FILTER,
 )
 
 from llama_stack_client import Agent, AgentEventLogger
@@ -49,24 +53,55 @@ def create_llama_stack_distribution(
     yield llama_stack_distribution
 
 
+@retry(
+    wait_timeout=60,
+    sleep=5,
+    exceptions_dict={ResourceNotFoundError: [], UnexpectedResourceCountError: []},
+)
+def wait_for_unique_llama_stack_pod(client: DynamicClient, namespace: str) -> Pod:
+    """Wait until exactly one LlamaStackDistribution pod is found in the
+    namespace (multiple pods may indicate known bug RHAIENG-1819)."""
+    pods = list(
+        Pod.get(
+            dyn_client=client,
+            namespace=namespace,
+            label_selector=LLS_CORE_POD_FILTER,
+        )
+    )
+    if not pods:
+        raise ResourceNotFoundError(f"No pods found with label selector {LLS_CORE_POD_FILTER} in namespace {namespace}")
+    if len(pods) != 1:
+        raise UnexpectedResourceCountError(
+            f"Expected exactly 1 pod with label selector {LLS_CORE_POD_FILTER} "
+            f"in namespace {namespace}, found {len(pods)}. "
+            f"(possibly due to known bug RHAIENG-1819)"
+        )
+    return pods[0]
+
+
 @retry(wait_timeout=90, sleep=5)
 def wait_for_llama_stack_client_ready(client: LlamaStackClient) -> bool:
+    """Wait for LlamaStack client to be ready by checking health, version, and database access."""
     try:
         client.inspect.health()
         version = client.inspect.version()
-        # Check access to llama-stack server database
+        models = client.models.list()
         vector_stores = client.vector_stores.list()
         files = client.files.list()
         LOGGER.info(
             f"Llama Stack server is available! "
             f"(version:{version.version} "
+            f"models:{len(models)} "
             f"vector_stores:{len(vector_stores.data)} "
             f"files:{len(files.data)})"
         )
         return True
+
     except (APIConnectionError, InternalServerError) as error:
         LOGGER.debug(f"Llama Stack server not ready yet: {error}")
+        LOGGER.debug(f"Base URL: {client.base_url}, Error type: {type(error)}, Error details: {str(error)}")
         return False
+
     except Exception as e:
         LOGGER.warning(f"Unexpected error checking Llama Stack readiness: {e}")
         return False
@@ -108,6 +143,7 @@ def _response_fn(*, question: str) -> str:
     response = llama_stack_client.responses.create(
         input=question,
         model=llama_stack_models.model_id,
+        stream=False,
         tools=[
             {
                 "type": "file_search",
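The `wait_for_unique_llama_stack_pod` workaround boils down to a generic poll-until-exactly-one pattern: list matching resources, succeed on exactly one, and retry on zero or many until a timeout. The stdlib-only sketch below captures those semantics without the cluster dependencies; `wait_for_unique`, `UnexpectedResourceCount`, and the injected `list_fn` are hypothetical stand-ins for the repo's `timeout_sampler.retry`-decorated helper and `Pod.get` call.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class UnexpectedResourceCount(Exception):
    """Raised when polling ends without exactly one matching resource."""


def wait_for_unique(list_fn: Callable[[], List[T]], wait_timeout: float = 60, sleep: float = 5) -> T:
    """Poll list_fn until it returns exactly one item.

    Zero items (resource not created yet) and multiple items (e.g. a stale
    duplicate pod, as in RHAIENG-1819) are both treated as retryable until
    the timeout elapses.
    """
    deadline = time.monotonic() + wait_timeout
    while True:
        items = list_fn()
        if len(items) == 1:
            return items[0]
        if time.monotonic() >= deadline:
            raise UnexpectedResourceCount(f"Expected exactly 1 item, found {len(items)}")
        time.sleep(sleep)
```

Retrying on "too many" as well as "none" is what makes this a workaround rather than a plain readiness wait: it gives the operator time to garbage-collect the duplicate pod before the test proceeds.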
