Skip to content

tests(maas-billing): ensure MaaS gateway, policies, and maas controller setup#1013

Merged
mwaykole merged 12 commits intoopendatahub-io:mainfrom
SB159:feature/maas-gateway-and-policies
Jan 26, 2026
Merged

tests(maas-billing): ensure MaaS gateway, policies, and maas controller setup#1013
mwaykole merged 12 commits intoopendatahub-io:mainfrom
SB159:feature/maas-gateway-and-policies

Conversation

@SB159
Copy link
Copy Markdown
Contributor

@SB159 SB159 commented Jan 12, 2026

Description

This change ensures required MaaS components are available before MaaS billing tests run.

  • Ensures the MaaS Gateway exists and is reachable
  • Ensures RateLimitPolicy and TokenRateLimitPolicy are present and bound to the Gateway
  • Ensures the MaaS controller (modelsAsService) is enabled in the DataScienceCluster

How Has This Been Tested?

  • Tested on latest cluster, All tests passed; no impact outside MaaS billing tests
  • Ran pre-commit run --all-files

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Summary by CodeRabbit

  • Tests

    • Expanded MaaS billing test infrastructure with many new fixtures, richer wiring for gateway/deployment/policy scenarios, simplified TinyLlama integration flow, explicit readiness waits, probe-based API reachability checks, and replaced a legacy helper with focused gateway/listener/probe utilities; improved logging for deployment and gateway events.
  • Chores

    • Added and standardized MaaS configuration constants and rate-limit/token policy identifiers; renamed user group identifiers to tier-based names to support gateway/policy testing.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link
Copy Markdown

The following are automatically added/executed:

  • PR size label.
  • Run pre-commit
  • Run tox
  • Add PR author as the PR assignee
  • Build image based on the PR

Available user actions:

  • To mark a PR as WIP, add /wip in a comment. To remove it from the PR comment /wip cancel to the PR.
  • To block merging of a PR, add /hold in a comment. To un-block merging of PR comment /hold cancel.
  • To mark a PR as approved, add /lgtm in a comment. To remove, add /lgtm cancel.
    lgtm label removed on each new commit push.
  • To mark PR as verified comment /verified to the PR, to un-verify comment /verified cancel to the PR.
    verified label removed on each new commit push.
  • To Cherry-pick a merged PR /cherry-pick <target_branch_name> to the PR. If <target_branch_name> is valid,
    and the current PR is merged, a cherry-picked PR would be created and linked to the current PR.
  • To build and push image to quay, add /build-push-pr-image in a comment. This would create an image with tag
    pr-<pr_number> to quay repository. This image tag, however would be deleted on PR merge or close action.
Supported labels

{'/build-push-pr-image', '/hold', '/verified', '/cherry-pick', '/wip', '/lgtm'}

@SB159 SB159 requested a review from dbasunag January 15, 2026 03:34
@SB159 SB159 requested a review from dbasunag January 16, 2026 16:00
@SB159 SB159 marked this pull request as ready for review January 22, 2026 20:40
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Walkthrough

Expanded MaaS control-plane test wiring: added gateway/policy constants and fixtures, extended fixture signatures for MaaS parameters, added gateway listener and probe utilities, and reworked TinyLlama MaaS integration, readiness checks, and logging. (≤50 words)

Changes

Cohort / File(s) Change Summary
MaaS Control-Plane Fixtures
tests/model_serving/model_server/maas_billing/conftest.py
Added ~13 new fixtures (maas_gateway_and_policies, maas_controller_enabled_latest, maas_control_plane_namespace, maas_tier_mapping_cm, maas_api_deployment_available, maas_api_endpoints_ready, maas_api_gateway_reachable, maas_gateway_api, maas_gateway_target_ref, maas_request_ratelimit_policy, maas_token_ratelimit_policy, maas_usage_policies, etc.). Extended fixture/function signatures (minted_token, maas_inference_service_tinyllama, maas_token_for_actor, maas_gateway_rate_limits, maas_models). Reworked TinyLlama creation, readiness, logging, and teardown; renamed MAAS group constants to tier-based names.
MaaS Utility Functions
tests/model_serving/model_server/maas_billing/utils.py
Removed llmis_name. Updated patch_llmisvc_with_maas_router to use gateway constants. Added maas_gateway_listeners(hostname), endpoints_have_ready_addresses(admin_client, namespace, name), and gateway_probe_reaches_maas_api(http_session, probe_url, request_timeout_seconds) to support listener config and probe-based reachability checks.
MAAS / Global Constants
utilities/constants.py
Added constants: MAAS_GATEWAY_NAME, MAAS_GATEWAY_NAMESPACE, MAAS_RATE_LIMIT_POLICY_NAME, MAAS_TOKEN_RATE_LIMIT_POLICY_NAME. Renamed MAAS free/premium group constants to tier-* names.
Public Entities / Test Exposures
tests/model_serving/model_server/maas_billing/...
Exposed/used additional Kubernetes/MaaS test types/utilities in tests: Deployment, Gateway, ConfigMap, DataScienceCluster, TimeoutSampler, RateLimitPolicy, TokenRateLimitPolicy.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: expanded MaaS testing setup to ensure gateway, policies, and controller availability before billing tests run.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/model_serving/model_server/maas_billing/conftest.py`:
- Around line 652-657: The fixture maas_gateway_and_policies currently declares
-> None but yields a value; update its return type to Generator[None, None,
None] to reflect a yielding fixture, and ensure from typing import Generator is
imported at the top of the file; keep the fixture name maas_gateway_and_policies
and existing parameters maas_gateway_api and maas_usage_policies unchanged.
🧹 Nitpick comments (9)
tests/model_serving/model_server/maas_billing/utils.py (5)

20-27: Remove commented-out import.

The commented-out Gateway import on line 20 should be removed as dead code. If it's not needed, it adds noise; if it's needed later, it can be re-added.

🧹 Proposed cleanup
-# from ocp_resources.gateway_gateway_networking_k8s_io import Gateway
 from ocp_resources.endpoints import Endpoints
 from ocp_resources.config_map import ConfigMap
 from ocp_resources.namespace import Namespace

450-470: Add docstring to utility function.

Per coding guidelines, component-specific utility functions should have docstrings. This function configures Gateway listeners for MaaS integration.

📝 Proposed docstring
 def maas_gateway_listeners(hostname: str) -> List[Dict[str, Any]]:
+    """Return Gateway listener definitions for MaaS HTTP and HTTPS endpoints.
+
+    Args:
+        hostname: The hostname to configure on both listeners.
+
+    Returns:
+        List of listener configuration dictionaries for HTTP (port 80)
+        and HTTPS (port 443) with TLS termination.
+    """
     return [
         {
             "name": "http",

473-495: Add docstring to utility function.

This function performs namespace detection logic that would benefit from documentation explaining its purpose and behavior.

📝 Proposed docstring
 def detect_maas_control_plane_namespace(
     admin_client: DynamicClient,
     candidate_namespaces: list[str],
 ) -> str:
+    """Detect the MaaS control-plane namespace by locating the tier-mapping ConfigMap.
+
+    Args:
+        admin_client: Kubernetes dynamic client.
+        candidate_namespaces: Ordered list of namespaces to check.
+
+    Returns:
+        The first namespace containing the 'tier-to-group-mapping' ConfigMap.
+
+    Raises:
+        RuntimeError: If the ConfigMap is not found in any candidate namespace.
+    """
     for ns in candidate_namespaces:

498-514: Add docstring to utility function.

📝 Proposed docstring
 def endpoints_have_ready_addresses(
     admin_client: DynamicClient,
     namespace: str,
     name: str,
 ) -> bool:
+    """Check if an Endpoints resource has at least one ready address.
+
+    Args:
+        admin_client: Kubernetes dynamic client.
+        namespace: Namespace of the Endpoints resource.
+        name: Name of the Endpoints resource.
+
+    Returns:
+        True if any subset has addresses, False otherwise.
+    """
     endpoints = Endpoints(

517-525: Add docstring to utility function.

The logic treating 401/403 as "ok" is correct for a reachability probe (the API responded, even if auth is required), but this should be documented.

📝 Proposed docstring
 def gateway_probe_reaches_maas_api(
     http_session: requests.Session,
     probe_url: str,
     request_timeout_seconds: int,
 ) -> Tuple[bool, int, str]:
+    """Probe the MaaS API via Gateway to check reachability.
+
+    Args:
+        http_session: HTTP session for making requests.
+        probe_url: URL to probe.
+        request_timeout_seconds: Request timeout in seconds.
+
+    Returns:
+        Tuple of (reachable, status_code, response_text). Reachable is True
+        if status is 200, 401, or 403 (API responded, auth may be required).
+    """
     response = http_session.get(probe_url, timeout=request_timeout_seconds)
tests/model_serving/model_server/maas_billing/conftest.py (4)

57-62: Remove commented-out old constants.

The old group name constants on lines 57-58 are commented out. If the new names are the correct ones, remove the old comments to reduce noise.

🧹 Proposed cleanup
-# MAAS_FREE_GROUP = "maas-free-users"
-# MAAS_PREMIUM_GROUP = "maas-premium-users"
 MAAS_FREE_GROUP = "tier-free-users"
 MAAS_PREMIUM_GROUP = "tier-premium-users"

602-613: Remove commented-out debug code.

This commented-out debug/pause code should be removed before merging. If debugging capability is needed in the future, it can be re-added or implemented via a proper debug flag.

🧹 Proposed cleanup
             LOGGER.info(
                 f"MaaS: TinyLlama LLMI {llm_service.namespace}/{llm_service.name} "
                 f"Ready and patched (storage_uri={storage_uri})"
             )

-            # _debug_dump_model_registration(
-            #     admin_client=admin_client,
-            #     llm_ns=unprivileged_model_namespace.name,
-            # )
-            # DEBUG: pause here so you can run oc commands side-by-side
-            # if os.environ.get("MAAS_DEBUG_PAUSE") == "1":
-            #     LOGGER.warning(
-            #         "MAAS_DEBUG_PAUSE=1 -> Pausing 20 minutes after LLMI Ready+patched. "
-            #         "Run your oc/curl checks now, then stop the pause by Ctrl+C or wait."
-            #     )
-            #     time.sleep(3 * 60 * 60)
-
             yield llm_service

660-699: Add docstring to fixture.

Per coding guidelines, fixtures should have Google-format docstrings explaining what they provide.

📝 Proposed docstring
 `@pytest.fixture`(scope="session")
 def maas_controller_enabled_latest(
     dsc_resource: DataScienceCluster,
     maas_gateway_and_policies: None,
 ) -> Generator[DataScienceCluster, None, None]:
+    """DataScienceCluster with modelsAsService controller enabled.
+
+    Ensures the MaaS controller is enabled in the DSC, enabling it temporarily
+    if needed, and restores the original state on teardown.
+
+    Yields:
+        The DataScienceCluster resource with MaaS controller ready.
+    """
     original_components = dsc_resource.instance.spec.components or {}

854-899: Consider adding docstrings to rate limit policy fixtures.

The maas_request_ratelimit_policy and maas_token_ratelimit_policy fixtures create bootstrap policies. Adding brief docstrings would clarify that these are placeholder policies that get patched with test-specific limits during rate-limiting tests.

@SB159 SB159 requested a review from dbasunag January 22, 2026 21:07
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/model_serving/model_server/maas_billing/conftest.py`:
- Around line 729-743: The fixture maas_api_endpoints_ready incorrectly discards
the TimeoutSampler result and returns immediately; update the loop that iterates
over TimeoutSampler(func=endpoints_have_ready_addresses,
admin_client=admin_client, namespace=maas_control_plane_namespace,
name="maas-api") to check the sampled value (e.g., for ready in
TimeoutSampler(...): if ready: return) so the fixture only returns when
endpoints_have_ready_addresses is truthy; otherwise let the sampler
exhaust/raise so the test fails instead of succeeding prematurely.
- Around line 58-59: DSC_NAME and MAAS_DSC_COMPONENT_KEY in conftest.py are
unused; either remove these dead constants or explicitly document their intent
if they are meant to be exported for future/external use. Locate the symbols
DSC_NAME and MAAS_DSC_COMPONENT_KEY in conftest.py and either (a) delete both
definitions and any related imports/exports if they are not needed, or (b)
retain them but add a clear comment or module-level docstring explaining why
they exist (e.g., reserved for external consumers or future feature X) and
consider listing them in __all__ if they are part of the public test API. Ensure
tests and imports still pass after the change.
🧹 Nitpick comments (2)
tests/model_serving/model_server/maas_billing/conftest.py (2)

643-670: Remove commented-out fixture dependency.

Line 647 contains commented-out code # maas_usage_policies: None,. This should either be removed if not needed, or uncommented if required. Per past review discussion, this appears to have been left as a remnant.

🧹 Proposed fix
 `@pytest.fixture`(scope="session")
 def maas_controller_enabled_latest(
     dsc_resource: DataScienceCluster,
     maas_gateway_api: None,
-    # maas_usage_policies: None,
     maas_request_ratelimit_policy: None,
     maas_token_ratelimit_policy: None,
 ) -> Generator[DataScienceCluster, None, None]:

638-641: Consider adding docstrings to new fixtures.

Several new fixtures lack docstrings: maas_gateway_api_hostname, maas_tier_mapping_cm, maas_api_deployment_available, maas_api_endpoints_ready, maas_api_gateway_reachable, maas_gateway_target_ref, maas_request_ratelimit_policy, and maas_token_ratelimit_policy.

Per coding guidelines, fixtures should have Google-format docstrings for clarity on their purpose and usage.

@SB159 SB159 requested a review from dbasunag January 23, 2026 16:59
@SB159 SB159 requested a review from dbasunag January 23, 2026 19:35
@SB159 SB159 requested a review from dbasunag January 23, 2026 21:33
dbasunag
dbasunag previously approved these changes Jan 23, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/model_serving/model_server/maas_billing/conftest.py`:
- Around line 70-79: The fixtures (e.g., minted_token and
maas_gateway_rate_limits) are missing Google-style docstrings and some
parameters lack type annotations (notably request_session_http and
maas_tier_mapping_cm); update each changed/added fixture to include a
Google-format docstring describing Args and Returns, and add explicit type hints
for all parameters and return values (e.g., annotate request_session_http with
its session type and maas_tier_mapping_cm with its mapping/type), ensuring
function signatures like minted_token(...) -> str and
maas_gateway_rate_limits(...) have complete annotations and docstrings
consistent with the test guidelines.
- Around line 742-823: The fixtures maas_gateway_api (using Gateway),
maas_request_ratelimit_policy (using RateLimitPolicy) and
maas_token_ratelimit_policy (using TokenRateLimitPolicy) currently set
ensure_exists=False and teardown=True; update them to follow repo pattern by
setting ensure_exists=True and stop unconditionally deleting cluster-shared
resources—either remove teardown=True or implement conditional teardown that
only deletes when the fixture created the resource (mirror the
conditional-teardown approach used in utilities/llmd_utils.py) so shared
infrastructure is not destroyed.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/model_serving/model_server/maas_billing/utils.py`:
- Around line 474-483: The helper gateway_probe_reaches_maas_api currently calls
http_session.get directly and will raise requests exceptions (timeout/connection
errors) which breaks the caller's retry contract; wrap the http_session.get(...)
call in a try/except that catches requests.RequestException (import if missing)
and on exception return (False, 0, str(exc)); otherwise keep the existing
behavior (status_code, response_text, ok = status_code in (200,401,403)) and
return (ok, status_code, response_text).
- Around line 455-471: The function endpoints_have_ready_addresses uses direct
attribute access endpoints.instance.subsets which can raise AttributeError if
subsets is missing; change that access to use getattr(endpoints.instance,
"subsets", None) (matching the defensive pattern used elsewhere) and then keep
the existing logic (return False if subsets is falsy, otherwise
any(subset.addresses)). Update the code referencing Endpoints and
endpoints.instance.subsets accordingly.

Copy link
Copy Markdown
Contributor

@kpunwatk kpunwatk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@mwaykole mwaykole enabled auto-merge (squash) January 25, 2026 06:24
@mwaykole mwaykole merged commit 2fa5ace into opendatahub-io:main Jan 26, 2026
8 checks passed
@github-actions
Copy link
Copy Markdown

Status of building tag latest: success.
Status of pushing tag latest to image registry: success.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants