feat: add GlobalPlanner component for centralized scaling #5702
Conversation
Walkthrough
This pull request introduces a new GlobalPlanner component as a centralized scaling service for Kubernetes-based distributed systems. The SLA Planner can now operate in delegating mode, forwarding scaling decisions to the remote GlobalPlanner service instead of managing replicas locally. Local mode preserves the existing behavior.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/src/dynamo/planner/kubernetes_connector.py (1)
285-285: Bug: Positional argument mismatch in CLI instantiation.
The KubernetesConnector constructor signature is (dynamo_namespace, model_name=None, k8s_namespace=None, parent_dgd_name=None), but args.k8s_namespace is being passed as the second positional argument, which maps to model_name instead of k8s_namespace.
🐛 Proposed fix
-    connector = KubernetesConnector(args.dynamo_namespace, args.k8s_namespace)
+    connector = KubernetesConnector(
+        dynamo_namespace=args.dynamo_namespace,
+        k8s_namespace=args.k8s_namespace,
+    )
🤖 Fix all issues with AI agents
In `@components/src/dynamo/global_planner/scale_handler.py`:
- Around line 97-104: The target_replicas comprehension assigns
r.sub_component_type (a str) directly to TargetReplica.sub_component_type which
expects a SubComponentType enum; update the comprehension in scale_handler.py so
you convert the string to the enum (use SubComponentType(r.sub_component_type)
or SubComponentType[r.sub_component_type] depending on how the enum is defined)
when constructing each TargetReplica, and add a small try/except to handle
invalid values (log or raise a clear error) so malformed
TargetReplicaRequest.sub_component_type values are surfaced (see the sketch after this file's prompts).
- Around line 113-115: The call to the synchronous method get_graph_deployment
is incorrectly awaited; change the statement that assigns deployment from await
connector.kube_api.get_graph_deployment(connector.parent_dgd_name) to a direct
call without await so deployment =
connector.kube_api.get_graph_deployment(connector.parent_dgd_name), ensuring any
surrounding async function does not expect an awaitable from
get_graph_deployment.
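For the SubComponentType conversion described in the first scale_handler.py prompt above, here is a minimal sketch under stated assumptions: the build_target_replicas helper, the import paths, and the request.target_replicas / component_name / desired_replicas field names are assumptions, while the str-to-enum conversion and the error handling follow the prompt.

from dynamo.planner.kubernetes_connector import TargetReplica  # import path assumed
from dynamo.planner.sub_component_type import SubComponentType  # import path assumed

def build_target_replicas(request):
    """Convert TargetReplicaRequest entries (str sub_component_type) into TargetReplica objects."""
    target_replicas = []
    for r in request.target_replicas:  # field name assumed
        try:
            sub_type = SubComponentType(r.sub_component_type)  # str -> enum
        except ValueError as e:
            # Surface malformed values instead of passing the raw string through
            raise ValueError(
                f"Invalid sub_component_type {r.sub_component_type!r}: {e}"
            ) from e
        target_replicas.append(
            TargetReplica(
                sub_component_type=sub_type,
                component_name=r.component_name,      # field name assumed
                desired_replicas=r.desired_replicas,  # field name assumed
            )
        )
    return target_replicas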
In `@components/src/dynamo/planner/remote_planner_client.py`:
- Line 1: Add the standard SPDX copyright header at the top of the module to
satisfy CI: insert the SPDX short identifier and copyright line before the
module docstring in remote_planner_client.py (i.e., place the header above the
existing top-level string literal that currently reads "Client for calling
remote planner's scale_request endpoint."). Ensure the header uses the project's
standard format (SPDX-License-Identifier and copyright holder/year) so the file
starts with that header followed by the existing docstring (a sketch of such a header appears below).
In `@components/src/dynamo/planner/scale_protocol.py`:
- Line 1: Add the standard SPDX license header as the very first lines of
scale_protocol.py (above the module docstring) so the CI recognizes the file’s
license; specifically insert the project's standard SPDX identifier (for example
"SPDX-License-Identifier: Apache-2.0") at the top of the file, preserving the
existing module docstring and rest of the code in scale_protocol.py.
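For both header prompts above, a sketch of what the first lines of remote_planner_client.py (and, with its own docstring, scale_protocol.py) might look like; the Apache-2.0 identifier comes from the second prompt, while the exact copyright wording is an assumption about the project's standard format.

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""Client for calling remote planner's scale_request endpoint."""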
In `@components/src/dynamo/planner/utils/planner_core.py`:
- Around line 674-679: The current flow may reference next_num_p and next_num_d
when they are undefined if _compute_replica_requirements fails or next_num_req
is None; fix by only performing the scaling block when replica computation
succeeded (i.e., after _compute_replica_requirements returns a non-None
next_num_req) or by explicitly initializing/returning early—move the code that
checks args.no_operation and calls _delegate_scaling / _execute_local_scaling
(referencing planner_mode, _delegate_scaling, _execute_local_scaling) inside the
conditional where next_num_p and next_num_d are set (after
_compute_replica_requirements) or add a guard like “if next_num_req is None:
return” before using next_num_p/next_num_d.
In `@tests/global_planner/unit/test_scale_request_handler.py`:
- Around line 55-64: The call site in scale_handler.py incorrectly awaits the
synchronous method get_graph_deployment (defined as def get_graph_deployment in
kube.py); remove the await so the call is a normal synchronous call (e.g.,
result = kube_api.get_graph_deployment(...)) and adjust any surrounding code
accordingly, and update the unit test mock in
tests/global_planner/unit/test_scale_request_handler.py to use Mock() instead of
AsyncMock() for mock_connector.kube_api.get_graph_deployment so the test no
longer masks the runtime TypeError.
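A small sketch of the test-side change from the last prompt above: get_graph_deployment is stubbed with a plain Mock rather than an AsyncMock, returning a deployment shaped like the one scale_handler.py reads. The helper name, service names, and replica counts are illustrative.

from unittest.mock import Mock

def make_connector_mock():
    """Build a connector stub whose kube_api.get_graph_deployment is a plain (sync) Mock."""
    mock_connector = Mock()
    # get_graph_deployment is synchronous in kube.py, so use Mock, not AsyncMock,
    # and return a deployment shaped like the one scale_handler.py reads.
    mock_connector.kube_api.get_graph_deployment = Mock(
        return_value={
            "spec": {
                "services": {
                    "prefill-worker": {"subComponentType": "prefill", "replicas": 3},
                    "decode-worker": {"subComponentType": "decode", "replicas": 5},
                }
            }
        }
    )
    return mock_connector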
🧹 Nitpick comments (14)
components/src/dynamo/global_planner/argparse_config.py (1)
68-72: Dead code: validation condition is unreachable.
The condition args.managed_namespaces and len(args.managed_namespaces) == 0 can never be True. When nargs="+" is used (line 44), argparse requires at least one value if the flag is provided; it will raise an error before validate_args is called. If the flag is omitted, managed_namespaces is None (falsy), so the and short-circuits. You can safely remove this validation or simplify it.
🧹 Suggested simplification
 def validate_args(args):
     """Validate GlobalPlanner arguments.

     Args:
         args: Parsed arguments from argparse

     Raises:
         ValueError: If arguments are invalid
     """
-    # managed_namespaces is optional - if not specified, accept all
-    if args.managed_namespaces and len(args.managed_namespaces) == 0:
-        raise ValueError(
-            "--managed-namespaces must have at least one namespace if specified"
-        )
+    # managed_namespaces is optional - if not specified, accept all
+    # Note: nargs="+" already ensures at least one value when flag is provided
+    pass
components/src/dynamo/planner/scale_protocol.py (2)
7-9: Remove unused TYPE_CHECKING block.
The TYPE_CHECKING import and empty if block serve no purpose currently. Either remove it or add the intended type-only imports.
🧹 Proposed fix
-from typing import TYPE_CHECKING, List, Optional
+from typing import List, Optional

 from pydantic import BaseModel
-
-# Import SubComponentType only for type checking to avoid runtime dependency
-if TYPE_CHECKING:
-    pass
41-46: Consider using Literal for the status field.
Using Literal["success", "error", "scaling"] instead of str would provide better type safety and documentation of valid values.
🧹 Proposed enhancement
+from typing import List, Literal, Optional
+
 class ScaleResponse(BaseModel):
     """Response from scaling operation"""

-    status: str  # "success", "error", "scaling"
+    status: Literal["success", "error", "scaling"]
     message: str
     current_replicas: dict  # {"prefill": 3, "decode": 5}
components/src/dynamo/planner/remote_planner_client.py (2)
52-56: Clarify the intent of consuming only the first response.
The async for with immediate break works but is non-obvious. A brief comment explaining why only the first response is consumed (e.g., round-robin returns one response per instance) would improve readability.
🧹 Suggested clarification
-        response_data = None
-        async for response in await self._client.round_robin(request_json):
-            # Take first response
-            response_data = response
-            break
+        response_data = None
+        async for response in await self._client.round_robin(request_json):
+            # round_robin selects one instance; consume its single response
+            response_data = response
+            break
33-34: Wrap wait_for_instances() call with a timeout to prevent indefinite blocking.
The method signature does not support a timeout parameter. Since similar initialization waits in the codebase use asyncio.wait_for() (e.g., in sglang/main.py), consider wrapping this call with an appropriate timeout:
await asyncio.wait_for(self._client.wait_for_instances(), timeout=30.0)
This prevents the initialization from hanging indefinitely if the GlobalPlanner service is unavailable.
tests/planner/unit/test_planner_argparse.py (1)
59-65: Consider adding a positive validation test.
You test that validation fails when namespace is missing, but there's no test confirming validate_planner_args succeeds (returns without error) when all required arguments are provided in delegating mode.
🧪 Suggested additional test
def test_validate_delegating_mode_with_namespace():
    """Test validation passes for delegating mode with GlobalPlanner namespace."""
    parser = create_sla_planner_parser()
    args = parser.parse_args([
        "--namespace", "test-ns",
        "--planner-mode", "delegating",
        "--global-planner-namespace", "global-ns",
    ])
    # Should not raise
    validate_planner_args(args)
tests/planner/unit/test_remote_planner.py (3)
42-49: Consider prefixing unused parameter with underscore.
The request_json parameter is unused in the mock function. While this is intentional (matching the expected signature), prefixing with _ would silence the linter and clarify intent.
♻️ Suggested change
-    async def mock_round_robin(request_json):
+    async def mock_round_robin(_request_json):
         yield {
             "status": "success",
             "message": "Scaled successfully",
             "current_replicas": {"prefill": 3, "decode": 5},
         }
57-57: Prefix unused variable with underscore.
The mock_client variable is unpacked but never used. Prefix with _ to indicate it's intentionally unused.
♻️ Suggested change
-    runtime, mock_client = mock_runtime
+    runtime, _mock_client = mock_runtime
146-148: Consider clearer empty async generator pattern.
The return before yield pattern works but is confusing. A cleaner approach would be more explicit.
♻️ Suggested alternatives
-    async def mock_round_robin_empty(request_json):
-        return
-        yield  # Make it a generator but never yield anything
+    async def mock_round_robin_empty(_request_json):
+        # Empty async generator - never yields
+        if False:
+            yield
Or use an async generator expression:
client_mock.round_robin = AsyncMock(return_value=(__x async for __x in ()))
components/src/dynamo/global_planner/__main__.py (1)
94-101: Prefix unused request parameter with underscore.
The request parameter is unused in the health check endpoint. This is expected for a health check, but prefixing with _ clarifies intent and silences the linter.
♻️ Suggested change
-    async def health_check(request: HealthCheckRequest):
+    async def health_check(_request: HealthCheckRequest):
         """Health check endpoint for monitoring"""
         yield {
             "status": "healthy",
components/src/dynamo/planner/utils/planner_core.py (1)
564-565: Consider failing fast instead of using "unknown" fallback.
If DYN_PARENT_DGD_K8S_NAME is not set, the request will be sent with graph_deployment_name="unknown", which will likely fail at the GlobalPlanner when it tries to interact with Kubernetes. Failing early with a clear error message might be more helpful for debugging.
♻️ Suggested change
+        graph_deployment_name = os.environ.get("DYN_PARENT_DGD_K8S_NAME")
+        if not graph_deployment_name:
+            logger.error("DYN_PARENT_DGD_K8S_NAME environment variable not set")
+            return
+
         request = ScaleRequest(
             caller_namespace=self.namespace,
-            graph_deployment_name=os.environ.get("DYN_PARENT_DGD_K8S_NAME", "unknown"),
+            graph_deployment_name=graph_deployment_name,
             k8s_namespace=os.environ.get("POD_NAMESPACE", "default"),
components/src/dynamo/global_planner/scale_handler.py (3)
116-119: Rename unused loop variable.
service_name is not used in the loop body. Rename to _service_name to indicate it's intentionally unused.
♻️ Suggested change
-        for service_name, service_spec in deployment["spec"]["services"].items():
+        for _service_name, service_spec in deployment["spec"]["services"].items():
             sub_type = service_spec.get("subComponentType", "")
             if sub_type:
                 current_replicas[sub_type] = service_spec.get("replicas", 0)
130-131: Remove redundant exception object from logging.exception.
logging.exception automatically includes the exception information. Including {e} in the message is redundant.
♻️ Suggested change
         except Exception as e:
-            logger.exception(f"Error processing scale request: {e}")
+            logger.exception("Error processing scale request")
             yield {"status": "error", "message": str(e), "current_replicas": {}}
42-42: Consider adding cache eviction or bounds for long-running deployments.
The connectors cache grows unbounded as new DGDs are encountered. For long-running GlobalPlanner instances managing many transient DGDs, this could lead to memory growth. Consider adding an LRU cache or periodic cleanup of stale connectors, as sketched below.
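One possible shape for such a bound, shown as a small LRU wrapper; the ConnectorCache class, its factory callable, and the 128-entry limit are illustrative assumptions rather than part of the PR.

from collections import OrderedDict

MAX_CACHED_CONNECTORS = 128  # assumed bound; tune per deployment

class ConnectorCache:
    """Bounded LRU cache of per-DGD connectors (keying and factory are illustrative)."""

    def __init__(self, factory):
        self._factory = factory          # e.g. a callable that builds a KubernetesConnector
        self._connectors = OrderedDict()

    def get(self, key):
        if key in self._connectors:
            self._connectors.move_to_end(key)  # mark as most recently used
            return self._connectors[key]
        connector = self._factory(key)
        self._connectors[key] = connector
        if len(self._connectors) > MAX_CACHED_CONNECTORS:
            self._connectors.popitem(last=False)  # evict the least recently used connector
        return connector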
        if not self.args.no_operation:
            target_replicas = [
                TargetReplica(
                    sub_component_type=SubComponentType.PREFILL,
                    component_name=self.prefill_component_name,
                    desired_replicas=next_num_p,
                ),
                TargetReplica(
                    sub_component_type=SubComponentType.DECODE,
                    component_name=self.decode_component_name,
                    desired_replicas=next_num_d,
                ),
            ]
            await self.connector.set_component_replicas(target_replicas, blocking=False)
            # Execute scaling based on mode
            if self.planner_mode == "delegating":
                await self._delegate_scaling(next_num_p, next_num_d)
            else:
                await self._execute_local_scaling(next_num_p, next_num_d)
Potential UnboundLocalError if replica computation fails.
If _compute_replica_requirements raises an exception (caught at lines 670-672), the code returns early. However, if the exception-handling path changes, or if next_num_req is None so the block that assigns them is skipped, next_num_p and next_num_d could be undefined when reaching lines 677-679.
Looking at the current flow: if next_num_req is None, the code doesn't enter the if block at line 647, and the no_operation check at line 674 would execute with undefined variables.
🐛 Proposed fix - move scaling inside the conditional
     try:
         next_num_p, next_num_d = self._compute_replica_requirements(
             next_num_req, next_isl, next_osl
         )
         # Update predicted replica metrics in Prometheus
         if self.prometheus_port != 0:
             self.prometheus_metrics.predicted_num_p.set(next_num_p)
             self.prometheus_metrics.predicted_num_d.set(next_num_d)
+
+        if not self.args.no_operation:
+            # Execute scaling based on mode
+            if self.planner_mode == "delegating":
+                await self._delegate_scaling(next_num_p, next_num_d)
+            else:
+                await self._execute_local_scaling(next_num_p, next_num_d)
     except Exception as e:
         logger.error(f"Failed to compute number of replicas: {e}")
         return
-
-    if not self.args.no_operation:
-        # Execute scaling based on mode
-        if self.planner_mode == "delegating":
-            await self._delegate_scaling(next_num_p, next_num_d)
-        else:
-            await self._execute_local_scaling(next_num_p, next_num_d)
🤖 Prompt for AI Agents
In `@components/src/dynamo/planner/utils/planner_core.py` around lines 674-679:
The current flow may reference next_num_p and next_num_d when they are undefined
if _compute_replica_requirements fails or next_num_req is None; fix by only
performing the scaling block when replica computation succeeded (i.e., after
_compute_replica_requirements returns a non-None next_num_req) or by explicitly
initializing/returning early—move the code that checks args.no_operation and
calls _delegate_scaling / _execute_local_scaling (referencing planner_mode,
_delegate_scaling, _execute_local_scaling) inside the conditional where
next_num_p and next_num_d are set (after _compute_replica_requirements) or add a
guard like “if next_num_req is None: return” before using next_num_p/next_num_d.
Implement a two-component architecture for delegated scaling:
Architecture changes:
GlobalPlanner features:
Planner changes:
Tests:
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Release Notes
New Features
Tests