feat: add GlobalPlanner component for centralized scaling #5702
Open
daiyaanarfeen wants to merge 7 commits into main from darfeen/global-planner
+1,188
−39
Changes from 3 commits
Commits (7):
0ea00e8 feat: add GlobalPlanner component for centralized scaling (daiyaanarfeen)
bb98036 refactor: move planner arg validation to planner_core for better cohe… (daiyaanarfeen)
adae932 test: remove redundant tests covered by integration tests (daiyaanarfeen)
16b6270 ci: add GlobalPlanner paths to filters.yaml (daiyaanarfeen)
4716924 fix: add copyright headers to scale protocol files (daiyaanarfeen)
21f26b6 ci: exclude planner paths from core filter to avoid triggering builds (daiyaanarfeen)
a78440e fix: convert sub_component_type to enum and remove await from sync me… (daiyaanarfeen)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| """ | ||
| GlobalPlanner - Centralized scaling execution service. | ||
| The GlobalPlanner is a standalone component that receives scale requests from | ||
| Planners and executes them via the Kubernetes API. It provides centralized | ||
| scaling management across multiple deployments and namespaces. | ||
| Architecture: | ||
| - Planners make scaling decisions (observe, predict, decide) | ||
| - Planners in delegating mode send requests to GlobalPlanner | ||
| - GlobalPlanner executes scaling via Kubernetes API | ||
| - GlobalPlanner is stateless and can scale horizontally | ||
| Usage: | ||
| python -m dynamo.global_planner \\ | ||
| --namespace=global-infra \\ | ||
| --managed-namespaces app-ns-1 app-ns-2 | ||
| """ | ||
|
|
||
| __all__ = [ | ||
| "ScaleRequestHandler", | ||
| ] | ||
|
|
||
| from dynamo.global_planner.scale_handler import ScaleRequestHandler |
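The package docstring above says that Planners in delegating mode send scale requests to the GlobalPlanner. As a rough illustration of what such a request might carry, the sketch below uses only the field names that the handler in this PR reads (`caller_namespace`, `graph_deployment_name`, `k8s_namespace`, `target_replicas`, `blocking`). The real `ScaleRequest` lives in `dynamo.planner.scale_protocol`, which is not shown in this diff, so the class names, types, defaults and example values here are assumptions, and stdlib dataclasses stand in for the actual model.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class TargetReplicaSketch:
    """Hypothetical stand-in for one entry of ScaleRequest.target_replicas."""

    # A plain string is assumed here; a later commit in this PR
    # converts sub_component_type to an enum.
    sub_component_type: str
    component_name: str
    desired_replicas: int


@dataclass
class ScaleRequestSketch:
    """Hypothetical stand-in for dynamo.planner.scale_protocol.ScaleRequest."""

    caller_namespace: str
    graph_deployment_name: str
    k8s_namespace: str
    target_replicas: list[TargetReplicaSketch] = field(default_factory=list)
    blocking: bool = False  # assumed default, not confirmed by the diff


# Example values are made up for illustration.
request = ScaleRequestSketch(
    caller_namespace="app-ns-1",
    graph_deployment_name="my-dgd",
    k8s_namespace="default",
    target_replicas=[TargetReplicaSketch("decode", "decode-worker", 3)],
)

# Nested dataclasses become nested dicts, ready to serialize.
payload = asdict(request)
```

The handler rejects the request when authorization is enabled and `caller_namespace` is not in the managed set, so that field doubles as the caller's identity.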
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| """ | ||
| GlobalPlanner - Centralized Scaling Execution Service | ||
|
|
||
| Entry point for the GlobalPlanner component. | ||
|
|
||
| Usage: | ||
| python -m dynamo.global_planner --namespace=global-infra | ||
|
|
||
| With authorization: | ||
| python -m dynamo.global_planner \\ | ||
| --namespace=global-infra \\ | ||
| --managed-namespaces app-ns-1 app-ns-2 | ||
| """ | ||
|
|
||
| import asyncio | ||
| import logging | ||
| import os | ||
|
|
||
| from pydantic import BaseModel | ||
|
|
||
| from dynamo.global_planner.argparse_config import ( | ||
| create_global_planner_parser, | ||
| validate_args, | ||
| ) | ||
| from dynamo.global_planner.scale_handler import ScaleRequestHandler | ||
| from dynamo.runtime import DistributedRuntime, dynamo_worker | ||
| from dynamo.runtime.logging import configure_dynamo_logging | ||
|
|
||
| configure_dynamo_logging() | ||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class HealthCheckRequest(BaseModel): | ||
| """Request type for health check endpoint""" | ||
|
|
||
| text: str = "ping" | ||
|
|
||
|
|
||
| @dynamo_worker() | ||
| async def main(runtime: DistributedRuntime, args): | ||
| """Initialize and run GlobalPlanner. | ||
|
|
||
| The GlobalPlanner is a centralized scaling service that: | ||
| 1. Listens for scale requests from Planners | ||
| 2. Validates caller authorization (optional) | ||
| 3. Executes scaling via Kubernetes API | ||
| 4. Returns success/failure status | ||
|
|
||
| Args: | ||
| runtime: Dynamo runtime instance | ||
| args: Parsed command-line arguments | ||
| """ | ||
| # Validate arguments | ||
| validate_args(args) | ||
|
|
||
| logger.info("=" * 60) | ||
| logger.info("Starting GlobalPlanner") | ||
| logger.info("=" * 60) | ||
| logger.info(f"Namespace: {args.namespace}") | ||
| logger.info(f"Environment: {args.environment}") | ||
|
|
||
| if args.managed_namespaces: | ||
| logger.info("Authorization: ENABLED") | ||
| logger.info(f"Authorized namespaces: {args.managed_namespaces}") | ||
| else: | ||
| logger.info("Authorization: DISABLED (accepting all namespaces)") | ||
|
|
||
| logger.info("=" * 60) | ||
|
|
||
| # Create the GlobalPlanner component | ||
| component = runtime.namespace(args.namespace).component("GlobalPlanner") | ||
|
|
||
| # Get K8s namespace (where GlobalPlanner pod is running) | ||
| k8s_namespace = os.environ.get("POD_NAMESPACE", "default") | ||
| logger.info(f"Running in Kubernetes namespace: {k8s_namespace}") | ||
|
|
||
| # Create scale request handler | ||
| handler = ScaleRequestHandler( | ||
| runtime=runtime, | ||
| managed_namespaces=args.managed_namespaces, | ||
| k8s_namespace=k8s_namespace, | ||
| ) | ||
|
|
||
| # Serve scale_request endpoint | ||
| logger.info("Serving endpoints...") | ||
| scale_endpoint = component.endpoint("scale_request") | ||
| await scale_endpoint.serve_endpoint(handler.scale_request) | ||
| logger.info(" ✓ scale_request - Receives scaling requests from Planners") | ||
|
|
||
| # Serve health check endpoint | ||
| async def health_check(request: HealthCheckRequest): | ||
| """Health check endpoint for monitoring""" | ||
| yield { | ||
| "status": "healthy", | ||
| "component": "GlobalPlanner", | ||
| "namespace": args.namespace, | ||
| "managed_namespaces": args.managed_namespaces or "all", | ||
| } | ||
|
|
||
| health_endpoint = component.endpoint("health") | ||
| await health_endpoint.serve_endpoint(health_check) | ||
| logger.info(" ✓ health - Health check endpoint") | ||
|
|
||
| logger.info("=" * 60) | ||
| logger.info("GlobalPlanner is ready and waiting for scale requests") | ||
| logger.info("=" * 60) | ||
|
|
||
| # Keep running forever (process scale requests as they come) | ||
| await asyncio.Event().wait() | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| parser = create_global_planner_parser() | ||
| args = parser.parse_args() | ||
| asyncio.run(main(args)) |
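Both endpoints in the entry point above are async generators that yield a response dict. The sketch below shows that pattern in isolation for the health check: a factory closes over the startup settings, and a small helper pulls the first yielded item. How the Dynamo runtime actually consumes the generator is not shown in this diff, so the plain `async for` consumer and the names `make_health_check` and `first_response` are assumptions for illustration only.

```python
import asyncio


def make_health_check(namespace, managed_namespaces):
    """Build a health-check handler that closes over startup settings."""

    async def health_check(request):
        # A single yield makes this an async generator with one response.
        yield {
            "status": "healthy",
            "component": "GlobalPlanner",
            "namespace": namespace,
            # None or an empty list is reported as "all", as in the diff.
            "managed_namespaces": managed_namespaces or "all",
        }

    return health_check


async def first_response(handler, request):
    """Return the first item the handler yields, or None if it yields nothing."""
    async for item in handler(request):
        return item
    return None


health_check = make_health_check("global-infra", None)
response = asyncio.run(first_response(health_check, {"text": "ping"}))
```

With authorization disabled (`managed_namespaces=None`), the response reports `"all"` rather than `None`, which keeps the payload easy to read in monitoring output.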
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| """Argument parsing for GlobalPlanner.""" | ||
|
|
||
| import argparse | ||
|
|
||
|
|
||
| def create_global_planner_parser() -> argparse.ArgumentParser: | ||
| """Create and configure the argument parser for GlobalPlanner. | ||
|
|
||
| Returns: | ||
| argparse.ArgumentParser: Configured argument parser for GlobalPlanner | ||
| """ | ||
| parser = argparse.ArgumentParser( | ||
| description="GlobalPlanner - Centralized Scaling Execution Service", | ||
| formatter_class=argparse.RawDescriptionHelpFormatter, | ||
| epilog=""" | ||
| Examples: | ||
| # Simple deployment (accept all namespaces) | ||
| python -m dynamo.global_planner --namespace=global-infra | ||
|
|
||
| # With authorization | ||
| python -m dynamo.global_planner \\ | ||
| --namespace=global-infra \\ | ||
| --managed-namespaces app-ns-1 app-ns-2 app-ns-3 | ||
|
|
||
| # Custom environment | ||
| python -m dynamo.global_planner \\ | ||
| --namespace=global-infra \\ | ||
| --environment=kubernetes | ||
| """, | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "--namespace", | ||
| required=True, | ||
| help="Dynamo namespace for GlobalPlanner component", | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "--managed-namespaces", | ||
| type=str, | ||
| nargs="+", | ||
| default=None, | ||
| help="Optional: List of namespaces authorized to use this GlobalPlanner (default: accept all)", | ||
| ) | ||
|
|
||
| parser.add_argument( | ||
| "--environment", | ||
| default="kubernetes", | ||
| choices=["kubernetes"], | ||
| help="Environment type (currently only kubernetes supported)", | ||
| ) | ||
|
|
||
| return parser | ||
|
|
||
|
|
||
| def validate_args(args): | ||
| """Validate GlobalPlanner arguments. | ||
|
|
||
| Args: | ||
| args: Parsed arguments from argparse | ||
|
|
||
| Raises: | ||
| ValueError: If arguments are invalid | ||
| """ | ||
| # managed_namespaces is optional - if not specified, accept all | ||
| if args.managed_namespaces is not None and len(args.managed_namespaces) == 0: | ||
| raise ValueError( | ||
| "--managed-namespaces must have at least one namespace if specified" | ||
| ) |
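To make the behavior of `--managed-namespaces` concrete, here is a minimal standalone parser that mirrors the three flags defined above. It is a sketch rather than the PR's code: `build_parser` and the `prog` name are made up for the example. It illustrates that omitting the flag yields `None` (accept all), that values are collected into a list, and that `nargs="+"` already rejects the flag when no values follow, so an empty list can only reach `validate_args` when the arguments are built programmatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Minimal parser mirroring the three GlobalPlanner flags."""
    parser = argparse.ArgumentParser(prog="global_planner_sketch")
    parser.add_argument("--namespace", required=True)
    parser.add_argument("--managed-namespaces", type=str, nargs="+", default=None)
    parser.add_argument("--environment", default="kubernetes", choices=["kubernetes"])
    return parser


parser = build_parser()

# Flag omitted: the default None means "accept all namespaces".
open_args = parser.parse_args(["--namespace", "global-infra"])

# Flag given with values: argparse collects them into a list.
restricted_args = parser.parse_args(
    ["--namespace", "global-infra", "--managed-namespaces", "app-ns-1", "app-ns-2"]
)

# Flag given with no values: nargs="+" makes argparse exit with an error.
try:
    parser.parse_args(["--namespace", "global-infra", "--managed-namespaces"])
    rejected = False
except SystemExit:
    rejected = True
```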
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,132 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| """Handler for scale_request endpoint in GlobalPlanner.""" | ||
|
|
||
| import logging | ||
|
|
||
| from dynamo.planner import KubernetesConnector, TargetReplica | ||
| from dynamo.planner.scale_protocol import ScaleRequest, ScaleResponse | ||
| from dynamo.runtime import DistributedRuntime, dynamo_endpoint | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class ScaleRequestHandler: | ||
| """Handles incoming scale requests in GlobalPlanner. | ||
| This handler: | ||
| 1. Receives scale requests from Planners | ||
| 2. Validates caller authorization (optional) | ||
| 3. Caches KubernetesConnector per DGD for efficiency | ||
| 4. Executes scaling via Kubernetes API | ||
| 5. Returns current replica counts | ||
| """ | ||
|
|
||
| def __init__( | ||
| self, runtime: DistributedRuntime, managed_namespaces: list, k8s_namespace: str | ||
| ): | ||
| """Initialize the scale request handler. | ||
| Args: | ||
| runtime: Dynamo runtime instance | ||
| managed_namespaces: List of authorized namespaces (None = accept all) | ||
| k8s_namespace: Kubernetes namespace where GlobalPlanner is running | ||
| """ | ||
| self.runtime = runtime | ||
| # If managed_namespaces is None, accept all namespaces | ||
| self.managed_namespaces = ( | ||
| set(managed_namespaces) if managed_namespaces else None | ||
| ) | ||
| self.k8s_namespace = k8s_namespace | ||
| self.connectors = {} # Cache of KubernetesConnector per DGD | ||
|
|
||
| if self.managed_namespaces: | ||
| logger.info( | ||
| f"ScaleRequestHandler initialized for namespaces: {managed_namespaces}" | ||
| ) | ||
| else: | ||
| logger.info("ScaleRequestHandler initialized (accepting all namespaces)") | ||
|
|
||
| @dynamo_endpoint(ScaleRequest, ScaleResponse) | ||
| async def scale_request(self, request: ScaleRequest): | ||
| """Process scaling request from a Planner. | ||
| Args: | ||
| request: ScaleRequest with target replicas and DGD info | ||
| Yields: | ||
| ScaleResponse with status and current replica counts | ||
| """ | ||
| try: | ||
| # Validate caller namespace (if authorization is enabled) | ||
| if ( | ||
| self.managed_namespaces is not None | ||
| and request.caller_namespace not in self.managed_namespaces | ||
| ): | ||
| yield { | ||
| "status": "error", | ||
| "message": f"Namespace {request.caller_namespace} not authorized", | ||
| "current_replicas": {}, | ||
| } | ||
| return | ||
|
|
||
| logger.info( | ||
| f"Processing scale request from {request.caller_namespace} " | ||
| f"for DGD {request.graph_deployment_name} " | ||
| f"in K8s namespace {request.k8s_namespace}" | ||
| ) | ||
|
|
||
| # Get or create connector for this DGD | ||
| connector_key = f"{request.k8s_namespace}/{request.graph_deployment_name}" | ||
| if connector_key not in self.connectors: | ||
| connector = KubernetesConnector( | ||
| dynamo_namespace=request.caller_namespace, | ||
| model_name="managed", # Not used for remote execution | ||
| k8s_namespace=request.k8s_namespace, | ||
| parent_dgd_name=request.graph_deployment_name, | ||
| ) | ||
| await connector._async_init() | ||
| self.connectors[connector_key] = connector | ||
| logger.debug(f"Created new connector for {connector_key}") | ||
| else: | ||
| connector = self.connectors[connector_key] | ||
| logger.debug(f"Reusing cached connector for {connector_key}") | ||
|
|
||
| # Convert request replicas to TargetReplica objects | ||
| target_replicas = [ | ||
| TargetReplica( | ||
| sub_component_type=r.sub_component_type, | ||
| component_name=r.component_name, | ||
| desired_replicas=r.desired_replicas, | ||
| ) | ||
| for r in request.target_replicas | ||
| ] | ||
|
|
||
| # Execute scaling | ||
| await connector.set_component_replicas( | ||
| target_replicas, blocking=request.blocking | ||
| ) | ||
|
|
||
| # Get current replica counts | ||
| current_replicas = {} | ||
| deployment = await connector.kube_api.get_graph_deployment( | ||
| connector.parent_dgd_name | ||
| ) | ||
|
||
| for service_name, service_spec in deployment["spec"]["services"].items(): | ||
| sub_type = service_spec.get("subComponentType", "") | ||
| if sub_type: | ||
| current_replicas[sub_type] = service_spec.get("replicas", 0) | ||
|
|
||
| logger.info( | ||
| f"Successfully scaled {request.graph_deployment_name}: {current_replicas}" | ||
| ) | ||
| yield { | ||
| "status": "success", | ||
| "message": f"Scaled {request.graph_deployment_name} successfully", | ||
| "current_replicas": current_replicas, | ||
| } | ||
|
|
||
| except Exception as e: | ||
| logger.exception(f"Error processing scale request: {e}") | ||
| yield {"status": "error", "message": str(e), "current_replicas": {}} | ||
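Two pieces of the handler above are pure logic and can be read on their own: the authorization check and the collection of replica counts from the deployment spec. The sketch below restates them as standalone functions. The function names and the sample deployment dict are hypothetical; only the keys `spec`, `services`, `subComponentType` and `replicas` come from the diff.

```python
from typing import Optional


def is_authorized(caller_namespace: str, managed_namespaces: Optional[set]) -> bool:
    """None means authorization is disabled and every caller is accepted."""
    return managed_namespaces is None or caller_namespace in managed_namespaces


def collect_current_replicas(deployment: dict) -> dict:
    """Map each service's subComponentType to the replica count in its spec."""
    current = {}
    for _service_name, service_spec in deployment["spec"]["services"].items():
        sub_type = service_spec.get("subComponentType", "")
        if sub_type:  # services without a subComponentType are skipped
            current[sub_type] = service_spec.get("replicas", 0)
    return current


# Hypothetical deployment object, shaped like the dict the handler reads.
deployment = {
    "spec": {
        "services": {
            "Frontend": {"replicas": 1},
            "PrefillWorker": {"subComponentType": "prefill", "replicas": 2},
            "DecodeWorker": {"subComponentType": "decode", "replicas": 4},
        }
    }
}

allowed = {"app-ns-1", "app-ns-2"}
replicas = collect_current_replicas(deployment)
```

Because the result is keyed by `subComponentType`, two services that share a type would overwrite each other in this mapping; the values are also read from the spec, so they reflect the requested count.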