refactor(BA-5315): Flatten deployment sub-step enum and prepare deploying infrastructure#10355
Conversation
Merge DeploymentSubStatus (base) and DeploymentSubStep (concrete) into a single flat DeploymentLifecycleSubStep enum with DEPLOYING_ prefixed members. This eliminates the need for resolve_sub_step/sub_steps_for helper methods and enables direct DeploymentLifecycleSubStep(str_value) conversion at the event handler boundary. Also applies StrEnumType on the EndpointRow.sub_step column so the ORM handles str↔enum conversion automatically, and renames the sub_status field to sub_step throughout DeploymentLifecycleStatus for consistency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors deployment lifecycle sub-step handling by replacing the previous sub-status/sub-step enum hierarchy with a single flat DeploymentLifecycleSubStep enum and updating the Sokovan deployment lifecycle flow, ORM mappings, and related call sites accordingly.
Changes:
- Replace
DeploymentSubStatus/DeploymentSubStepwithDeploymentLifecycleSubStepand renameDeploymentLifecycleStatus.sub_status→sub_step. - Update coordinator/handlers/strategy logic to use the new flat sub-step enum (including completed/provisioning/rolling-back states).
- Update ORM column typing to
StrEnumType(DeploymentLifecycleSubStep)for automatic enum ↔ string conversion.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/manager/sokovan/deployment/strategy/test_applier.py | Updates unit tests to use DeploymentLifecycleSubStep.DEPLOYING_* members. |
| src/ai/backend/manager/sokovan/deployment/strategy/types.py | Updates strategy result/summary typing to the new sub-step enum. |
| src/ai/backend/manager/sokovan/deployment/strategy/applier.py | Updates completion detection to match the new DEPLOYING_COMPLETED sub-step. |
| src/ai/backend/manager/sokovan/deployment/handlers/deploying.py | Updates deploying handlers to use sub_step and DeploymentLifecycleSubStep.DEPLOYING_*. |
| src/ai/backend/manager/sokovan/deployment/handlers/base.py | Updates handler docs to reference DeploymentLifecycleSubStep. |
| src/ai/backend/manager/sokovan/deployment/deployment_controller.py | Updates mark_lifecycle_needed() signature to accept DeploymentLifecycleSubStep. |
| src/ai/backend/manager/sokovan/deployment/coordinator.py | Refactors handler keys/specs and transitions to use the new sub-step enum. |
| src/ai/backend/manager/services/deployment/service.py | Updates lifecycle mark trigger to use the new DEPLOYING_PROVISIONING sub-step. |
| src/ai/backend/manager/repositories/deployment/repository.py | Updates repository API typing for sub-step filtering. |
| src/ai/backend/manager/repositories/deployment/db_source/db_source.py | Updates DB source typing for sub-step filtering in queries. |
| src/ai/backend/manager/repositories/deployment/creators/deployment.py | Updates batch updater spec typing from DeploymentSubStatus to DeploymentLifecycleSubStep. |
| src/ai/backend/manager/models/endpoint/row.py | Switches EndpointRow.sub_step to StrEnumType(DeploymentLifecycleSubStep). |
| src/ai/backend/manager/event_dispatcher/handlers/schedule.py | Updates schedule event handling to parse DeploymentLifecycleSubStep values. |
| src/ai/backend/manager/data/deployment/types.py | Introduces DeploymentLifecycleSubStep and renames DeploymentLifecycleStatus.sub_status → sub_step. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def create_if_needed_event(self) -> DoDeploymentLifecycleIfNeededEvent: | ||
| """Create event for checking if processing is needed.""" | ||
| return DoDeploymentLifecycleIfNeededEvent( | ||
| self.lifecycle_type.value, sub_step=self._sub_step_value() | ||
| ) | ||
| return DoDeploymentLifecycleIfNeededEvent(self.lifecycle_type.value, sub_step=self.sub_step) | ||
|
|
There was a problem hiding this comment.
DoDeploymentLifecycleIfNeededEvent is defined with sub_step: str | None, but this now passes a DeploymentLifecycleSubStep enum instance. Because the MQ msgpack wrapper uses strict_types=True, enums are serialized via the custom ENUM ext type (pickle) rather than as plain strings, which can break backward/rolling compatibility between manager versions. Pass sub_step.value (or str(sub_step)) into the event instead of the enum object.
| def create_process_event(self) -> DoDeploymentLifecycleEvent: | ||
| """Create event for forced processing.""" | ||
| return DoDeploymentLifecycleEvent( | ||
| self.lifecycle_type.value, sub_step=self._sub_step_value() | ||
| ) | ||
| return DoDeploymentLifecycleEvent(self.lifecycle_type.value, sub_step=self.sub_step) |
There was a problem hiding this comment.
Same issue as create_if_needed_event: DoDeploymentLifecycleEvent expects sub_step as a plain str | None, but passing the enum will be msgpack-serialized as a pickled ENUM ext type under strict_types=True. Please pass sub_step.value/str(sub_step) here to keep event payloads stable across versions.
| # Deploying — one task per sub-step | ||
| for sub_step in self._registry.sub_steps_for(DeploymentLifecycleType.DEPLOYING): | ||
| for sub_step in ( | ||
| DeploymentLifecycleSubStep.DEPLOYING_PROVISIONING, | ||
| DeploymentLifecycleSubStep.DEPLOYING_ROLLING_BACK, | ||
| ): |
There was a problem hiding this comment.
_create_task_specs() now hard-codes the DEPLOYING sub-steps. This can silently drift from the handler registry if a new DEPLOYING handler/sub-step is added later (the handler would never be scheduled). Consider deriving this list from self._registry.handlers (e.g., collect sub_steps for DeploymentLifecycleType.DEPLOYING) so scheduling stays in sync with the registered handlers.
| @@ -34,7 +34,7 @@ class StrategyCycleResult: | |||
| ``sub_step`` indicates the next state: PROVISIONING or COMPLETED. | |||
There was a problem hiding this comment.
The StrategyCycleResult docstring still refers to PROVISIONING/COMPLETED member names, but the enum was renamed/flattened to DeploymentLifecycleSubStep.DEPLOYING_*. Updating this docstring would avoid confusion for readers and keep it consistent with the new enum.
| ``sub_step`` indicates the next state: PROVISIONING or COMPLETED. | |
| ``sub_step`` indicates the next deployment lifecycle sub-step | |
| as a :class:`DeploymentLifecycleSubStep` value (for example, a ``DEPLOYING_*`` state). |
…ep field - Extract ModelRevisionSpec construction from _to_deployment_info_legacy into build_revision_spec_from_endpoint helper method - Add sub_step field to ModelDeploymentData and wire through service/adapter - Add sub_step to REST API DeploymentDTO response - Add RouteStatus.is_provisioning() helper - Fix truthiness check for sub_step (use `is not None` instead of bool) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rolling update PR Move non-rolling-update-evaluator changes to the base refactoring PR: - Coordinator: sub_step filtering, expired transition for skipped deployments - Deploying handlers: expired transition, rolling_back post_process - Executor: route creation refactoring - Repository/DB source: sub_steps filter parameter - Strategy applier: remove clear_deploying_revision (moved to repo) - Strategy types: docstring updates - BEP-1049 proposal updates - Test fixtures updates Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntLifecycleSubStep Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…deployments The executor now falls back to get_revision_spec_from_endpoint when current_revision_id is None, instead of skipping the deployment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sses/failures/skipped) Replace the 4-field result (successes, errors, skipped, need_retry) with the 3-field pattern used by session coordinator: successes, failures, skipped. Handlers now report all non-success outcomes as failures (DeploymentExecutionError). The coordinator classifies failures into need_retry/expired/give_up based on retry count and timeout policy, matching the session side's approach. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When current_revision is NULL (e.g., newly created deployments before activate_revision), route provisioning failed with DeploymentHasNoTargetRevision. Add fetch_deployment_context_from_endpoint that resolves image from endpoint-level fields instead of a revision record. The route executor now falls back to this method when no revision ID is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When revision_id is None (no revision activated yet), fall back to the first entry in deployment_info.model_revisions which is populated from endpoint-level fields by build_revision_spec_from_endpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d optional" This reverts commit ee522c0.
Change resolve_revision_spec to raise DeploymentRevisionNotFound instead of returning None, removing unnecessary None checks at call sites. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves BA-5315.
Summary
Preparatory refactoring for the Rolling Update deployment strategy (BA-3435). Separates infrastructure changes from the pure FSM evaluator logic.
Enum unification
DeploymentSubStatus(base) /DeploymentSubStep(concrete subclass) into a single flatDeploymentLifecycleSubStepenum withDEPLOYING_prefixed membersStrEnumTypeonEndpointRow.sub_stepcolumn for automatic ORM str↔enum conversionsub_statusfield tosub_stepthroughoutDeploymentLifecycleStatusresolve_sub_step,sub_steps_for,_sub_step_valuehelper methodsDeploying handler infrastructure
expiredtransition toDeployingProvisioningHandler(timeout → ROLLING_BACK)get_deployments_for_handler, expired transition for skipped deploymentsDeployingRollingBackHandler: acceptDeploymentRepositorydirectly instead ofStrategyResultApplierclear_deploying_revisionfromStrategyResultApplier(moved to repository)Executor & repository
build_revision_spec_from_endpoint)sub_stepsfilter parameter toget_deployments_for_handler/fetch_deployments_for_handlerAPI & data
sub_stepfield onModelDeploymentDataand REST APIDeploymentDTORouteStatus.is_provisioning()helpersub_step(is not Noneinstead of bool)Documentation
DB stored values (
"provisioning","rolling_back","completed") are unchanged — no migration required.Test plan
pants lintpassespants checkpasses (no new type errors)test_applier.pypasses🤖 Generated with Claude Code