
refactor(BA-5315): Flatten deployment sub-step enum and prepare deploying infrastructure #10355

Merged
HyeockJinKim merged 12 commits into main from refactor/flatten-deployment-lifecycle-sub-step
Mar 23, 2026


@jopemachine jopemachine commented Mar 20, 2026

Resolves BA-5315.

Summary

Preparatory refactoring for the Rolling Update deployment strategy (BA-3435). Separates infrastructure changes from the pure FSM evaluator logic.

Enum unification

  • Merge DeploymentSubStatus (base) / DeploymentSubStep (concrete subclass) into a single flat DeploymentLifecycleSubStep enum with DEPLOYING_ prefixed members
  • Apply StrEnumType to the EndpointRow.sub_step column for automatic ORM str↔enum conversion
  • Rename sub_status field to sub_step throughout DeploymentLifecycleStatus
  • Remove resolve_sub_step, sub_steps_for, _sub_step_value helper methods
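A minimal sketch of what the flattened enum could look like. The member names come from this PR. The string values are the DB-stored values the PR says are unchanged ("provisioning", "rolling_back", "completed"), and the name-to-value mapping is an assumption. The `str, enum.Enum` base is also an assumption: the real definition in data/deployment/types.py may use a different base, such as a StrEnum.

```python
import enum


class DeploymentLifecycleSubStep(str, enum.Enum):
    """Sketch of a single flat sub-step enum (no base/subclass split).

    Assumption: each DEPLOYING_* member maps to the unchanged DB value
    with the prefix dropped.
    """

    DEPLOYING_PROVISIONING = "provisioning"
    DEPLOYING_ROLLING_BACK = "rolling_back"
    DEPLOYING_COMPLETED = "completed"


# A raw string (from the DB or an event payload) converts directly by value,
# so no resolve_sub_step()-style indirection is needed.
step = DeploymentLifecycleSubStep("rolling_back")
print(step is DeploymentLifecycleSubStep.DEPLOYING_ROLLING_BACK)  # True
print(step.value)  # rolling_back
```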

Deploying handler infrastructure

  • Add expired transition to DeployingProvisioningHandler (timeout → ROLLING_BACK)
  • Coordinator: sub_step filtering for get_deployments_for_handler, expired transition for skipped deployments
  • DeployingRollingBackHandler: accept DeploymentRepository directly instead of StrategyResultApplier
  • Remove clear_deploying_revision from StrategyResultApplier (moved to repository)
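The expired transition can be read as a small lookup: once a deployment has spent longer than its timeout in a sub-step, it moves to a fallback sub-step. Only the PROVISIONING → ROLLING_BACK edge is stated in this PR. The table, the `expired_target` helper, and the 30-minute timeout below are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

import enum
from datetime import datetime, timedelta, timezone


class DeploymentLifecycleSubStep(str, enum.Enum):
    DEPLOYING_PROVISIONING = "provisioning"
    DEPLOYING_ROLLING_BACK = "rolling_back"
    DEPLOYING_COMPLETED = "completed"


# Hypothetical table: where a deployment goes once its current sub-step times out.
EXPIRED_TRANSITIONS: dict[DeploymentLifecycleSubStep, DeploymentLifecycleSubStep] = {
    DeploymentLifecycleSubStep.DEPLOYING_PROVISIONING: DeploymentLifecycleSubStep.DEPLOYING_ROLLING_BACK,
}


def expired_target(
    current: DeploymentLifecycleSubStep,
    entered_at: datetime,
    now: datetime,
    timeout: timedelta,
) -> DeploymentLifecycleSubStep | None:
    """Return the sub-step to move to if ``current`` has timed out, else None."""
    if now - entered_at < timeout:
        return None  # still within the time budget
    return EXPIRED_TRANSITIONS.get(current)  # None if no expiry edge is defined


t0 = datetime(2026, 3, 20, tzinfo=timezone.utc)
limit = timedelta(minutes=30)  # hypothetical timeout value
target = expired_target(
    DeploymentLifecycleSubStep.DEPLOYING_PROVISIONING, t0, t0 + timedelta(minutes=31), limit
)
```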

Executor & repository

  • Refactor route creation in executor (extract build_revision_spec_from_endpoint)
  • Add sub_steps filter parameter to get_deployments_for_handler / fetch_deployments_for_handler
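A rough in-memory model of what the new `sub_steps` filter does. In the DB source it would presumably become an extra condition on EndpointRow.sub_step, but that query is not shown here. `DeploymentRow`, the "ready" lifecycle value, and the convention that `None` means "no filter" are all assumptions.

```python
from __future__ import annotations

import enum
from collections.abc import Collection, Iterable
from dataclasses import dataclass


class DeploymentLifecycleSubStep(str, enum.Enum):
    DEPLOYING_PROVISIONING = "provisioning"
    DEPLOYING_ROLLING_BACK = "rolling_back"
    DEPLOYING_COMPLETED = "completed"


@dataclass(frozen=True)
class DeploymentRow:
    """Hypothetical stand-in for the endpoint row fields the filter looks at."""

    name: str
    lifecycle: str
    sub_step: DeploymentLifecycleSubStep | None


def get_deployments_for_handler(
    rows: Iterable[DeploymentRow],
    lifecycle: str,
    sub_steps: Collection[DeploymentLifecycleSubStep] | None = None,
) -> list[DeploymentRow]:
    """Select rows in ``lifecycle``, narrowed by ``sub_steps`` when one is given.

    Assumed convention: None means "no sub-step filter", while an empty
    collection matches nothing.
    """
    return [
        row
        for row in rows
        if row.lifecycle == lifecycle
        and (sub_steps is None or row.sub_step in sub_steps)
    ]


rows = [
    DeploymentRow("a", "deploying", DeploymentLifecycleSubStep.DEPLOYING_PROVISIONING),
    DeploymentRow("b", "deploying", DeploymentLifecycleSubStep.DEPLOYING_ROLLING_BACK),
    DeploymentRow("c", "ready", None),
]
provisioning = get_deployments_for_handler(
    rows, "deploying", {DeploymentLifecycleSubStep.DEPLOYING_PROVISIONING}
)
```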

API & data

  • Expose sub_step field on ModelDeploymentData and REST API DeploymentDTO
  • Add RouteStatus.is_provisioning() helper
  • Fix truthiness check for sub_step (is not None instead of bool)
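A sketch of the two small helpers. The RouteStatus member set and the `serialize_sub_step` name are hypothetical. The point is that `is_provisioning()` compares identity with one member, and that the sub-step is tested with `is not None` instead of truthiness. A truthiness test would also treat any falsy value, such as an empty string, as "absent".

```python
from __future__ import annotations

import enum


class DeploymentLifecycleSubStep(str, enum.Enum):
    DEPLOYING_PROVISIONING = "provisioning"
    DEPLOYING_ROLLING_BACK = "rolling_back"
    DEPLOYING_COMPLETED = "completed"


class RouteStatus(str, enum.Enum):
    # Hypothetical member set; the real RouteStatus may define others.
    PROVISIONING = "provisioning"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    TERMINATING = "terminating"

    def is_provisioning(self) -> bool:
        """True while the route is still being brought up."""
        return self is RouteStatus.PROVISIONING


def serialize_sub_step(sub_step: DeploymentLifecycleSubStep | None) -> str | None:
    """Convert the optional sub-step into an API response value."""
    # Explicit absence check rather than ``if sub_step:``.
    return sub_step.value if sub_step is not None else None
```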

Documentation

  • Update BEP-1049 proposal and rolling-update design doc

DB stored values ("provisioning", "rolling_back", "completed") are unchanged — no migration required.
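Why no migration is needed, as a toy model: if the enum values equal the strings already stored, then reading a legacy value into a member and writing it back reproduces the same string. `StrEnumColumn` is hypothetical, not the real StrEnumType. Its `to_db`/`from_db` play the roles of `process_bind_param`/`process_result_value` on a SQLAlchemy TypeDecorator.

```python
from __future__ import annotations

import enum
from typing import Generic, TypeVar


class DeploymentLifecycleSubStep(str, enum.Enum):
    DEPLOYING_PROVISIONING = "provisioning"
    DEPLOYING_ROLLING_BACK = "rolling_back"
    DEPLOYING_COMPLETED = "completed"


E = TypeVar("E", bound=enum.Enum)


class StrEnumColumn(Generic[E]):
    """Hypothetical model of a string column that exposes enum members."""

    def __init__(self, enum_cls: type[E]) -> None:
        self.enum_cls = enum_cls

    def to_db(self, value: E | None) -> str | None:
        return None if value is None else value.value

    def from_db(self, raw: str | None) -> E | None:
        return None if raw is None else self.enum_cls(raw)


column = StrEnumColumn(DeploymentLifecycleSubStep)
# Assuming member values equal the stored strings, legacy rows round-trip unchanged.
legacy_values = ["provisioning", "rolling_back", "completed"]
round_tripped = [column.to_db(column.from_db(v)) for v in legacy_values]
```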

Test plan

  • pants lint passes
  • pants check passes (no new type errors)
  • test_applier.py passes
  • CI green

🤖 Generated with Claude Code

Merge DeploymentSubStatus (base) and DeploymentSubStep (concrete) into
a single flat DeploymentLifecycleSubStep enum with DEPLOYING_ prefixed
members. This eliminates the need for resolve_sub_step/sub_steps_for
helper methods and enables direct DeploymentLifecycleSubStep(str_value)
conversion at the event handler boundary.

Also applies StrEnumType on the EndpointRow.sub_step column so the ORM
handles str↔enum conversion automatically, and renames the sub_status
field to sub_step throughout DeploymentLifecycleStatus for consistency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 20, 2026 06:46
@github-actions github-actions Bot added the size:L 100~500 LoC label Mar 20, 2026
@github-actions github-actions Bot added the comp:manager Related to Manager component label Mar 20, 2026
Copilot AI left a comment

Pull request overview

This PR refactors deployment lifecycle sub-step handling by replacing the previous sub-status/sub-step enum hierarchy with a single flat DeploymentLifecycleSubStep enum and updating the Sokovan deployment lifecycle flow, ORM mappings, and related call sites accordingly.

Changes:

  • Replace DeploymentSubStatus/DeploymentSubStep with DeploymentLifecycleSubStep and rename DeploymentLifecycleStatus.sub_status → sub_step.
  • Update coordinator/handlers/strategy logic to use the new flat sub-step enum (including completed/provisioning/rolling-back states).
  • Update ORM column typing to StrEnumType(DeploymentLifecycleSubStep) for automatic enum ↔ string conversion.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File | Description
tests/unit/manager/sokovan/deployment/strategy/test_applier.py | Updates unit tests to use DeploymentLifecycleSubStep.DEPLOYING_* members.
src/ai/backend/manager/sokovan/deployment/strategy/types.py | Updates strategy result/summary typing to the new sub-step enum.
src/ai/backend/manager/sokovan/deployment/strategy/applier.py | Updates completion detection to match the new DEPLOYING_COMPLETED sub-step.
src/ai/backend/manager/sokovan/deployment/handlers/deploying.py | Updates deploying handlers to use sub_step and DeploymentLifecycleSubStep.DEPLOYING_*.
src/ai/backend/manager/sokovan/deployment/handlers/base.py | Updates handler docs to reference DeploymentLifecycleSubStep.
src/ai/backend/manager/sokovan/deployment/deployment_controller.py | Updates mark_lifecycle_needed() signature to accept DeploymentLifecycleSubStep.
src/ai/backend/manager/sokovan/deployment/coordinator.py | Refactors handler keys/specs and transitions to use the new sub-step enum.
src/ai/backend/manager/services/deployment/service.py | Updates lifecycle mark trigger to use the new DEPLOYING_PROVISIONING sub-step.
src/ai/backend/manager/repositories/deployment/repository.py | Updates repository API typing for sub-step filtering.
src/ai/backend/manager/repositories/deployment/db_source/db_source.py | Updates DB source typing for sub-step filtering in queries.
src/ai/backend/manager/repositories/deployment/creators/deployment.py | Updates batch updater spec typing from DeploymentSubStatus to DeploymentLifecycleSubStep.
src/ai/backend/manager/models/endpoint/row.py | Switches EndpointRow.sub_step to StrEnumType(DeploymentLifecycleSubStep).
src/ai/backend/manager/event_dispatcher/handlers/schedule.py | Updates schedule event handling to parse DeploymentLifecycleSubStep values.
src/ai/backend/manager/data/deployment/types.py | Introduces DeploymentLifecycleSubStep and renames DeploymentLifecycleStatus.sub_status → sub_step.


Comment on lines 164 to 167
def create_if_needed_event(self) -> DoDeploymentLifecycleIfNeededEvent:
"""Create event for checking if processing is needed."""
- return DoDeploymentLifecycleIfNeededEvent(
-     self.lifecycle_type.value, sub_step=self._sub_step_value()
- )
+ return DoDeploymentLifecycleIfNeededEvent(self.lifecycle_type.value, sub_step=self.sub_step)

Copilot AI Mar 20, 2026

DoDeploymentLifecycleIfNeededEvent is defined with sub_step: str | None, but this now passes a DeploymentLifecycleSubStep enum instance. Because the MQ msgpack wrapper uses strict_types=True, enums are serialized via the custom ENUM ext type (pickle) rather than as plain strings, which can break backward/rolling compatibility between manager versions. Pass sub_step.value (or str(sub_step)) into the event instead of the enum object.
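A minimal sketch of the suggested fix, assuming a simplified event shape (the real event classes are not shown in this PR). It also shows why the enum member is a problem on the wire. A str-mixin enum member is a subclass of str but not exactly str, and msgpack with strict_types=True checks exact types, so it does not pack the member as a plain string.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass


class DeploymentLifecycleSubStep(str, enum.Enum):
    DEPLOYING_PROVISIONING = "provisioning"
    DEPLOYING_ROLLING_BACK = "rolling_back"
    DEPLOYING_COMPLETED = "completed"


@dataclass(frozen=True)
class DoDeploymentLifecycleIfNeededEvent:
    # Hypothetical shape; per the review, the real event types sub_step as str | None.
    lifecycle_type: str
    sub_step: str | None = None


def create_if_needed_event(
    lifecycle_type: str, sub_step: DeploymentLifecycleSubStep | None
) -> DoDeploymentLifecycleIfNeededEvent:
    # Put the plain string value on the wire, not the enum member.
    return DoDeploymentLifecycleIfNeededEvent(
        lifecycle_type,
        sub_step=sub_step.value if sub_step is not None else None,
    )


member = DeploymentLifecycleSubStep.DEPLOYING_PROVISIONING
event = create_if_needed_event("deploying", member)
print(type(member) is str)  # False: an exact-type check does not see a str
print(type(event.sub_step) is str)  # True
```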

Comment on lines 168 to +170
def create_process_event(self) -> DoDeploymentLifecycleEvent:
"""Create event for forced processing."""
- return DoDeploymentLifecycleEvent(
-     self.lifecycle_type.value, sub_step=self._sub_step_value()
- )
+ return DoDeploymentLifecycleEvent(self.lifecycle_type.value, sub_step=self.sub_step)
Copilot AI Mar 20, 2026

Same issue as create_if_needed_event: DoDeploymentLifecycleEvent expects sub_step as a plain str | None, but passing the enum will be msgpack-serialized as a pickled ENUM ext type under strict_types=True. Please pass sub_step.value/str(sub_step) here to keep event payloads stable across versions.

Comment on lines +758 to +762
# Deploying — one task per sub-step
- for sub_step in self._registry.sub_steps_for(DeploymentLifecycleType.DEPLOYING):
+ for sub_step in (
+     DeploymentLifecycleSubStep.DEPLOYING_PROVISIONING,
+     DeploymentLifecycleSubStep.DEPLOYING_ROLLING_BACK,
+ ):
Copilot AI Mar 20, 2026

_create_task_specs() now hard-codes the DEPLOYING sub-steps. This can silently drift from the handler registry if a new DEPLOYING handler/sub-step is added later (the handler would never be scheduled). Consider deriving this list from self._registry.handlers (e.g., collect sub_steps for DeploymentLifecycleType.DEPLOYING) so scheduling stays in sync with the registered handlers.
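One way to follow this suggestion, sketched under assumptions: a registry keyed by (lifecycle type, sub-step), from which the scheduler collects DEPLOYING sub-steps. The result is similar in spirit to the sub_steps_for helper this PR removes. The registry shape, the READY lifecycle type, and "ReadyHandler" are hypothetical.

```python
from __future__ import annotations

import enum


class DeploymentLifecycleType(str, enum.Enum):
    DEPLOYING = "deploying"
    READY = "ready"  # hypothetical second lifecycle type


class DeploymentLifecycleSubStep(str, enum.Enum):
    DEPLOYING_PROVISIONING = "provisioning"
    DEPLOYING_ROLLING_BACK = "rolling_back"
    DEPLOYING_COMPLETED = "completed"


HandlerKey = tuple  # (DeploymentLifecycleType, DeploymentLifecycleSubStep | None)

# Hypothetical registry: (lifecycle type, sub-step) -> handler name.
handlers: dict[HandlerKey, str] = {
    (DeploymentLifecycleType.DEPLOYING, DeploymentLifecycleSubStep.DEPLOYING_PROVISIONING): "DeployingProvisioningHandler",
    (DeploymentLifecycleType.DEPLOYING, DeploymentLifecycleSubStep.DEPLOYING_ROLLING_BACK): "DeployingRollingBackHandler",
    (DeploymentLifecycleType.READY, None): "ReadyHandler",
}


def sub_steps_for(
    registry: dict[HandlerKey, str], lifecycle_type: DeploymentLifecycleType
) -> list[DeploymentLifecycleSubStep]:
    """Collect sub-steps that have a registered handler, in registration order."""
    return [
        sub_step
        for (lt, sub_step) in registry
        if lt is lifecycle_type and sub_step is not None
    ]


# Task specs built from this list pick up newly registered handlers automatically.
deploying_steps = sub_steps_for(handlers, DeploymentLifecycleType.DEPLOYING)
```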

@@ -34,7 +34,7 @@ class StrategyCycleResult:
``sub_step`` indicates the next state: PROVISIONING or COMPLETED.
Copilot AI Mar 20, 2026

The StrategyCycleResult docstring still refers to PROVISIONING/COMPLETED member names, but the enum was renamed/flattened to DeploymentLifecycleSubStep.DEPLOYING_*. Updating this docstring would avoid confusion for readers and keep it consistent with the new enum.

Suggested change
- ``sub_step`` indicates the next state: PROVISIONING or COMPLETED.
+ ``sub_step`` indicates the next deployment lifecycle sub-step
+ as a :class:`DeploymentLifecycleSubStep` value (for example, a ``DEPLOYING_*`` state).

…ep field

- Extract ModelRevisionSpec construction from _to_deployment_info_legacy
  into build_revision_spec_from_endpoint helper method
- Add sub_step field to ModelDeploymentData and wire through service/adapter
- Add sub_step to REST API DeploymentDTO response
- Add RouteStatus.is_provisioning() helper
- Fix truthiness check for sub_step (use `is not None` instead of bool)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the comp:common Related to Common component label Mar 20, 2026
…rolling update PR

Move non-rolling-update-evaluator changes to the base refactoring PR:
- Coordinator: sub_step filtering, expired transition for skipped deployments
- Deploying handlers: expired transition, rolling_back post_process
- Executor: route creation refactoring
- Repository/DB source: sub_steps filter parameter
- Strategy applier: remove clear_deploying_revision (moved to repo)
- Strategy types: docstring updates
- BEP-1049 proposal updates
- Test fixtures updates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size:XL 500~ LoC and removed size:L 100~500 LoC labels Mar 20, 2026
@jopemachine jopemachine changed the title refactor: Flatten deployment lifecycle sub-step enum hierarchy refactor(BA-3435): Flatten deployment sub-step enum and prepare deploying infrastructure Mar 20, 2026
@jopemachine jopemachine added this to the 26.4 milestone Mar 20, 2026
@jopemachine jopemachine changed the title refactor(BA-3435): Flatten deployment sub-step enum and prepare deploying infrastructure refactor(BA-5315): Flatten deployment sub-step enum and prepare deploying infrastructure Mar 20, 2026
jopemachine and others added 2 commits March 20, 2026 16:28
…ntLifecycleSubStep

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine requested review from a team and HyeockJinKim March 20, 2026 07:32
Comment thread src/ai/backend/manager/sokovan/deployment/coordinator.py
jopemachine and others added 2 commits March 20, 2026 16:46
…deployments

The executor now falls back to get_revision_spec_from_endpoint when
current_revision_id is None, instead of skipping the deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sses/failures/skipped)

Replace the 4-field result (successes, errors, skipped, need_retry) with
the 3-field pattern used by the session coordinator: successes, failures, skipped.

Handlers now report all non-success outcomes as failures (DeploymentExecutionError).
The coordinator classifies failures into need_retry/expired/give_up based on
retry count and timeout policy, matching the session side's approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
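A sketch of the coordinator-side classification described in this commit. Handlers report only successes, failures, and skipped, and the coordinator sorts failures into need_retry, expired, or give_up. The precedence (timeout checked before the retry budget), the use of plain string IDs, and every helper name here are assumptions.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass, field
from datetime import timedelta


class FailureDisposition(enum.Enum):
    NEED_RETRY = "need_retry"
    EXPIRED = "expired"
    GIVE_UP = "give_up"


@dataclass
class HandlerResult:
    """3-field result; entries are deployment IDs here for brevity."""

    successes: list[str] = field(default_factory=list)
    failures: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)


def classify_failure(
    retry_count: int, max_retries: int, elapsed: timedelta, timeout: timedelta
) -> FailureDisposition:
    # Assumed precedence: an exceeded timeout wins over the retry budget.
    if elapsed >= timeout:
        return FailureDisposition.EXPIRED
    if retry_count >= max_retries:
        return FailureDisposition.GIVE_UP
    return FailureDisposition.NEED_RETRY


def classify_failures(
    result: HandlerResult,
    retry_counts: dict[str, int],
    elapsed: dict[str, timedelta],
    max_retries: int,
    timeout: timedelta,
) -> dict[FailureDisposition, list[str]]:
    """Group each failed deployment by what the coordinator should do next."""
    buckets: dict[FailureDisposition, list[str]] = {d: [] for d in FailureDisposition}
    for dep_id in result.failures:
        disposition = classify_failure(
            retry_counts.get(dep_id, 0),
            max_retries,
            elapsed.get(dep_id, timedelta(0)),
            timeout,
        )
        buckets[disposition].append(dep_id)
    return buckets
```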
jopemachine and others added 5 commits March 20, 2026 18:32
When current_revision is NULL (e.g., newly created deployments before
activate_revision), route provisioning failed with DeploymentHasNoTargetRevision.

Add fetch_deployment_context_from_endpoint that resolves image from
endpoint-level fields instead of a revision record. The route executor
now falls back to this method when no revision ID is available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When revision_id is None (no revision activated yet), fall back to the
first entry in deployment_info.model_revisions which is populated from
endpoint-level fields by build_revision_spec_from_endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change resolve_revision_spec to raise DeploymentRevisionNotFound instead of
returning None, removing unnecessary None checks at call sites.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
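The last three commits combine into roughly one rule. With no revision ID, use the first model_revisions entry, which is built from endpoint-level fields. With an ID, find that revision. Otherwise raise DeploymentRevisionNotFound instead of returning None. The sketch below models that under assumptions: `RevisionSpec`, `DeploymentInfo`, the lookup loop, and the image strings are simplified stand-ins, not the real types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from uuid import UUID, uuid4


class DeploymentRevisionNotFound(Exception):
    """Raised instead of returning None (message text is hypothetical)."""


@dataclass(frozen=True)
class RevisionSpec:
    # Hypothetical minimal spec; revision_id is None for the endpoint-derived entry.
    revision_id: UUID | None
    image: str


@dataclass
class DeploymentInfo:
    model_revisions: list[RevisionSpec] = field(default_factory=list)


def resolve_revision_spec(info: DeploymentInfo, revision_id: UUID | None) -> RevisionSpec:
    if revision_id is None:
        # No revision activated yet: use the first (endpoint-derived) entry.
        if info.model_revisions:
            return info.model_revisions[0]
        raise DeploymentRevisionNotFound("deployment has no revision spec")
    for spec in info.model_revisions:
        if spec.revision_id == revision_id:
            return spec
    raise DeploymentRevisionNotFound(f"revision {revision_id} not found")


activated = RevisionSpec(uuid4(), "example/image:v2")
info = DeploymentInfo([RevisionSpec(None, "example/image:latest"), activated])
fallback = resolve_revision_spec(info, None)  # endpoint-derived entry
```

Because the function never returns None, call sites can use the result directly and let the exception propagate.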
@HyeockJinKim HyeockJinKim enabled auto-merge (squash) March 23, 2026 01:43
@HyeockJinKim HyeockJinKim merged commit 9bcd5c0 into main Mar 23, 2026
33 checks passed
@HyeockJinKim HyeockJinKim deleted the refactor/flatten-deployment-lifecycle-sub-step branch March 23, 2026 01:52

Labels

comp:common Related to Common component comp:manager Related to Manager component size:XL 500~ LoC
