feat(BA-5280): Consolidate deploying handlers and remove unused sub-steps #10276

Merged
jopemachine merged 12 commits into main from refactor/applier-remove-substep-writes
Mar 19, 2026

Member

@jopemachine jopemachine commented Mar 17, 2026

Resolves BA-5280.

Summary

  • Remove sub_step writes from strategy applier: _update_sub_steps removed from apply_strategy_mutations. Sub-step transitions are now handled exclusively by the coordinator's status transition mechanism.
  • Replace rewind with need_retry: Route mutations (create/drain) within the same sub-step (PROVISIONING → PROVISIONING) now use the need_retry transition. Unlike error-based need_retry, these are never escalated to give_up since they represent normal progress.
  • Remove allow_merge logic: History merging is now always attempted based on should_merge_with() — no per-record opt-out needed.
  • Consolidate deploying handlers: Remove DeployingProgressingHandler; DeployingProvisioningHandler now handles the entire DEPLOYING lifecycle. Remove the unused PROGRESSING and ROLLED_BACK sub-steps.
  • Simplify RollingBackHandler: Transitions directly to READY instead of going through ROLLED_BACK intermediate state.
  • Extract has_mutations() method: Strategy apply skip-condition moved to StrategyApplyResult.has_mutations() for clarity.
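To illustrate the need_retry bullet above, here is a minimal sketch of how a same-sub-step cycle could be classified. It is not the coordinator's actual code: `RetryReason`, `PhaseState`, `MAX_ATTEMPTS`, and `next_transition` are hypothetical names, and the retry budget of 3 is an assumed value. It only shows the idea that route mutations retry without ever reaching give_up, while error-based retries can.

```python
from dataclasses import dataclass
from enum import Enum


class RetryReason(Enum):
    """Why a cycle stayed in the same sub-step (hypothetical classification)."""

    ERROR = "error"                    # a step failed and should be retried
    ROUTE_MUTATION = "route_mutation"  # routes were created/drained: normal progress


MAX_ATTEMPTS = 3  # assumed retry budget, not taken from the PR


@dataclass
class PhaseState:
    error_attempts: int = 0


def next_transition(state: PhaseState, reason: RetryReason) -> str:
    """Return the transition name for a cycle that did not change sub-step."""
    if reason is RetryReason.ROUTE_MUTATION:
        # Normal progress: retry without consuming the error budget.
        return "need_retry"
    # Error-based retry: count it and escalate once the budget is spent.
    state.error_attempts += 1
    if state.error_attempts >= MAX_ATTEMPTS:
        return "give_up"
    return "need_retry"
```

In this sketch, any number of route-mutation cycles leaves `error_attempts` at zero, so only repeated errors can end in give_up.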

Test plan

  • Applier tests updated
  • Coordinator history tests updated (rewind → need_retry)
  • Rolling update FSM tests pass
  • E2E completed test verified
  • E2E rollback (timeout) test verified

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 17, 2026 05:59
@jopemachine jopemachine added the type:refactor Refactor codes or add tests. label Mar 17, 2026
@github-actions github-actions Bot added size:L 100~500 LoC comp:manager Related to Manager component labels Mar 17, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors strategy application so that sub-step transitions (and rolled-back cleanup) are no longer written as side-effects by the strategy applier, aligning responsibilities with the coordinator/rollback handler in preparation for the rolling update strategy (BEP-1049).

Changes:

  • Removed assignments/rolled_back_ids-driven DB writes from apply_strategy_mutations() across applier/repository/db_source.
  • Added a dedicated clear_deploying_revision() API on applier/repository/db_source to clear deploying_revision + sub_step for rolled-back deployments.
  • Updated unit tests to assert that sub-step-only summaries (and rolled-back-only summaries) skip apply_strategy_mutations() DB calls.
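The skip behavior described in the last item can be sketched as follows. This is an assumption-laden illustration, not the repository's code: the fields on `StrategyApplyResult` are guesses, `FakeRepository` and the free function `apply` are hypothetical, and the real methods are async rather than synchronous.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class StrategyApplyResult:
    # Hypothetical fields; the real class may track different data.
    assignments: dict[UUID, str] = field(default_factory=dict)  # sub-step info only
    rollout_count: int = 0
    drain_ids: set[UUID] = field(default_factory=set)
    completed_ids: set[UUID] = field(default_factory=set)

    def has_mutations(self) -> bool:
        """True only when there is something to write to the database.

        Sub-step assignments are deliberately ignored here, since sub-step
        transitions are left to the coordinator.
        """
        return bool(self.rollout_count or self.drain_ids or self.completed_ids)


class FakeRepository:
    """Stand-in that counts how often the DB mutation entry point is hit."""

    def __init__(self) -> None:
        self.calls = 0

    def apply_strategy_mutations(self, result: StrategyApplyResult) -> None:
        self.calls += 1


def apply(result: StrategyApplyResult, repo: FakeRepository) -> bool:
    """Apply a result, skipping the repository call when nothing would change."""
    if not result.has_mutations():
        return False
    repo.apply_strategy_mutations(result)
    return True
```

A result that carries only sub-step assignments returns `False` from `apply` and leaves `repo.calls` untouched, which is the property the updated unit tests are described as asserting.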

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tests/unit/manager/sokovan/deployment/strategy/test_applier.py | Updates tests to reflect that sub-step-only and rolled-back-only summaries should not trigger strategy DB mutations. |
| src/ai/backend/manager/sokovan/deployment/strategy/applier.py | Stops passing assignments/rolled-back IDs into DB mutations; adds clear_deploying_revision() delegation method. |
| src/ai/backend/manager/repositories/deployment/repository.py | Narrows apply_strategy_mutations() to route + revision swap; adds repository-level clear_deploying_revision(). |
| src/ai/backend/manager/repositories/deployment/db_source/db_source.py | Removes _update_sub_steps() / rolled-back cleanup from transaction; adds a standalone clear_deploying_revision() transaction. |
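The applier → repository → db_source delegation for clear_deploying_revision() described in these files might look roughly like the sketch below. Only the method name and its `deployment_ids: set[UUID]` parameter come from the review snippets; `DeploymentRow`, `InMemoryDbSource`, `Repository`, and `Applier` are hypothetical stand-ins, and a dict replaces the real database transaction.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class DeploymentRow:
    """Hypothetical in-memory stand-in for a deployment record."""

    deploying_revision: Optional[UUID]
    sub_step: Optional[str]


class InMemoryDbSource:
    def __init__(self, rows: dict[UUID, DeploymentRow]) -> None:
        self.rows = rows

    async def clear_deploying_revision(self, deployment_ids: set[UUID]) -> None:
        # Nothing to clear: return early instead of opening a transaction.
        if not deployment_ids:
            return
        for deployment_id in deployment_ids:
            row = self.rows.get(deployment_id)
            if row is not None:
                row.deploying_revision = None
                row.sub_step = None


class Repository:
    def __init__(self, db_source: InMemoryDbSource) -> None:
        self._db_source = db_source

    async def clear_deploying_revision(self, deployment_ids: set[UUID]) -> None:
        # Pure delegation to the data-source layer.
        await self._db_source.clear_deploying_revision(deployment_ids)


class Applier:
    def __init__(self, repo: Repository) -> None:
        self._repo = repo

    async def clear_deploying_revision(self, deployment_ids: set[UUID]) -> None:
        # Called explicitly by the rollback handler, never as a side effect
        # of applying strategy mutations.
        await self._repo.clear_deploying_revision(deployment_ids)
```

Keeping the cleanup as its own call means the rollback handler decides when it happens, rather than it being bundled into every strategy mutation.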

Comment on lines +190 to 193
await applier.apply(summary)

kwargs = mock_deployment_repo.apply_strategy_mutations.call_args.kwargs
assert kwargs["drain"] is None
) -> None:
"""Rolled-back IDs are tracked in result but not passed to DB mutations.

The RollingBackHandler is responsible for clearing deploying_revision.
Comment on lines 47 to 50
@@ -48,7 +48,10 @@ class StrategyResultApplier:
1. Sub-step assignment updates
2. Route rollout (create) and drain (terminate)
3. Revision swap for COMPLETED deployments
Comment on lines 52 to 55
Note: Clearing ``deploying_revision`` for rolled-back deployments is NOT
done here — it is the responsibility of ``DeployingRollingBackHandler``
to explicitly call ``clear_deploying_revision`` after rollback completes.
"""
Comment on lines 1407 to 1411
async def apply_strategy_mutations(
self,
assignments: Mapping[UUID, DeploymentSubStep],
rollout: BulkCreator[RoutingRow],
drain: BatchUpdater[RoutingRow] | None,
completed_ids: set[UUID],
Comment on lines +1428 to +1432
async def clear_deploying_revision(self, deployment_ids: set[UUID]) -> None:
"""Clear deploying_revision and sub_step for rolled-back deployments.

Called explicitly by ``DeployingRollingBackHandler`` after rollback
completes — NOT automatically during strategy mutations.
if not rolled_back_ids:
"""Clear deploying_revision and sub_step for rolled-back deployments.

This is called explicitly by the RollingBackHandler after rollback
@jopemachine jopemachine marked this pull request as draft March 17, 2026 06:06
@jopemachine jopemachine changed the title from "refactor: Remove sub_step writes from strategy applier" to "refactor: Consolidate sub_step control in coordinator and add rewind transition" Mar 17, 2026
@jopemachine jopemachine changed the title from "refactor: Consolidate sub_step control in coordinator and add rewind transition" to "feat: Move sub_step transitions to coordinator with rewind support" Mar 17, 2026
@jopemachine jopemachine changed the title from "feat: Move sub_step transitions to coordinator with rewind support" to "feat(BA-5280): Move sub_step transitions to coordinator with rewind support" Mar 17, 2026
@jopemachine jopemachine added this to the 26.4 milestone Mar 17, 2026
@jopemachine jopemachine requested a review from a team March 17, 2026 06:27
@jopemachine jopemachine marked this pull request as ready for review March 17, 2026 06:27
@jopemachine jopemachine marked this pull request as draft March 17, 2026 06:40
@github-actions github-actions Bot added size:XL 500~ LoC and removed size:L 100~500 LoC labels Mar 17, 2026
@jopemachine jopemachine changed the title from "feat(BA-5280): Move sub_step transitions to coordinator with rewind support" to "feat: Consolidate deploying handlers and move sub_step control to coordinator" Mar 17, 2026
@jopemachine jopemachine changed the title from "feat: Consolidate deploying handlers and move sub_step control to coordinator" to "feat(BA-5278): Move sub_step transitions to coordinator with rewind support" Mar 17, 2026
@jopemachine jopemachine changed the title from "feat(BA-5278): Move sub_step transitions to coordinator with rewind support" to "feat(BA-5280): Move sub_step transitions to coordinator with rewind support" Mar 17, 2026
@jopemachine jopemachine marked this pull request as ready for review March 17, 2026 09:42
Comment thread src/ai/backend/manager/data/deployment/types.py Outdated
Comment thread src/ai/backend/manager/repositories/deployment/db_source/db_source.py Outdated
Comment thread src/ai/backend/manager/repositories/base/types.py Outdated
Comment thread src/ai/backend/manager/sokovan/deployment/strategy/applier.py Outdated
@jopemachine jopemachine changed the title from "feat(BA-5280): Move sub_step transitions to coordinator with rewind support" to "feat(BA-5280): Consolidate deploying handlers and remove unused sub-steps" Mar 18, 2026
@HyeockJinKim
Collaborator

Please rebase this PR, @jopemachine.

HyeockJinKim previously approved these changes Mar 19, 2026
Collaborator

@HyeockJinKim HyeockJinKim left a comment

I approved this PR, but you should rebase it and re-mention me.

jopemachine and others added 12 commits March 19, 2026 14:48
Remove assignments parameter and _update_sub_steps calls from
apply_strategy_mutations so that sub_step transitions are handled
exclusively by the coordinator's status transition mechanism.
Extract clear_deploying_revision as a separate public method on both
the repository and db_source layers. Update tests to reflect that
assignments-only summaries no longer trigger DB mutations.

This is a prerequisite refactoring for the rolling update deployment
strategy (BEP-1049).

Add rewind as a new transition category in DeploymentStatusTransitions
and DeploymentExecutionResult. Unlike need_retry, rewind does not
increment phase_attempts — it represents normal forward progress
(e.g. progressing → provisioning for the next batch in rolling update).

Also add or_conditions/and_conditions utilities for combining
QueryConditions, useful for building complex route filters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

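As a rough illustration of the or_conditions/and_conditions utilities mentioned in the commit message above, the sketch below combines plain boolean predicates over a route dictionary. This is an assumption made for the example: the real QueryCondition type and the helpers' signatures are not shown in this conversation, and the repository versions presumably build database filter expressions rather than Python predicates. `Route`, `Condition`, and the sample field names are hypothetical.

```python
from typing import Callable, Mapping

# Hypothetical stand-ins: a route is a plain mapping and a condition is a
# predicate over it.
Route = Mapping[str, object]
Condition = Callable[[Route], bool]


def and_conditions(*conditions: Condition) -> Condition:
    """Combine conditions so that all must hold (vacuously true when empty)."""

    def combined(route: Route) -> bool:
        return all(condition(route) for condition in conditions)

    return combined


def or_conditions(*conditions: Condition) -> Condition:
    """Combine conditions so that at least one must hold (false when empty)."""

    def combined(route: Route) -> bool:
        return any(condition(route) for condition in conditions)

    return combined


def by_status(status: str) -> Condition:
    return lambda route: route.get("status") == status


def by_revision(revision: str) -> Condition:
    return lambda route: route.get("revision") == revision


# "Routes of the old revision that are either healthy or provisioning."
old_active = and_conditions(
    by_revision("rev-1"),
    or_conditions(by_status("healthy"), by_status("provisioning")),
)
```

Because both helpers return a condition of the same type they accept, they nest freely, which is what makes them convenient for building complex route filters.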
- Remove DeployingProgressingHandler — ProvisioningHandler handles
  the entire DEPLOYING lifecycle
- Remove PROGRESSING and ROLLED_BACK from DeploymentSubStep enum
- Simplify RollingBackHandler to transition directly to READY
- Remove rolled_back_ids from StrategyApplyResult
- Add allow_merge flag to DeploymentHistoryCreatorSpec so rewind
  history records are never merged (each route mutation cycle is
  visible as a separate record)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine force-pushed the refactor/applier-remove-substep-writes branch from f539c10 to 3ae0da9 Compare March 19, 2026 05:58
@jopemachine jopemachine enabled auto-merge (squash) March 19, 2026 07:31
@jopemachine jopemachine merged commit e903f88 into main Mar 19, 2026
30 checks passed
@jopemachine jopemachine deleted the refactor/applier-remove-substep-writes branch March 19, 2026 07:34
Labels

comp:manager Related to Manager component size:XL 500~ LoC type:refactor Refactor codes or add tests.

4 participants