[feature:gsoc26] Update batch status aggregation, cancellation and concurrent-upgrade guard to handle pending operations #423

Description

@Eeshu-Yadav

Is your feature request related to a problem? Please describe.

When sub-issue 01 lands and pending becomes a real status, six places in the codebase silently break because they all assume "anything that isn't in-progress is finished". I'd need to fix each of them in lockstep — otherwise a batch with pending children would mis-report as success, the progress display would say "5/5 complete" while 5 devices are still waiting, pending ops would be uncancellable, and a fresh batch could enqueue a duplicate upgrade for a device that already has one pending.

Here's where each assumption shows up today:

  • AbstractBatchUpgradeOperation.calculate_and_update_status() counts pending ops as completed because its Case/When aggregate excludes only in-progress. A batch with 0 in-progress and 5 pending children would fall through to the success branch if any other op succeeded.
  • progress_report does the same in a separate query, so the batch would display "5 out of 5 completed" while 5 devices are still waiting.
  • cancel() reads _CANCELLABLE_STATUS = "in-progress" (a string), so pending operations can't be cancelled.
  • The concurrent-upgrade guard inside upgrade() filters only status="in-progress", so a new batch could enqueue a second upgrade for a device that already has a pending one queued from a previous batch.
  • BatchUpgradeProgressConsumer._handle_current_state_request at websockets.py:254 hardcodes op.status != "in-progress" as the completed count for connection-time snapshots, so opening the batch page would show wrong counts.
  • DeviceUpgradeProgressConsumer._handle_current_state_request at websockets.py:328–334 hardcodes a status whitelist that doesn't include pending, so pending ops would be invisible in device history.
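
To make the counting problem in the first two bullets concrete, here is a minimal plain-Python sketch. It is not the Django Case/When aggregate itself: the function names and the simplified status decision are illustrative assumptions, not code from the repository.

```python
from collections import Counter

# Hypothetical child statuses for one batch: 3 finished, 2 still queued.
statuses = ["success", "success", "success", "pending", "pending"]


def completed_current(statuses):
    """Today's assumption: everything that is not in-progress is finished."""
    return sum(1 for s in statuses if s != "in-progress")


def completed_pending_aware(statuses):
    """Treat both in-progress and pending as unfinished."""
    return sum(1 for s in statuses if s not in ("in-progress", "pending"))


def batch_status_current(statuses):
    """Simplified decision block: no in-progress children means the batch is done.

    The real method has more branches (cancelled, aborted, ...); only the
    fall-through to success described above is modelled here.
    """
    counts = Counter(statuses)
    if counts["in-progress"]:
        return "in-progress"
    return "success" if counts["success"] else "failed"


print(completed_current(statuses), "of", len(statuses))        # pending counted as done
print(completed_pending_aware(statuses), "of", len(statuses))  # pending excluded
print(batch_status_current(statuses))                          # batch looks finished
```

With two pending children, the current-style count reports the whole batch as complete, while the pending-aware count reports only the three ops that actually finished.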

Describe the solution I would implement

I would like to walk each of these sites and update them to treat pending as a non-terminal status, plus add a property and tests so callers and reviewers can verify the fix.

  1. Update calculate_and_update_status() (base/models.py:719–791):

    • Add a new pending aggregate counter alongside the existing in_progress / completed / successful / failed / cancelled / aborted counters.
    • Change the completed aggregate to exclude both in-progress AND pending.
    • In the status decision block, keep the batch as in-progress whenever there are any in-progress OR pending children.
  2. Add a pending_count property on AbstractBatchUpgradeOperation, next to the existing total_operations property at base/models.py:649–651, that returns self.upgrade_operations.filter(status="pending").count(). It is meant for admin templates and any non-publisher consumer that wants the count without calling calculate_and_update_status().

  3. Update the progress_report property (base/models.py:653–656): change .exclude(status="in-progress") to exclude both in-progress AND pending. This has to move in lockstep with step 1's aggregate.

  4. Change _CANCELLABLE_STATUS (base/models.py:796) from the string "in-progress" to the tuple ("in-progress", "pending"). Update cancel() (base/models.py:861–894):

    • Replace status=self._CANCELLABLE_STATUS with status__in=self._CANCELLABLE_STATUS in the atomic filter().update().
    • Replace self.status != self._CANCELLABLE_STATUS with self.status not in self._CANCELLABLE_STATUS in the error-path check.
    • Leave the existing progress__lt=CANCELLATION_THRESHOLD guard alone — pending ops have progress=PROGRESS_MIN, so they always pass it.
  5. Extend the concurrent-upgrade guard in upgrade() (base/models.py:944–949) from status="in-progress" to status__in=("in-progress", "pending"). The existing branch that aborts the new op with status="aborted" stays as-is.

  6. WebSocket changes: the push path (per-op post_save → publisher → frontend) picks up the fix automatically once step 1's model change lands, because update_batch_status reads stats["completed"] from calculate_and_update_status(), which will then exclude pending. The pull path, fired on every page load when batch-upgrade-progress.js / upgrade-progress.js send request_current_state to the server, bypasses the model method and reimplements its own count/filter. Two snapshot handlers must be patched for correctness:

    • BatchUpgradeProgressConsumer._handle_current_state_request at websockets.py:254 hardcodes op.status != "in-progress" for the snapshot completed count. Change to not in ("in-progress", "pending"). Without this, opening a batch page with 2 pending + 3 success children renders "5 out of 5" on first paint.
    • DeviceUpgradeProgressConsumer._handle_current_state_request at websockets.py:328–334 hardcodes a status whitelist that omits "pending". Add it, or pending ops vanish silently from device history.
  7. Extend the per-batch payload so the live UI can show "X complete, Y pending" — the proposal explicitly commits to this display ("the admin now sees the batch as in-progress with a progress report that correctly shows '15 complete, 5 pending'"). Snapshot fixes alone keep counts correct but leave the existing "X out of Y" wording, which doesn't surface the pending state:

    • BatchUpgradeProgressPublisher.update_batch_status() at websockets.py:509–517 passes stats["pending"] through to publish_batch_status.
    • BatchUpgradeProgressPublisher.publish_batch_status() at websockets.py:499–507 signature gains a pending parameter; payload gains a "pending" key.
    • batch-upgrade-progress.js's updateBatchProgress renders the new "X complete, Y pending" format from the payload.
    • progress_report (base/models.py:653–656) returns the same "X complete, Y pending" string, so the static admin template render on page load matches the live WebSocket-driven update.
  8. Add a WebSocket integration test confirming that a pending transition publishes correctly and that connection-time snapshots from both consumers return the expected counts.

  9. Add unit tests covering:

    • Aggregation correctness for mixed pending/success/in-progress batches (batch stays in-progress, progress excludes pending).
    • progress_report matches the aggregate's view of "completed".
    • pending_count returns the correct value for mixed batches.
    • Cancel succeeds on a pending op (status becomes cancelled).
    • Cancel still works on in-progress ops below the progress threshold (regression).
    • Cancel on a terminal-status op raises ValueError.
    • Concurrent-upgrade guard rejects a new upgrade attempt against a device with a pending op (status becomes aborted).
    • BatchUpgradeProgressConsumer snapshot returns completed = N − in_progress_count − pending_count, which reduces to N − pending_count when no op is in-progress (regression for the consumer-side fix).
    • DeviceUpgradeProgressConsumer snapshot includes a pending op in the history (regression for the whitelist fix).
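
The cancellation and guard changes in steps 4 and 5 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: Op, guard_new_upgrade, ACTIVE_STATUSES and the numeric values of CANCELLATION_THRESHOLD and PROGRESS_MIN are hypothetical stand-ins, and raising ValueError above the threshold is assumed. The real cancel() uses an atomic filter().update() on the model rather than in-memory objects.

```python
from dataclasses import dataclass

CANCELLABLE_STATUSES = ("in-progress", "pending")  # proposed tuple form
ACTIVE_STATUSES = ("in-progress", "pending")       # hypothetical name for the guard set
CANCELLATION_THRESHOLD = 60  # assumed value, for illustration only
PROGRESS_MIN = 0             # assumed value, for illustration only


@dataclass
class Op:
    """Hypothetical in-memory stand-in for one upgrade operation row."""

    device: str
    status: str = "pending"
    progress: int = PROGRESS_MIN

    def cancel(self):
        # Membership test replaces the old equality check against a string.
        if self.status not in CANCELLABLE_STATUSES:
            raise ValueError(f"cannot cancel operation with status {self.status!r}")
        # Existing guard left alone: pending ops sit at PROGRESS_MIN and pass it.
        if self.progress >= CANCELLATION_THRESHOLD:
            raise ValueError("operation has progressed too far to cancel")
        self.status = "cancelled"


def guard_new_upgrade(new_op, existing_ops):
    """Abort new_op if its device already has an in-progress or pending op.

    Returns True when the new op was aborted; otherwise its status is untouched.
    """
    conflict = any(
        op.device == new_op.device and op.status in ACTIVE_STATUSES
        for op in existing_ops
    )
    if conflict:
        new_op.status = "aborted"
    return conflict


queued = Op("device-1")                      # pending op from an earlier batch
duplicate = Op("device-1")                   # second upgrade for the same device
print(guard_new_upgrade(duplicate, [queued]), duplicate.status)
queued.cancel()
print(queued.status)
```

The membership checks mirror the status__in and not in replacements listed in step 4, and the guard mirrors the status__in filter from step 5.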

Metadata

Assignees

Labels

enhancement (New feature or request), gsoc-idea (Issues part of Google Summer of Code project)

Projects

Status
In progress

Milestone

No milestone

