Commit a6f27f0

[docs] Add Persistent Mass Upgrades page #379
Related to #379
1 parent 106fecd commit a6f27f0

3 files changed

Lines changed: 120 additions & 2 deletions

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ within the OpenWISP architecture.
 ./user/intro.rst
 ./user/quickstart.rst
 ./user/upgrade-status.rst
+./user/persistent-mass-upgrades.rst
 ./user/automatic-device-firmware-detection.rst
 ./user/custom-firmware-upgrader.rst
 ./user/rest-api.rst
Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
Persistent Mass Upgrades
========================

When a mass upgrade runs against a large fleet, some devices are usually
offline at that moment. Without persistence, each unreachable device ends
as ``failed`` once the immediate retries are exhausted, leaving the
operator to track down and re-launch every failed device by hand.

A *persistent* mass upgrade does not give up on offline devices. Instead
of marking them ``failed``, it parks them in the ``pending`` state with a
scheduled retry time and keeps retrying in the background until the device
comes back online or the operation is cancelled.

.. contents:: **Table of contents**:
    :depth: 2
    :local:

How it works
------------

An operation whose device is unreachable transitions to ``pending``
instead of ``failed``, with an incremented ``retry_count`` and an
exponential-backoff ``next_retry_at``. A periodic Celery Beat task
re-dispatches pending operations once their retry time has elapsed, and
the batch stays ``in-progress`` until every device has either upgraded or
been cancelled.

.. image:: https://raw.githubusercontent.com/openwisp/openwisp-firmware-upgrader/docs/docs/images/1.4/persistent-mass-upgrade-batch.png
    :target: https://raw.githubusercontent.com/openwisp/openwisp-firmware-upgrader/docs/docs/images/1.4/persistent-mass-upgrade-batch.png

The mass-upgrade page above stays ``in progress`` while one device is
still ``pending``, reporting ``2 complete, 1 pending`` and keeping the
batch open until the offline device is retried successfully or cancelled.

See :doc:`upgrade-status` for the full operation state machine and the
meaning of the ``pending`` state.

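The retry scheduling described in this section can be sketched in a few
lines of Python. This is only an illustration, not the project's
implementation: the base delay, the cap and both function names are
assumptions, since this page states only that ``retry_count`` is
incremented and that ``next_retry_at`` follows an exponential backoff.

.. code-block:: python

    from datetime import datetime, timedelta, timezone

    # Hypothetical values: the real base delay and cap are not stated here.
    BASE_DELAY_SECONDS = 5 * 60
    MAX_DELAY_SECONDS = 12 * 60 * 60


    def backoff_delay(retry_count):
        """Return the wait before retry number ``retry_count`` (1-based)."""
        if retry_count < 1:
            raise ValueError("retry_count must be at least 1")
        # Double the delay on every retry, but never exceed the cap.
        seconds = min(BASE_DELAY_SECONDS * 2 ** (retry_count - 1), MAX_DELAY_SECONDS)
        return timedelta(seconds=seconds)


    def schedule_retry(retry_count, now):
        """Return the incremented retry count and the matching next_retry_at."""
        new_count = retry_count + 1
        return new_count, now + backoff_delay(new_count)


    # First failure of an operation that has never been retried.
    retry_count, next_retry_at = schedule_retry(0, datetime.now(timezone.utc))

With these assumed values the delays grow as 5, 10 and 20 minutes, and so
on, until they reach the 12-hour cap.
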
Enabling from the admin
-----------------------

On the mass-upgrade confirmation page (reached from a build's *Upgrade*
action) the **persistent** checkbox is shown pre-checked. Leave it checked
to keep retrying offline devices, or uncheck it to fall back to the
behaviour where unreachable devices end as ``failed``.

.. image:: https://raw.githubusercontent.com/openwisp/openwisp-firmware-upgrader/docs/docs/images/1.4/persistent-mass-upgrade-confirm.png
    :target: https://raw.githubusercontent.com/openwisp/openwisp-firmware-upgrader/docs/docs/images/1.4/persistent-mass-upgrade-confirm.png

The flag is locked in once the mass upgrade leaves the ``idle`` state, so
it cannot be changed midway through a running batch.

Enabling via the REST API
~~~~~~~~~~~~~~~~~~~~~~~~~

The mass-upgrade endpoint accepts an ``is_persistent`` field that defaults
to ``true``; the single-device upgrade endpoint accepts the same field but
defaults to ``false``. See :doc:`rest-api` for the full request and
response reference.

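As a rough illustration, the snippet below builds a request that launches
a mass upgrade with persistence switched off. Only the ``is_persistent``
field comes from this page: the URL layout, the bearer-token header, the
host name and the helper name are assumptions that should be checked
against :doc:`rest-api`. The request is built but not sent.

.. code-block:: python

    import json
    import urllib.request


    def build_mass_upgrade_request(base_url, build_id, token, is_persistent=True):
        """Build (but do not send) a POST request that launches a mass upgrade."""
        # Assumed URL layout and auth scheme: confirm both in the REST API docs.
        url = f"{base_url.rstrip('/')}/api/v1/firmware-upgrader/build/{build_id}/upgrade/"
        body = json.dumps({"is_persistent": is_persistent}).encode("utf-8")
        return urllib.request.Request(
            url,
            data=body,
            method="POST",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
            },
        )


    # Placeholder host, build ID and token; opt out of persistence for this batch.
    request = build_mass_upgrade_request(
        "https://openwisp.example.com",
        "00000000-0000-0000-0000-000000000000",
        "example-token",
        is_persistent=False,
    )

Passing the resulting object to ``urllib.request.urlopen`` would send it.
Omitting ``is_persistent`` from the body leaves the choice to the
endpoint's default.
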
Finding pending operations
--------------------------

Pending operations are listed in the upgrade-operation admin and can be
isolated with the ``status`` filter set to ``pending``. The list shows the
``persistent`` flag and the ``retry_count`` column, the latter being how
many times an operation has been retried so far.

.. image:: https://raw.githubusercontent.com/openwisp/openwisp-firmware-upgrader/docs/docs/images/1.4/persistent-upgrade-pending-changelist.png
    :target: https://raw.githubusercontent.com/openwisp/openwisp-firmware-upgrader/docs/docs/images/1.4/persistent-upgrade-pending-changelist.png

An operation's detail page adds ``next_retry_at`` (when the next attempt
is scheduled) and a log that records each attempt, ending with the
backoff-scheduled ``persistent retry`` line for the next run.

.. image:: https://raw.githubusercontent.com/openwisp/openwisp-firmware-upgrader/docs/docs/images/1.4/persistent-upgrade-operation-pending.png
    :target: https://raw.githubusercontent.com/openwisp/openwisp-firmware-upgrader/docs/docs/images/1.4/persistent-upgrade-operation-pending.png

Cancelling a pending operation
------------------------------

A pending operation is still active, so it can be cancelled the same way
as an in-progress one: from the admin cancel button or the REST cancel
endpoint. Cancelling stops the retry loop and moves the operation to
``cancelled``. A pending operation cannot be *deleted* until it reaches a
terminal state (see :ref:`deleting_upgrade_operations`).

Notifications
-------------

Two notifications keep operators informed about long-running persistent
upgrades:

- a **reminder** fires when a persistent batch still has pending children
  after the configured cadence has elapsed, and
- a **failure** notification fires when a persistent operation finally
  ends as ``failed`` (for example, the device was deactivated while
  pending).

.. image:: https://raw.githubusercontent.com/openwisp/openwisp-firmware-upgrader/docs/docs/images/1.4/persistent-upgrade-notifications.png
    :target: https://raw.githubusercontent.com/openwisp/openwisp-firmware-upgrader/docs/docs/images/1.4/persistent-upgrade-notifications.png

The cadence and related settings are documented in :doc:`settings`.

Behaviour with and without openwisp-monitoring
----------------------------------------------

Persistent upgrades work with Celery Beat alone: the periodic scan retries
due pending operations on a fixed cadence. Installing
``openwisp-monitoring`` adds a faster wake-up path: a device returning to
a healthy state triggers its pending retries immediately, without waiting
for the next scan. When ``openwisp-monitoring`` is not installed, the Beat
scan remains the only retry trigger.

The periodic tasks (``check_pending_upgrades`` and
``send_pending_upgrade_reminders``) must be present in the deployment's
``CELERY_BEAT_SCHEDULE``; see :doc:`settings`.
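
A schedule containing the two tasks might look like the sketch below. The
task names come from this page, but the dotted module path and both
intervals are assumptions made for illustration; :doc:`settings` remains
the authoritative reference.

.. code-block:: python

    from datetime import timedelta

    # Assumed module path for the two tasks named above.
    TASKS_MODULE = "openwisp_firmware_upgrader.tasks"

    CELERY_BEAT_SCHEDULE = {
        "check_pending_upgrades": {
            "task": f"{TASKS_MODULE}.check_pending_upgrades",
            # Assumed interval for the scan that retries due pending operations.
            "schedule": timedelta(minutes=5),
        },
        "send_pending_upgrade_reminders": {
            "task": f"{TASKS_MODULE}.send_pending_upgrade_reminders",
            # Assumed interval for the reminder check.
            "schedule": timedelta(hours=1),
        },
    }

A deployment that already defines ``CELERY_BEAT_SCHEDULE`` would add these
two entries to its existing dictionary rather than replace it.
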

docs/user/upgrade-status.rst

Lines changed: 3 additions & 2 deletions
@@ -136,14 +136,15 @@ before completion. This is a deliberate action taken through the admin
 interface or REST API.
 
 Users can cancel upgrades through the admin interface using the "Cancel"
-button that appears next to in-progress operations.
+button that appears next to in-progress and pending operations.
 
 **When cancellation is possible:**
 
 - During the early stages of upgrade (typically before 65% progress)
 - Before the new firmware image is written to the flash memory of the
   network device
-- While the operation status is still "in-progress"
+- While the operation status is still ``in-progress`` or ``pending`` (a
+  pending operation can be cancelled to stop its persistent retries)
 
 **What happens when the upgrade operation is cancelled:**