[feature:gsoc26] Add health_status_changed signal handler for fast pending-upgrade wake-up via OpenWISP Monitoring

**Is your feature request related to a problem? Please describe.**

Sub-issue 04's 10-minute Beat scan guarantees that pending ops are eventually retried, but once a device comes back online the wait can stretch to 15+ minutes on an early retry (5-min scan jitter + half the scan cadence + the next backoff window). For an operator watching a deployment in real time, that is a frustrating gap.

The faster wake-up signal is openwisp-monitoring's `health_status_changed`, which fires whenever a device's health status changes, including the moment it transitions back to `ok`. Two things make this tricky:

- openwisp-monitoring is *not* a dependency of the firmware upgrader, so the integration has to stay optional - deployments without monitoring should still get persistence via Beat alone.
- A burst can hurt: when a network outage recovers and 200 devices flip from `critical → ok` in the same second, a naively connected handler would fire 200 retries at once and could saturate the broker. The dispatch needs jitter.

**Describe the solution I would implement**

I would like to add an optional signal-based wake-up path that complements sub-issue 04's Beat scan without becoming a hard dependency on openwisp-monitoring.

1. Add a `connect_monitoring_signals()` method to `FirmwareUpdaterConfig` and call it from `ready()`. Wrap the `from openwisp_monitoring.device.signals import health_status_changed` import in `try/except ImportError` - if monitoring isn't installed, the connection silently no-ops and the rest of `ready()` finishes normally. Sub-issue 04's Beat-driven path keeps working either way.

2. Implement the handler in a new `signals_handlers.py` (or extend `signals.py`). The signal signature is verified at `openwisp_monitoring/device/signals.py:3`, and the signal is emitted from `openwisp_monitoring/device/base/models.py:377`: `health_status_changed.send(sender, instance, status)`, with `instance` and `status` passed as keyword arguments. The handler reacts only when `status == "ok"` and ignores `critical`, `unknown`, `problem`, and `deactivated`. Lateral `ok → ok` re-emissions could trigger duplicate dispatches, but sub-issue 04's atomic compare-and-swap absorbs them (see step 5).

3. The signal's `instance` is the `DeviceMonitoring` row that owns the health status; its related `Device` is `instance.device` (OneToOneField). For each pending op on that device, dispatch `retry_pending_upgrade` from sub-issue 04 with a randomized countdown:

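   ```python
   import random

   # `instance` is the DeviceMonitoring row passed in by the signal.
   pending_pks = UpgradeOperation.objects.filter(
       device=instance.device, status="pending"
   ).values_list("pk", flat=True)
   for pk in pending_pks:
       # Spread retries over [0, jitter] seconds so a mass recovery
       # does not hit the broker all at once.
       countdown = random.uniform(0, PERSISTENT_RETRY_SIGNAL_JITTER)
       retry_pending_upgrade.apply_async(args=[pk], countdown=countdown)
   ```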

4. One configurable setting for the signal-driven dispatch jitter:

   | Setting | Default | Purpose |
   |---|---|---|
   | `..._PERSISTENT_RETRY_SIGNAL_JITTER` | 120 seconds (2 min) | Smaller than sub-issue 04's 5-min Beat jitter because signal wake-up is meant to feel fast |

5. Idempotency comes for free from sub-issue 04: both the signal handler and the Beat scan call `retry_pending_upgrade`, which uses the atomic `filter(status="pending").update(status="in-progress")` compare-and-swap. If both fire for the same op in the same minute, only one worker's `update()` returns a nonzero row count; the other sees zero and exits silently. That directly handles the edge case where the signal fires for an op that has already been woken up: the late caller does nothing.

6. Testing approach: since openwisp-monitoring isn't installed in the firmware upgrader's CI, I'd construct a mock `django.dispatch.Signal()` locally with the same `(sender, instance, status)` kwargs and call the handler directly. Tests cover: `status="ok"` with a matching pending op dispatches one retry with countdown in `[0, jitter]`; non-recovery statuses dispatch nothing; no matching pending op dispatches nothing; `connect_monitoring_signals` silently no-ops when the import fails; signal + Beat dispatched concurrently for the same op result in exactly one `upgrade_firmware.delay` call.
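
The optional connection from step 1 could look roughly like the sketch below. It is a minimal illustration under a few assumptions, not the final implementation: `connect_monitoring_signals` is written as a free function rather than a method of `FirmwareUpdaterConfig`, and the `module_path` parameter and `DISPATCH_UID` constant are hypothetical names added so the import can be swapped out in tests.

```python
import importlib

# Hypothetical identifier; passing a dispatch_uid keeps Django from
# connecting the same receiver twice if ready() runs more than once.
DISPATCH_UID = "firmware_upgrader.health_status_changed"


def connect_monitoring_signals(
    handler, module_path="openwisp_monitoring.device.signals"
):
    """Connect `handler` to health_status_changed if monitoring is installed.

    Returns True when the handler was connected and False when the
    optional dependency is missing, in which case the Beat scan remains
    the only retry path.
    """
    try:
        signals = importlib.import_module(module_path)
    except ImportError:
        # openwisp-monitoring is not installed: silently no-op.
        return False
    signals.health_status_changed.connect(handler, dispatch_uid=DISPATCH_UID)
    return True
```

Returning a boolean is a design choice made here for testability; the method described in step 1 could just as well return nothing.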

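The status filter and jitter from steps 2 to 4 can be exercised without Django or Celery if the ORM query and the task dispatch are passed in as plain callables. In this sketch, `handle_health_status_changed`, `get_pending_pks`, and `dispatch` are hypothetical names; in the real handler they would presumably be replaced by the queryset from step 3 and `retry_pending_upgrade.apply_async`.

```python
import random

PERSISTENT_RETRY_SIGNAL_JITTER = 120  # seconds; the default proposed in step 4


def handle_health_status_changed(
    status, get_pending_pks, dispatch, jitter=PERSISTENT_RETRY_SIGNAL_JITTER
):
    """Schedule one jittered retry per pending op when a device recovers.

    Returns the list of (pk, countdown) pairs that were dispatched.
    """
    if status != "ok":
        # Only a recovery should wake up pending ops.
        return []
    scheduled = []
    for pk in get_pending_pks():
        # Each op gets its own random delay in [0, jitter] seconds.
        countdown = random.uniform(0, jitter)
        dispatch(pk, countdown)
        scheduled.append((pk, countdown))
    return scheduled
```

Injecting the two callables also makes the "non-recovery statuses dispatch nothing" and "no matching pending op dispatches nothing" cases from step 6 easy to assert on.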

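The compare-and-swap in step 5 boils down to a conditional `UPDATE` whose affected-row count tells each caller whether it won the race. The sketch below illustrates the idea with the standard-library `sqlite3` module instead of the Django ORM; the table name `upgrade_operation` and the helper `claim_pending` are made up for illustration.

```python
import sqlite3


def claim_pending(conn, pk):
    """Atomically move one op from 'pending' to 'in-progress'.

    Returns True only for the caller whose UPDATE actually changed the row.
    """
    cur = conn.execute(
        "UPDATE upgrade_operation SET status = 'in-progress' "
        "WHERE id = ? AND status = 'pending'",
        (pk,),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upgrade_operation (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO upgrade_operation VALUES (1, 'pending')")
conn.commit()

first = claim_pending(conn, 1)   # e.g. the signal-driven retry
second = claim_pending(conn, 1)  # e.g. the Beat-driven retry arriving later
print(first, second)  # prints: True False
```

The second caller finds no row still in `pending`, so its update touches zero rows and it can return without dispatching anything.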