Pre-serialize Celery task results to avoid JSON serialization errors when dispatched async #2188

@berendt

Summary

Follow-up to #2185 / commit 4c28494. Several Celery tasks in osism.tasks.netbox and osism.tasks.openstack return raw pynetbox Record/RecordSet objects or OpenStack SDK Resource objects. These types are not serializable by Celery's default JSON result backend. Calling any of them via .delay() / .apply_async() / .si() surfaces the same "Object of type <X> is not JSON serializable" failure that we already saw for osism sonic list.

The tasks work today only because every existing caller invokes them as plain in-process function calls (no wire transport, no serialization). Any future migration to async dispatch (for example from the FastAPI layer, the listener, or a new CLI command) will trigger the bug.

This issue tracks converting these tasks to return plain JSON-serializable dicts/lists, so they are safe to dispatch async, matching the pattern used by conductor.get_sonic_devices after commit 4c28494.
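A minimal sketch of the failure mode described above. FakeRecord is a hypothetical stand-in for a pynetbox Record or an SDK Resource, and the stdlib json module stands in for the JSON encoding step of the result backend, so no NetBox, OpenStack, or Celery setup is assumed:

```python
import json


class FakeRecord:
    """Hypothetical stand-in for a pynetbox Record or an SDK Resource."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


def is_json_serializable(value):
    """Return True if json.dumps accepts the value, False otherwise."""
    try:
        json.dumps(value)
    except TypeError:
        # json.dumps raises TypeError for types it does not know how to encode.
        return False
    return True


raw = FakeRecord(1, "hypothetical-device")
plain = {"id": raw.id, "name": raw.name}

print(is_json_serializable(raw))    # raw object: rejected by the encoder
print(is_json_serializable(plain))  # plain dict: accepted
```

An in-process call never reaches the encoder, which is why the raw object goes unnoticed until a task is dispatched through a broker.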

Motivation

  • Prevent regressions: the current structural issue is a latent bug waiting to surface the next time someone writes netbox.get_device_by_name.delay(...) from an API handler.
  • Consistency: after 4c28494 the SONiC listing task returns serialization-safe dicts, but peer tasks in the same package still return raw ORM-like objects. This is confusing for new contributors.
  • API readiness: osism.api is adding more endpoints that delegate to Celery workers. Returning plain dicts from those tasks is a prerequisite.

Scope

Affected tasks — osism/tasks/netbox.py

| Line | Task | Current return |
|------|------|----------------|
| 363 | get_devices(**query) | nb.dcim.devices.filter(**query) (RecordSet) |
| 368 | get_device_by_name(name) | nb.dcim.devices.get(name=name) (Record or None) |
| 373 | get_interfaces_by_device(device_name) | nb.dcim.interfaces.filter(...) (RecordSet) |
| 378 | get_addresses_by_device_and_interface(device_name, interface_name) | nb.dcim.addresses.filter(...) (RecordSet) |

Affected tasks — osism/tasks/openstack.py

| Line | Task | Current return |
|------|------|----------------|
| 24 | image_get(image_name) | conn.image.find_image(...) (Resource) |
| 31 | network_get(network_name) | conn.network.find_network(...) (Resource) |
| 38 | baremetal_node_create(node_name, attributes=None) | Node Resource |
| 48 | baremetal_node_delete(node_or_id) | Node Resource |
| 55 | baremetal_node_update(node_id_or_name, attributes=None) | Node Resource |
| 64 | baremetal_node_show(node_id_or_name, ignore_missing=False) | Node Resource |
| 71 | baremetal_node_list() | list(conn.baremetal.nodes()) (list of Resources) |
| 336 | baremetal_node_validate(node_id_or_name) | ValidationResult |
| 343 | baremetal_node_wait_for_nodes_provision_state(node_id_or_name, state) | Node Resource |
| 362 | baremetal_node_set_provision_state(node, state) | Node Resource |
| 369 | baremetal_node_set_power_state(node, state, wait=False, timeout=None) | Node Resource |
| 380 | baremetal_port_list(details=False, attributes=None) | list of Port Resources |
| 389 | baremetal_port_create(attributes=None) | Port Resource |
389 baremetal_port_create(attributes=None) Port Resource

baremetal_node_set_boot_device (line 356) and baremetal_port_delete (line 398) already return None or a delete-acknowledgement and are likely fine — double-check during implementation.

Callers that need to be updated

Grep results showing in-process consumers that currently rely on the raw object:

  • osism/tasks/conductor/ironic.py (many call sites around lines 360–812) — uses .uuid, .id, ["uuid"], .role.slug, etc.
  • osism/tasks/conductor/config.py (lines 35, 51, 68, 82, 99) — uses result.id on image/network results.
  • osism/tasks/conductor/utils.py (line 157) — uses the returned Ironic node.
  • osism/commands/baremetal.py (line 882) — uses netbox.get_devices(...).

Each caller must switch from attribute access on the raw object to dict-key access on the serialized shape.
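The caller-side change could look roughly like the sketch below. Both helper functions and the fixture values are hypothetical and are not taken from conductor/ironic.py; they only contrast the two access styles on the same data:

```python
from types import SimpleNamespace


def node_is_available_before(node):
    # Today: attribute access on the raw Resource-like object.
    return node.provision_state == "available"


def node_is_available_after(node):
    # After migration: dict-key access on the serialized shape.
    return node["provision_state"] == "available"


# Stand-in for what a task returns today (raw object) ...
raw_node = SimpleNamespace(uuid="hypothetical-uuid", provision_state="available")
# ... and for what it would return after serialization (plain dict).
serialized_node = {"uuid": "hypothetical-uuid", "provision_state": "available"}

print(node_is_available_before(raw_node))
print(node_is_available_after(serialized_node))
```

Nested access such as .role.slug would presumably become ["role"]["slug"], assuming the serializer keeps the nesting; that choice is up to the implementation.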

Proposed approach (Variant 1)

Mirror the pattern introduced in commit 4c28494 for conductor.get_sonic_devices:

  1. Serialization helpers. Add small helpers (e.g. _serialize_device, _serialize_interface, _serialize_ip_address in osism/tasks/netbox.py; _serialize_node, _serialize_port, _serialize_image, _serialize_network in osism/tasks/openstack.py). Each helper:
    • accesses fields defensively with getattr(..., default) / .get(...) so a single malformed record does not break the whole list,
    • returns only primitive types (str, int, bool, None, list, dict),
    • includes all fields today's callers rely on (see caller audit above).
  2. Task bodies. Replace the raw return with return _serialize_X(result) (or a list comprehension for *_list / filter tasks).
  3. Error signalling. Adopt the same convention as get_sonic_devices: raise RuntimeError with a descriptive message for error conditions (not found, wrong role, backend failure) instead of returning None/sentinel values. Empty results return [] (lists) or None (single-object lookups, only when ignore_missing=True).
  4. Drop bind=True where self is unused. Several of these tasks declare self only to satisfy the decorator contract. If no self.request.* access is needed, remove bind=True and the self parameter. (This mirrors the get_devices cleanup in 4c28494 where the stray self was silently swallowing the positional argument.)
  5. Update every caller. For each migrated task, switch in-process consumers from attribute access to dict-key access. Keep attribute-style field names in the serialized dict (e.g. uuid, provision_state, power_state) to minimize churn.
  6. Tests. Add unit tests that import each task and assert the return value is JSON-serializable (json.dumps(result) must not raise). This guards against regressions and makes the contract explicit.
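Steps 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not the implementation from 4c28494: the field list is a placeholder for the real caller audit, and find_node is a hypothetical lookup callable used here in place of the SDK connection so the example runs without OpenStack:

```python
from types import SimpleNamespace


def _serialize_node(node):
    """Reduce a node-like object to primitive types only.

    getattr with a default keeps one malformed record from raising.
    The field list is illustrative, not the result of the caller audit.
    """
    return {
        "uuid": getattr(node, "uuid", None),
        "name": getattr(node, "name", None),
        "provision_state": getattr(node, "provision_state", None),
        "power_state": getattr(node, "power_state", None),
    }


def node_show(find_node, node_id_or_name, ignore_missing=False):
    """Task-body sketch for a single-object lookup."""
    node = find_node(node_id_or_name)
    if node is None:
        if ignore_missing:
            return None
        # Error signalling: raise instead of returning a sentinel.
        raise RuntimeError(f"Node {node_id_or_name!r} not found")
    return _serialize_node(node)


def node_list(nodes):
    """Task-body sketch for a list task: one dict per record."""
    return [_serialize_node(node) for node in nodes]


# Fixture with power_state deliberately missing.
fixtures = {
    "node-0": SimpleNamespace(
        uuid="hypothetical-uuid", name="node-0", provision_state="available"
    )
}
result = node_show(fixtures.get, "node-0")
print(result)
```

The missing power_state comes back as None rather than raising, which is the defensive behavior step 1 asks for.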

Non-goals

  • Changing the network topology of Celery queues / routing.
  • Switching result_serializer away from JSON (pickle would mask the real issue and introduce security risk).
  • Adding new functionality to any of these tasks. This is a targeted refactor.
  • Rewriting the callers to use async dispatch — they can stay synchronous, but they should not break when the task output becomes a dict.

Checklist

  • netbox.get_devices — serialize + update caller in conductor/ironic.py:758 and commands/baremetal.py:882
  • netbox.get_device_by_name — serialize (no external callers today; still fix the shape)
  • netbox.get_interfaces_by_device — serialize + update conductor/ironic.py:825
  • netbox.get_addresses_by_device_and_interface — serialize
  • openstack.image_get — serialize + update conductor/config.py:35,51,68
  • openstack.network_get — serialize + update conductor/config.py:82,99
  • openstack.baremetal_node_create — serialize + update conductor/ironic.py:370
  • openstack.baremetal_node_delete — serialize + update conductor/ironic.py:812
  • openstack.baremetal_node_update — serialize + update conductor/ironic.py:396,513,538,564
  • openstack.baremetal_node_show — serialize + update conductor/ironic.py:360,596, conductor/utils.py:157
  • openstack.baremetal_node_list — serialize + update conductor/ironic.py:773,1071
  • openstack.baremetal_node_validate — serialize + update conductor/ironic.py:436
  • openstack.baremetal_node_wait_for_nodes_provision_state — serialize + update all callers in conductor/ironic.py
  • openstack.baremetal_node_set_provision_state — serialize + update all callers
  • openstack.baremetal_node_set_power_state — serialize + update conductor/ironic.py:461
  • openstack.baremetal_port_list — serialize + update conductor/ironic.py:406,656,808
  • openstack.baremetal_port_create — serialize + update conductor/ironic.py:422
  • Verify baremetal_node_set_boot_device and baremetal_port_delete return JSON-safe values
  • Add regression tests asserting json.dumps(task_result) succeeds for each task
  • Optional: drop bind=True + self where unused

Acceptance criteria

  • Every task listed above returns only JSON-serializable data.
  • json.dumps(task(...)) succeeds for every task in a unit test (mocked NetBox / Ironic fixtures).
  • All existing in-process callers continue to work (CI green, manual smoke test of osism baremetal *, osism netbox *, osism sync ironic).
  • No change in behavior observable to CLI users; only the shape of internal task return values changes.
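A regression test for the json.dumps criterion might be shaped like this pytest-style sketch. The task body is a simplified, hypothetical variant that receives the connection as an argument, and the port fields are illustrative; the real tasks obtain their connection differently and would need patching instead:

```python
import json
from types import SimpleNamespace
from unittest import mock


def baremetal_port_list(conn):
    """Hypothetical, simplified task body returning plain dicts."""
    return [
        {"id": getattr(port, "id", None), "address": getattr(port, "address", None)}
        for port in conn.baremetal.ports()
    ]


def test_baremetal_port_list_is_json_serializable():
    # Mocked connection: ports() returns one fixture record.
    conn = mock.Mock()
    conn.baremetal.ports.return_value = [
        SimpleNamespace(id="hypothetical-port-id", address="52:54:00:00:00:01"),
    ]

    result = baremetal_port_list(conn)

    # Round-trip through JSON: must not raise and must preserve the data.
    assert json.loads(json.dumps(result)) == result
```

The records are SimpleNamespace fixtures rather than Mock objects because a Mock auto-creates any attribute that is read from it, and those auto-created values would themselves fail json.dumps.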
