[feature] Firmware metadata extraction: extractor hierarchy, fwtool + DTB pipeline, and pre-upload validation #418

Description

@atif09

Is your feature request related to a problem? Please describe.

Metadata extraction currently has no shared error vocabulary, no defined interface contract, and no concrete implementation tested against actual firmware files, and firmware uploads get no up-front validation. Without these foundations, the later phases of the extraction pipeline have nothing solid to build on. Firmware images that carry no fwtool JSON trailer (e.g. sunxi targets) are silently unhandled. There is also no resource guard preventing a malformed image from expanding into an arbitrarily large buffer during decompression, which leaves Celery workers exposed to OOM kills.

Describe the solution you'd like

1. Exception Hierarchy

ExtractionError as the base, with UnsupportedImageError (nothing extractable found, triggers fallback or clean failure) and DecompressionLimitExceeded as subclasses. These live in extractors/exceptions.py, separate from the upgrade-related exceptions in exceptions.py.
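As a minimal sketch of the hierarchy described above (the docstrings are illustrative wording, not taken from an existing module):

```python
class ExtractionError(Exception):
    """Base class for every metadata extraction failure."""


class UnsupportedImageError(ExtractionError):
    """Nothing extractable was found in the image."""


class DecompressionLimitExceeded(ExtractionError):
    """A size or ratio cap was hit before or during decompression."""
```

Because both subclasses derive from ExtractionError, a caller that wants to treat UnsupportedImageError differently has to catch it before the more general ExtractionError handler.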

2. BaseMetadataExtractor ABC

A public extract() orchestrator calls extract_from_image() and falls back to extract_from_dtb() on ExtractionError. An UnsupportedImageError raised by extract_from_image() is re-raised immediately, with no fallback attempted. extract_from_image() is abstract. extract_from_dtb() raises UnsupportedImageError by default, making DTB support opt-in. The base class stays technology-agnostic: no subprocess calls, no binary imports. It defines the normalized return dict as the binding contract for all downstream code:

{
    "model":          str,
    "compatible":     list[str],
    "target":         str,
    "version":        str,
    "compat_version": str,
    "source":         str,   # "fwtool" | "dtb"
}
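The orchestration rules above could be sketched roughly as follows. The constructor signature (a single image_path argument) and the FirmwareMetadata TypedDict name are assumptions made for illustration; the exceptions are redefined here only to keep the sketch self-contained.

```python
from abc import ABC, abstractmethod
from typing import TypedDict


class ExtractionError(Exception):
    pass


class UnsupportedImageError(ExtractionError):
    pass


class FirmwareMetadata(TypedDict):
    """Hypothetical typed view of the normalized return dict."""

    model: str
    compatible: list[str]
    target: str
    version: str
    compat_version: str
    source: str  # "fwtool" | "dtb"


class BaseMetadataExtractor(ABC):
    def __init__(self, image_path):
        # Assumed constructor: the issue does not specify one.
        self.image_path = image_path

    def extract(self) -> FirmwareMetadata:
        try:
            return self.extract_from_image()
        except UnsupportedImageError:
            # Must be caught first: it is a subclass of ExtractionError,
            # and it must not trigger the DTB fallback.
            raise
        except ExtractionError:
            return self.extract_from_dtb()

    @abstractmethod
    def extract_from_image(self) -> FirmwareMetadata:
        """Extract metadata directly from the firmware image."""

    def extract_from_dtb(self) -> FirmwareMetadata:
        # DTB support is opt-in: subclasses override this method.
        raise UnsupportedImageError("DTB extraction is not supported")
```

A subclass that only implements extract_from_image() therefore turns any recoverable ExtractionError into an UnsupportedImageError through the default extract_from_dtb().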

3. metadata_extractor_class seam on the upgrader hierarchy

The attribute is None on the base upgrader and OpenWrtMetadataExtractor on OpenWrt. This isolates firmware family logic: adding a new family requires only subclassing BaseMetadataExtractor, and the task and admin layers are never touched.
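A rough sketch of how such a seam might be consumed. The class names BaseUpgrader and OpenWrtUpgrader, the extract_metadata() helper, and the stand-in extractor are all hypothetical and only illustrate the lookup; they are not the project's actual classes.

```python
class OpenWrtMetadataExtractor:
    """Stand-in for the real extractor, reduced to what the seam needs."""

    def __init__(self, image_path):
        self.image_path = image_path

    def extract(self):
        # Placeholder result for illustration only.
        return {"source": "fwtool"}


class BaseUpgrader:
    # No extractor by default: families opt in by overriding this.
    metadata_extractor_class = None


class OpenWrtUpgrader(BaseUpgrader):
    metadata_extractor_class = OpenWrtMetadataExtractor


def extract_metadata(upgrader_class, image_path):
    """Hypothetical task-layer helper: it never names a firmware family."""
    extractor_class = upgrader_class.metadata_extractor_class
    if extractor_class is None:
        return None
    return extractor_class(image_path).extract()
```

The calling code depends only on the class attribute, so supporting another firmware family would mean adding one extractor subclass and one attribute assignment.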

4. OpenWrtMetadataExtractor: fwtool fast path

  • _run_command(): subprocess wrapper enforcing a strict 30-second timeout, decoding stdout with errors="replace" to handle non-UTF-8 binary garbage, raising ExtractionError on non-zero exit codes or TimeoutExpired.
  • _extract_from_fwtool(): invokes fwtool -q -i - <image_path>, parses the JSON trailer, and returns the normalized dict. All fields accessed via .get() to tolerate schema variations across OpenWrt versions 18.06–24.10. Raises ExtractionError on missing or malformed JSON so extract() can trigger the fallback cleanly.
  • _parse_supported_devices(): handles the compat_version difference introduced during the swconfig→DSA migration. If compat_version != "1.0", board identifiers are read from new_supported_devices instead of supported_devices.
  • _detect_image_type(): called before fwtool to identify types that will never have a fwtool trailer: x86 disk images (.img, .vdi, .vmdk suffix) and armsr targets raise UnsupportedImageError immediately.
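The helpers above might look roughly like this when written as standalone functions. The function names, the default of "1.0" when compat_version is absent, and the suffix-only check are assumptions; armsr detection is left out because the issue does not say how those targets are recognised.

```python
import subprocess
from pathlib import Path


class ExtractionError(Exception):
    pass


class UnsupportedImageError(ExtractionError):
    pass


COMMAND_TIMEOUT = 30  # seconds, as stated in the issue
DISK_IMAGE_SUFFIXES = {".img", ".vdi", ".vmdk"}


def run_command(args, timeout=COMMAND_TIMEOUT):
    """Run a command and return its stdout as text, or raise ExtractionError."""
    try:
        result = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired as exc:
        raise ExtractionError(f"{args[0]} timed out after {timeout}s") from exc
    if result.returncode != 0:
        raise ExtractionError(f"{args[0]} exited with code {result.returncode}")
    # errors="replace" keeps non-UTF-8 bytes from raising UnicodeDecodeError.
    return result.stdout.decode("utf-8", errors="replace")


def parse_supported_devices(metadata):
    """Pick the board identifier list according to compat_version."""
    # Assumption: a missing compat_version is treated as "1.0".
    if metadata.get("compat_version", "1.0") != "1.0":
        devices = metadata.get("new_supported_devices")
    else:
        devices = metadata.get("supported_devices")
    return list(devices or [])


def reject_disk_images(image_path):
    """Raise UnsupportedImageError for suffixes that never carry a trailer."""
    if Path(image_path).suffix.lower() in DISK_IMAGE_SUFFIXES:
        raise UnsupportedImageError(f"disk image not supported: {image_path}")
```

Converting both the timeout and the non-zero exit code into ExtractionError lets extract() treat every subprocess failure as a reason to try the DTB fallback.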

5. OpenWrtMetadataExtractor: DTB fallback path and OOM protection

  • _check_limits(): validates raw file size against a configurable cap (OPENWISP_FIRMWARE_UPGRADER_MAX_KERNEL_BYTES, default 256 MB) and raises DecompressionLimitExceeded before any decompression begins. During gzip decompression, chunk-by-chunk reading enforces OPENWISP_FIRMWARE_UPGRADER_MAX_DECOMPRESSED_BYTES (default 512 MB) and OPENWISP_FIRMWARE_UPGRADER_MAX_DECOMPRESSED_RATIO (default 100×) to catch compression bombs before they expand.
  • extract_from_dtb(): calls _check_limits() first, then attempts kernel decompression sequentially across gzip, xz, lzma, bz2, and lz4, stopping at the first that succeeds. Strips uImage headers before decompression. Locates the DTB within the decompressed kernel (FIT image scan or raw magic search), parses it with fdt, and returns the normalized dict with "source": "dtb". Raises UnsupportedImageError if no compression format succeeds or no DTB is found.
  • extract() override: runs fwtool first; on success, optionally enriches an empty compatible list from the DTB path without changing "source". On ExtractionError, falls back to DTB. Re-raises UnsupportedImageError from either path.
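Two pieces of the DTB path lend themselves to a short sketch: chunked gzip decompression with a byte cap and a ratio cap, and the raw magic search for a flattened device tree. The function names, chunk size, and in-memory input are assumptions; the real code would read its limits from the OPENWISP_FIRMWARE_UPGRADER_* settings and would hand the located blob to the fdt parser.

```python
import gzip
import io
import struct


class ExtractionError(Exception):
    pass


class DecompressionLimitExceeded(ExtractionError):
    pass


MAX_DECOMPRESSED_BYTES = 512 * 1024 * 1024  # default from the issue
MAX_DECOMPRESSED_RATIO = 100  # default from the issue
CHUNK_SIZE = 64 * 1024  # assumed chunk size

FDT_MAGIC = b"\xd0\x0d\xfe\xed"  # flattened device tree magic, big-endian
FDT_HEADER_SIZE = 40  # ten 32-bit header fields


def bounded_gunzip(
    data,
    max_bytes=MAX_DECOMPRESSED_BYTES,
    max_ratio=MAX_DECOMPRESSED_RATIO,
    chunk_size=CHUNK_SIZE,
):
    """Decompress gzip data chunk by chunk, aborting once a cap is exceeded."""
    ratio_cap = len(data) * max_ratio
    out = bytearray()
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            out += chunk
            if len(out) > max_bytes:
                raise DecompressionLimitExceeded("decompressed size cap exceeded")
            if len(out) > ratio_cap:
                raise DecompressionLimitExceeded("decompression ratio cap exceeded")
    return bytes(out)


def find_dtb(data):
    """Return the first plausible DTB blob in data, or None if there is none."""
    offset = data.find(FDT_MAGIC)
    while offset != -1:
        if offset + 8 <= len(data):
            # totalsize is the second big-endian u32 of the FDT header.
            (total_size,) = struct.unpack_from(">I", data, offset + 4)
            if total_size >= FDT_HEADER_SIZE and offset + total_size <= len(data):
                return data[offset:offset + total_size]
        offset = data.find(FDT_MAGIC, offset + 1)
    return None
```

Checking the caps after every chunk means the buffer can overshoot a limit by at most one chunk, rather than growing to whatever the compressed stream claims.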

6. Pre-upload validation on FirmwareImage via clean()

  • _validate_file_header(): reads the first 16 bytes and rejects files that start with known non-firmware magic bytes (JPEG \xff\xd8\xff, PDF %PDF, PNG \x89PNG, ZIP PK\x03\x04, ELF \x7fELF) with a translated ValidationError({"file": ...}). Fails gracefully on IOError and on a missing file.
  • _validate_rootfs(): rejects filenames ending in -rootfs.img with a translated ValidationError({"file": ...}).
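The two checks above can be sketched without Django as plain functions that report what they found; in the model's clean(), a non-None result would then be turned into the translated ValidationError({"file": ...}). The function names and the choice to return None on unreadable files are assumptions.

```python
# Magic bytes listed in the issue, mapped to a label for the error message.
REJECTED_MAGIC = {
    b"\xff\xd8\xff": "JPEG",
    b"%PDF": "PDF",
    b"\x89PNG": "PNG",
    b"PK\x03\x04": "ZIP",
    b"\x7fELF": "ELF",
}


def rejected_file_type(fileobj):
    """Return the name of a known non-firmware type, or None if acceptable."""
    if fileobj is None:
        return None  # missing file: nothing to check here
    try:
        fileobj.seek(0)
        header = fileobj.read(16)
        fileobj.seek(0)  # leave the file ready for later readers
    except (OSError, ValueError):
        # Unreadable or closed file: skip the check instead of crashing.
        return None
    for magic, name in REJECTED_MAGIC.items():
        if header.startswith(magic):
            return name
    return None


def is_rootfs_image(filename):
    """True for filenames ending in -rootfs.img."""
    return filename.endswith("-rootfs.img")
```

Returning a label rather than raising keeps the sketch independent of Django and makes the checks easy to unit test in isolation.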

Metadata

Assignees

Labels

enhancement (New feature or request), gsoc-idea (Issues part of Google Summer of Code project)

Projects

Status
In progress

Milestone

No milestone
