The EMBA book ‐ Chapter 5: SBOM and vulnerability aggregation

Michael Messner edited this page Sep 1, 2025 · 4 revisions

Chapter 5: Software Bill of Materials (SBOM) & Vulnerability Aggregation

In our journey through EMBA so far, we've seen how it meticulously extracts all the hidden parts from a firmware image (Chapter 1: Firmware Extraction Layer). Then, we learned how it thoroughly scrutinizes these extracted files without running them (Chapter 2: The Analysis Core). Finally, we explored how EMBA virtually "boots up" the firmware to observe its live behavior and uncover runtime issues (Chapter 3: Dynamic Analysis with user-mode Emulation and Chapter 4: Booting up the system via System emulation).

By now, EMBA has gathered a mountain of data: a list of every file found, software versions detected, potentially weak configurations, and more. But what do you do with all this raw information? How do you get a clear, organized picture of what's inside the firmware and what security risks it might face?

This is precisely where EMBA's Software Bill of Materials (SBOM) & Vulnerability Aggregation layer comes into play! Imagine you've finished baking a complicated cake. You have all the ingredients laid out, you've mixed and baked them, and now you want to present a clear "recipe card" listing everything that went into it, along with any potential "allergy warnings" for those ingredients. This chapter will explain how EMBA creates such a detailed "recipe card" for your firmware, highlighting any known security "allergies."

Our central goal in this chapter is to understand how EMBA takes all the fragmented information it has collected and organizes it into a comprehensive inventory of software components (an SBOM), then cross-references this inventory with vast databases of known security flaws to identify and present potential vulnerabilities.

What is a Software Bill of Materials (SBOM)?

At its core, a Software Bill of Materials (SBOM) is like an "ingredients list" for software. Just as a food product lists all its components (flour, sugar, eggs), an SBOM lists every single software component, library, and module that makes up a piece of software.

Why is this important for firmware?

  • Transparency: It tells you exactly what software is inside your device, even if the manufacturer doesn't publicly disclose it.
  • Security Posture: If you know the ingredients, you can check for known problems with those ingredients.
  • Compliance: Many regulations and industry standards are now requiring SBOMs for better supply chain security.

An SBOM typically includes:

Component Type | Analogy                         | Example in Firmware
---------------|---------------------------------|-------------------------------------
Name           | Ingredient Name                 | busybox
Version        | Quantity/Specific Type          | 1.30.1
Supplier       | Manufacturer                    | busybox.net
License        | Allergen Information            | GPL-2.0
Dependencies   | Sub-ingredients                 | busybox might depend on glibc
Hashes         | Unique Barcode                  | MD5, SHA256 of the binary
Paths          | Where it's found in the package | /bin/busybox, /usr/local/bin/busybox

Having an SBOM is crucial for supply chain security. If a vulnerability is discovered in OpenSSL version 1.0.2, you can quickly check your SBOMs to see if any of your devices use that specific version.
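
As a rough illustration of that lookup, the sketch below assumes each SBOM has already been flattened into semicolon-separated lines of the form firmware;component;version. That layout, the function name find_affected_firmware, and the firmware names are all made up for this example; they are not EMBA output formats.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list firmware images whose inventory contains a given
# component at a version starting with the given prefix.
# Assumed input format (not an EMBA format): firmware;component;version
find_affected_firmware() {
  local lINVENTORY="${1:-}"    # path to the flattened inventory file
  local lNAME="${2:-}"         # component name, e.g. "openssl"
  local lVERS_PREFIX="${3:-}"  # version prefix, e.g. "1.0.2"

  # index(...) == 1 is true when the version string starts with the prefix
  awk -F';' -v name="${lNAME}" -v prefix="${lVERS_PREFIX}" '
    $2 == name && index($3, prefix) == 1 { print $1 }
  ' "${lINVENTORY}" | sort -u
}

# Example inventory with made-up firmware names
INVENTORY_FILE="$(mktemp)"
cat > "${INVENTORY_FILE}" <<'EOF'
router_a.bin;openssl;1.0.2k
router_a.bin;busybox;1.30.1
camera_b.bin;openssl;1.1.1w
switch_c.bin;openssl;1.0.2u
EOF

find_affected_firmware "${INVENTORY_FILE}" "openssl" "1.0.2"
```

Prefix matching is deliberately naive: the prefix "1.0.2" would also match a version such as "1.0.20", so a real lookup would need proper version comparison.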

What is Vulnerability Aggregation?

Once we have our "ingredients list" (the SBOM), the next crucial step is to check each ingredient for known security flaws. This process is called Vulnerability Aggregation. It's like taking your cake's ingredient list and cross-referencing it with a massive database of food allergies or recalls.

Key concepts in vulnerability aggregation:

  • CVEs (Common Vulnerabilities and Exposures): These are unique identifiers for publicly disclosed cybersecurity vulnerabilities (e.g., CVE-2023-12345). Each CVE describes a specific flaw in a specific piece of software.
  • Exploit Databases: These are collections of publicly available "proof-of-concept" code or actual exploit tools that demonstrate how to take advantage of a vulnerability (e.g., Exploit-DB, Metasploit). EMBA checks these to see if an identified CVE has readily available attack code.
  • EPSS (Exploit Prediction Scoring System): A data-driven score that estimates the likelihood of a vulnerability being exploited in the wild.
  • Verification: This is EMBA's special sauce! Instead of just saying "this version has CVE-X," EMBA tries to verify if the vulnerable feature or code path is actually present in the specific binary found in the firmware. This significantly reduces false positives.
  • VEX (Vulnerability Exploitability eXchange): A VEX document goes a step further than just listing vulnerabilities. It provides a clear statement about whether a vulnerability identified in a component actually affects the product or system in question. For example, a library might have a CVE, but if the firmware uses it in a way that avoids the vulnerable function, the VEX can state that the product is "not affected" by that CVE.

How EMBA Automates SBOM & Vulnerability Aggregation

As a user, you don't need special commands for this phase. Once you run EMBA, after the extraction, static, and dynamic analysis steps are complete, EMBA automatically proceeds to build the SBOM and aggregate vulnerabilities. It's an integrated part of the overall analysis pipeline.

You'll see output similar to this:

[*] SBOM - main package SBOM environment
[+] Debian packages SBOM results
[*] Analyzing 10 Python pipfile.lock archives:
[+] Python pipfile.lock SBOM results
[*] Generating final VEX vulnerability json ...
[+] VEX data in json format is available
[+] CycloneDX SBOM with VEX data in JSON format is ready

This indicates EMBA is actively identifying packages (like Debian or Python Pipfile locks) to build the SBOM, and then generating VEX (Vulnerability Exploitability eXchange) data and a final CycloneDX SBOM.

Let's break down some key aspects of how EMBA builds the SBOM and aggregates vulnerability data:

1. Identifying Software Components (SBOM Building)

EMBA has many specialized "sub-modules" dedicated to finding software components from different sources. This is handled primarily by S08_main_package_sbom.sh and its sub-modules. It's like having different experts looking for different kinds of "ingredients" in your extracted firmware.

  • Package Managers: If the firmware uses a Linux distribution like Debian, OpenWrt, or Alpine, EMBA checks their package management databases (e.g., /var/lib/dpkg/status for Debian, /var/lib/opkg/info/*.control for OpenWrt, /lib/apk/db/installed for Alpine). These files list installed software, their versions, and sometimes even their dependencies and licenses.

    For example, to parse Alpine APK packages, EMBA might look into .PKGINFO files after extracting a .apk archive:

    # Simplified from modules/S08_main_package_sbom_modules/S08_submodule_alpine_apk_package_parser.sh
    # After extracting an .apk file to ${TMP_DIR}/apk
    lAPP_NAME=$(grep '^pkgname = ' "${TMP_DIR}"/apk/.PKGINFO || true)
    lAPP_NAME=${lAPP_NAME/pkgname\ =\ }
    # ... more parsing for version, license, etc.
    echo "Found Alpine package: Name=${lAPP_NAME}, Version=${lAPP_VERS}"

    This snippet shows how EMBA extracts details like the package name from a specific file (.PKGINFO) found within an Alpine .apk package.

  • Programming Language Lock/Requirement Files: EMBA also looks for files common in software development projects that list dependencies, such as Python's requirements.txt or Pipfile.lock, Node.js's package-lock.json, PHP's composer.lock, or Rust's Cargo.lock.

    Here's a simplified example of parsing a Python requirements.txt:

    # Simplified from modules/S08_main_package_sbom_modules/S08_submodule_python_requirements_parser.sh
    # Reads a line like "requests==2.28.1"
    if [[ "${lRES_ENTRY}" == *"=="* ]]; then
      lAPP_NAME=${lRES_ENTRY/==*}
      lAPP_VERS=${lRES_ENTRY/*==}
    fi
    echo "Python requirement: Name=${lAPP_NAME}, Version=${lAPP_VERS}"

    This demonstrates how EMBA extracts the package name and version from a line in a requirements.txt file.

  • Metadata in Binary Files: For Windows executables, EMBA can sometimes extract software names and versions from metadata embedded in the binary (like EXIF data in image files, but for executables).

    # Simplified from modules/S08_main_package_sbom_modules/S08_submodule_windows_exifparser.sh
    # Extracts product name and version from a Windows EXE using exiftool
    exiftool "${lEXE_ARCHIVE}" > "${lEXIF_LOG}"
    lAPP_NAME=$(grep "Product Name" "${lEXIF_LOG}" || true)
    lAPP_NAME=${lAPP_NAME/*:\ }
    lAPP_VERS=$(grep "Product Version Number" "${lEXIF_LOG}" || true)
    lAPP_VERS=${lAPP_VERS/*:\ }

    exiftool is a utility that can read various metadata, including that found in Windows executables.

  • Generic Version Detection: As seen in Chapter 2: The Analysis Core and Chapter 3: Dynamic Analysis with user-mode Emulation, EMBA's S09_firmware_base_version_check.sh and S115_usermode_emulator.sh (user-mode emulation) identify software versions directly from binaries (like BusyBox or OpenSSL) by looking for specific strings or by running the binary. These findings also contribute to the SBOM.
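
The Debian case mentioned in the first bullet can be sketched in the same spirit. The example below is not taken from EMBA; it is a minimal, self-contained illustration of reading a dpkg status file, where the function name parse_dpkg_status and the sample entries are assumptions made for this sketch. It relies only on the standard layout of that file: stanzas separated by blank lines, each with Package, Status, and Version fields.

```shell
#!/usr/bin/env bash
# Sketch: list installed packages (name;version) from a dpkg status file.
parse_dpkg_status() {
  local lSTATUS_FILE="${1:-}"
  # RS="" switches awk to paragraph mode: one record per blank-line-separated
  # stanza. FS="\n" makes every line of the stanza its own field.
  awk 'BEGIN { RS=""; FS="\n" }
  {
    name=""; vers=""; status=""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^Package: /) name = substr($i, 10)
      else if ($i ~ /^Version: /) vers = substr($i, 10)
      else if ($i ~ /^Status: /) status = substr($i, 9)
    }
    # Only report packages that are fully installed
    if (name != "" && status ~ / installed$/) print name ";" vers
  }' "${lSTATUS_FILE}"
}

# Made-up sample data in dpkg status layout
STATUS_FILE="$(mktemp)"
cat > "${STATUS_FILE}" <<'EOF'
Package: libc6
Status: install ok installed
Architecture: amd64
Version: 2.31-0ubuntu9.9

Package: oldtool
Status: deinstall ok config-files
Version: 0.9-1

Package: busybox
Status: install ok installed
Version: 1:1.30.1-4
EOF

parse_dpkg_status "${STATUS_FILE}"
```

Packages that were removed but left configuration files behind (status "deinstall ok config-files") are skipped, because they no longer ship binaries in the firmware.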

Once a component's details are extracted, helper functions (helpers/helpers_emba_sbom_helpers.sh) standardize the data and store it as individual JSON files in EMBA's internal SBOM_LOG_PATH directory.

# Simplified from helpers/helpers_emba_sbom_helpers.sh
# Function to build a JSON fragment for a component
build_sbom_json_component_arr() {
  local lPACKAGING_SYSTEM="${1:-}" # e.g., "debian_pkg_mgmt"
  local lAPP_NAME="${2:-}"        # e.g., "libc6"
  local lAPP_VERS="${3:-}"        # e.g., "2.31-0ubuntu9.9"
  # ... other parameters like license, maintainer, description

  # Construct a unique ID for this component
  SBOM_COMP_BOM_REF="$(uuidgen)"

  # Use 'jo' (JSON output) to create a JSON object
  jo -n -- \
    type="library" \
    name="${lAPP_NAME}" \
    version="${lAPP_VERS}" \
    group="${lPACKAGING_SYSTEM}" \
    bom-ref="${SBOM_COMP_BOM_REF}" \
    properties="$(jo -a "${PROPERTIES_JSON_ARR[@]}")" \
    hashes="$(jo -a "${HASHES_ARR[@]}")" \
    > "${SBOM_LOG_PATH}/${lPACKAGING_SYSTEM}_${lAPP_NAME}_${SBOM_COMP_BOM_REF}.json"
}

This function build_sbom_json_component_arr is central to creating the individual JSON entries for each identified software component. It takes the parsed details (name, version, packaging system) and organizes them into a structured format, also incorporating calculated file hashes and other "properties" collected during analysis.
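
To make the shape of such a fragment concrete, here is a minimal sketch that builds a comparable JSON object with only bash and coreutils, so it runs without jo or uuidgen. The function names json_escape and build_component_fragment and the way the bom-ref is derived from a hash are assumptions for illustration; EMBA itself uses uuidgen and jo as shown above. The hashes entry follows the CycloneDX convention of an alg/content pair.

```shell
#!/usr/bin/env bash
# Sketch: emit a CycloneDX-style component fragment without external JSON tools.

# Escape backslashes and double quotes so a value is safe inside a JSON string.
# Control characters are not handled, which is acceptable for this sketch.
json_escape() {
  printf '%s' "${1:-}" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

build_component_fragment() {
  local lNAME="${1:-}"      # e.g. "busybox"
  local lVERS="${2:-}"      # e.g. "1.30.1"
  local lBIN_PATH="${3:-}"  # path to the binary that gets hashed
  local lSHA256=""
  local lBOM_REF=""

  lSHA256="$(sha256sum "${lBIN_PATH}" | cut -d ' ' -f1)"
  # Hypothetical reference scheme: derive the bom-ref from name, version, and
  # file hash instead of generating a random UUID
  lBOM_REF="$(printf '%s' "${lNAME}:${lVERS}:${lSHA256}" | sha256sum | cut -c1-32)"

  printf '{"type":"library","name":"%s","version":"%s","bom-ref":"%s","hashes":[{"alg":"SHA-256","content":"%s"}]}\n' \
    "$(json_escape "${lNAME}")" "$(json_escape "${lVERS}")" "${lBOM_REF}" "${lSHA256}"
}

# Demo with a throwaway file standing in for a firmware binary
SAMPLE_BIN="$(mktemp)"
printf 'demo' > "${SAMPLE_BIN}"
build_component_fragment "busybox" "1.30.1" "${SAMPLE_BIN}"
```

A hash-derived reference makes repeated runs on the same file produce the same bom-ref, whereas a random UUID gives a fresh identifier on every run. Which behavior is preferable depends on how the fragments are consumed later.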

After all individual components are identified, the F15_cyclonedx_sbom.sh module takes all these small JSON fragments and merges them into one comprehensive SBOM document, typically in the widely-used CycloneDX format.

# Simplified from modules/F15_cyclonedx_sbom.sh
# Aggregates all individual component JSONs into a single CycloneDX SBOM file
# ...
echo -n "[" > "${SBOM_LOG_PATH}/sbom_components_tmp.json"
lCOMP_IDX=0
for lCOMP_FILE in "${lCOMP_FILES_ARR[@]}"; do
  lCOMP_IDX=$((lCOMP_IDX + 1))
  # Reads each component's JSON file and appends it to a temporary file
  cat "${lCOMP_FILE}" >> "${SBOM_LOG_PATH}/sbom_components_tmp.json"
  # Adds a comma only if it's not the last entry (a trailing comma is invalid JSON)
  if [[ "${lCOMP_IDX}" -lt "${#lCOMP_FILES_ARR[@]}" ]]; then
    echo -n "," >> "${SBOM_LOG_PATH}/sbom_components_tmp.json"
  fi
done
echo -n "]" >> "${SBOM_LOG_PATH}/sbom_components_tmp.json"
# ...
# Finally, constructs the full SBOM with metadata, components, and dependencies
jo -p -n -- \
  \$schema="http://cyclonedx.org/schema/bom-1.5.schema.json" \
  bomFormat="CycloneDX" \
  specVersion="1.5" \
  components=:"${lSBOM_LOG_FILE}_components.json" \
  dependencies=:"${lSBOM_LOG_FILE}_dependencies.json" \
  vulnerabilities="[]" \
  > "${lSBOM_LOG_FILE}.json"

This simplified code snippet shows the high-level process of F15_cyclonedx_sbom.sh: it gathers all the individual component JSONs (generated by S08 sub-modules) and combines them into a single, well-formed CycloneDX SBOM JSON file. It also sets up the basic SBOM metadata.

The following screenshots are taken from a typical SBOM:

[Images: two screenshots of a typical SBOM]

2. Aggregating and Verifying Vulnerabilities

The F17_cve_bin_tool.sh module is the central orchestrator for vulnerability aggregation. It uses the generated SBOM as its starting point.

  • CVE Lookup: For every component identified in the SBOM, F17 uses cve-bin-tool (an external tool integrated by EMBA) to query known vulnerability databases (like the National Vulnerability Database - NVD) based on the component's name and version.

    # Simplified from modules/F17_cve_bin_tool.sh
    # Iterates through SBOM entries and runs cve-bin-tool
    python3 "${lCVE_BIN_TOOL}" -i "${LOG_PATH_MODULE}/${lBOM_REF}.tmp.csv" \
      --disable-version-check --offline -f csv \
      -o "${LOG_PATH_MODULE}/${lBOM_REF}_${lPRODUCT_NAME}_${lVERS}" || true

    This python3 command invokes cve-bin-tool, telling it to read component information from a temporary CSV file (created from the SBOM entry) and output found vulnerabilities to a new CSV file. The --offline flag means it uses locally cached vulnerability databases.

  • Exploit Information: For each CVE found, EMBA then checks various exploit databases to see if public exploit code or a Proof-of-Concept (PoC) exists. This includes:

    • Exploit-DB: A public archive of exploits and shellcode.
    • Metasploit Framework: A popular penetration testing framework with many exploit modules.
    • CISA Known Exploited Vulnerabilities (KEV): A catalog maintained by the U.S. Cybersecurity and Infrastructure Security Agency for vulnerabilities known to be actively exploited.
    • Packetstormsecurity & Snyk: Other public sources for PoCs and advisories.
    • Routersploit: A framework specifically for embedded device exploitation.

    EMBA regularly updates its local copies of these exploit databases using helper scripts (e.g., helpers/known_exploited_vulns_update.sh, helpers/metasploit_db_update.sh, helpers/packet_storm_crawler.sh, helpers/snyk_crawler.sh).

    The tear_down_cve_threader function within F17 performs these checks for each CVE:

    # Simplified from modules/F17_cve_bin_tool.sh (tear_down_cve_threader function)
    # Check if exploit exists in Exploit-DB
    mapfile -t lEXPLOIT_AVAIL_EDB_ARR < <(cve_searchsploit "${lCVE_ID}" 2>/dev/null || true)
    if [[ " ${lEXPLOIT_AVAIL_EDB_ARR[*]} " =~ "Exploit DB Id:" ]]; then
      lEXPLOIT="Exploit (EDB ID: ${lEXPLOIT_ID})"
    fi
    
    # Check if exploit exists in Metasploit
    mapfile -t lEXPLOIT_AVAIL_MSF_ARR < <(grep -E "${lCVE_ID}"$ "${MSF_DB_PATH}" 2>/dev/null || true)
    if [[ ${#lEXPLOIT_AVAIL_MSF_ARR[@]} -gt 0 ]]; then
      lEXPLOIT+=" / MSF: ${lEXPLOIT_NAME}"
    fi
    # ... similar checks for Packetstorm, Snyk, Routersploit, KEV

    This code illustrates how EMBA searches its local copies of the exploit databases for the identified lCVE_ID to determine whether a public exploit exists: cve_searchsploit queries Exploit-DB, and grep checks the Metasploit database file (MSF_DB_PATH).

  • EPSS (Exploit Prediction Scoring System): EMBA also fetches EPSS scores, which indicate the likelihood of a vulnerability being exploited in the wild.

    # Simplified from modules/F17_cve_bin_tool.sh (get_epss_data function)
    get_epss_data() {
      local lCVE_ID="${1:-}"
      local lCVE_YEAR=""
      local lCVE_EPSS_PATH=""
      local lEPSS_DATA=""
      local lEPSS_EPSS="" # stays empty if no EPSS entry is found
      # The year is the second dash-separated field of the CVE ID
      lCVE_YEAR="$(echo "${lCVE_ID}" | cut -d '-' -f2)"
      lCVE_EPSS_PATH="${EPSS_DATA_PATH}/CVE_${lCVE_YEAR}_EPSS.csv"
      if [[ -f "${lCVE_EPSS_PATH}" ]]; then
        lEPSS_DATA=$(grep "^${lCVE_ID};" "${lCVE_EPSS_PATH}" || true)
        lEPSS_EPSS=$(echo "${lEPSS_DATA}" | cut -d ';' -f2) # The EPSS score
        # ... further processing for percentage
      fi
      echo "${lEPSS_EPSS}" # Returns the EPSS score
    }

    This function reads locally stored EPSS data (which is updated by EMBA's internal helper scripts) to provide a probabilistic score for a given CVE.

  • Vulnerability Verification (Reducing False Positives): A key differentiator for EMBA is its ability to verify whether a CVE truly affects the firmware. Finding a CVE that matches a software version does not by itself mean the flaw is exploitable in this specific firmware. EMBA performs this verification through modules such as:

    • S26_kernel_vuln_verifier.sh: For Linux kernels, this module checks if vulnerable kernel functions are actually present and used in the firmware's kernel binary by analyzing kernel symbols or the kernel's configuration (.config file).

      # Simplified from modules/S26_kernel_vuln_verifier.sh (symbol_verifier function)
      # Checks if a vulnerable function (from a CVE description) is present
      # as an exported symbol in the kernel's compiled binary or modules.
      for lCHUNK_FILE in "${LOG_PATH_MODULE}"/symbols_uniq.split.* ; do
        if grep -q -f "${lCHUNK_FILE}" "${lKERNEL_DIR}/${lK_PATH}" ; then
          echo "Vulnerability ${lCVE} verified via exported symbol in ${lK_PATH}"
          lVULN_FOUND=1
          break
        fi
      done

      This snippet demonstrates how EMBA searches for a vulnerable function's "symbol" (like a function name) within the kernel's compiled code. If found, it increases confidence that the vulnerability is indeed present.

    • S118_busybox_verifier.sh: For BusyBox, this module goes a step further. BusyBox is a collection of many small utilities (called "applets"). A CVE may be assigned to BusyBox as a whole yet matter only when the affected applet (e.g., telnetd) is compiled into this firmware. EMBA identifies which applets are actually present (from static analysis or user-mode emulation) and only flags CVEs relevant to those applets.

      # Simplified from modules/S118_busybox_verifier.sh (busybox_vuln_testing_threader function)
      # Checks if a CVE's summary mentions an applet that is actually present
      for lBB_APPLET in "${BB_VERIFIED_APPLETS[@]}"; do
        if [[ "${lSUMMARY}" == *" ${lBB_APPLET} "* ]]; then
          echo "Verified BusyBox vulnerability ${lCVE} - applet ${lBB_APPLET}"
          # Log this as a *verified* vulnerability
        fi
      done

      This shows how EMBA checks if an identified applet (like telnetd) is mentioned in the summary of a BusyBox CVE. If so, it confirms that this specific vulnerability is relevant to the firmware's BusyBox configuration.
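
The applet check can be tried in isolation with the sketch below. It is a variant, not the EMBA code: instead of requiring a space on both sides of the applet name, it uses grep's whole-word matching, which also catches an applet name followed by punctuation. The function name applet_in_summary and the sample summaries are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: print the first applet from the given list that a CVE summary
# mentions, and return non-zero if none is mentioned.
applet_in_summary() {
  local lSUMMARY="${1:-}"
  shift
  local lBB_APPLET=""
  for lBB_APPLET in "$@"; do
    # -w matches whole words only, so "ash" does not match inside "hash";
    # -F treats the applet name as a fixed string rather than a regex
    if grep -q -w -F -- "${lBB_APPLET}" <<< "${lSUMMARY}"; then
      echo "${lBB_APPLET}"
      return 0
    fi
  done
  return 1
}

# Applets assumed to be present in the firmware's BusyBox binary
BB_VERIFIED_APPLETS=("ash" "telnetd")

# Made-up summary text, not a real CVE description
applet_in_summary "A flaw in the telnetd applet allows remote code execution." \
  "${BB_VERIFIED_APPLETS[@]}"
```

Matching on summary text is a heuristic either way: a summary that describes the flaw without naming the applet would be missed, and an applet name that doubles as a common English word could match by accident.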

Finally, all this aggregated and verified vulnerability data is incorporated back into the SBOM, usually as a VEX document, providing a holistic view of the firmware's security posture.
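
To show how such a statement can tie a CVE back to a component, the sketch below prints one vulnerability entry in the style of the CycloneDX vulnerabilities array. It links the CVE to a component through its bom-ref and records an analysis state. The function name build_vex_entry and the bom-ref value are hypothetical, and the fields EMBA actually writes may differ; the accepted state values are the ones defined by the CycloneDX specification.

```shell
#!/usr/bin/env bash
# Sketch: emit a minimal CycloneDX-style vulnerability entry with a VEX state.
build_vex_entry() {
  local lCVE_ID="${1:-}"           # e.g. "CVE-2023-12345"
  local lBOM_REF="${2:-}"          # bom-ref of the affected component
  local lSTATE="${3:-in_triage}"   # CycloneDX analysis state

  # Accept only the analysis states defined by CycloneDX
  case "${lSTATE}" in
    resolved|resolved_with_pedigree|exploitable|in_triage|false_positive|not_affected) ;;
    *) echo "unknown analysis state: ${lSTATE}" >&2; return 1 ;;
  esac

  printf '{"id":"%s","analysis":{"state":"%s"},"affects":[{"ref":"%s"}]}\n' \
    "${lCVE_ID}" "${lSTATE}" "${lBOM_REF}"
}

# Placeholder CVE ID and a made-up bom-ref
build_vex_entry "CVE-2023-12345" "busybox-ref-0001" "not_affected"
```

Because the entry refers to the component by bom-ref rather than by name, a consumer of the SBOM can resolve exactly which binary the statement is about, even when the same library appears several times in the firmware.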

Conclusion

The SBOM & Vulnerability Aggregation functionality turns EMBA's raw findings into actionable intelligence. By building an "ingredients list" (SBOM) for your firmware and cross-referencing it with "recall notices" (vulnerabilities), EMBA shows you what software your device actually contains and which known flaws apply to it. Its verification capabilities help you cut through the noise and focus on the vulnerabilities that truly affect your firmware.

With EMBA's detailed analysis complete, the final step is to present all these findings in a clear, concise, and user-friendly manner. In the next chapter, Reporting & User Experience, you will discover how EMBA compiles all its discoveries into comprehensive reports that make sense for both technical and non-technical stakeholders.
