04_reporting___visualization

Benedikt Kuehne edited this page Jan 7, 2026 · 1 revision

Chapter 4: Reporting & Visualization

In Chapter 3: Real-time Progress Monitoring, we learned how EMBArk keeps you updated on your firmware analysis as it happens, like a live news ticker. But once the analysis is complete, you don't just want a stream of log messages; you need clear, actionable insights!

Imagine you've submitted a complex firmware image for analysis. You don't want to sift through pages of raw data to find the key takeaways; you need a summary that highlights the important points, visualizes trends, and conveys the overall picture quickly.

This is where "Reporting & Visualization" comes in. Once EMBA (the backend analyzer) finishes its deep dive into your firmware, EMBArk transforms the raw findings into easy-to-understand reports and interactive charts. Like a skilled infographic designer, it turns complex security data into visual summaries (dashboards) and detailed documents, so you can quickly grasp the security posture of a firmware image and track trends across multiple analyses.

Solving Our Use Case: Understanding Firmware Security Findings

Let's consider our analyst. After a firmware analysis completes, they need to:

  1. View a detailed report for a single firmware, showing specific vulnerabilities and security properties.
  2. See an overview of all analyzed firmwares to identify common issues or trends across the board.
  3. Access a Software Bill of Materials (SBOM) to understand all components within a firmware.

EMBArk provides dedicated dashboards and reports to address these needs.

Understanding the Key Concepts

To make sense of the analysis results, EMBArk uses a few key concepts:

1. Individual Reports: "What's in this firmware?"

This is a comprehensive breakdown of the security findings for a single firmware analysis. It includes details like detected operating systems, architectures, common vulnerabilities (CVEs), security hardening features (like PIE, NX, RELRO), and more.

2. Aggregated Reports/Dashboards: "What trends do we see across all firmwares?"

This provides a high-level summary and visual trends across all firmware analyses performed in EMBArk. It shows distributions of operating systems, architectures, overall CVE counts, and common weaknesses across your entire fleet of analyzed firmwares. It's great for managers or security teams to see the bigger picture.

3. Software Bill of Materials (SBOM): "What ingredients are in this software?"

An SBOM is a complete, nested inventory of all software components (open source and commercial) present in a firmware. It's like an ingredient list for your software. EMBArk can generate and display these, which is crucial for supply chain security.
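
To make the "ingredient list" idea concrete, here is a minimal sketch that walks a hand-written, CycloneDX-style document and lists its components. The sample data and the helper names `list_components` and `sbom_uuid` are invented for this illustration; only the `serialNumber`, `components`, `name`, and `version` keys mirror what the importer shown later in this chapter reads.

```python
import json

# Hand-written sample in the style of a CycloneDX SBOM (illustrative data only).
SAMPLE_SBOM = json.loads("""
{
  "serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
  "components": [
    {"name": "busybox", "version": "1.31.1"},
    {"name": "openssl", "version": "1.1.1k"}
  ]
}
""")


def list_components(sbom: dict) -> list[str]:
    """Return 'name version' strings for every component in the SBOM."""
    return [
        f"{component['name']} {component.get('version', 'unknown')}"
        for component in sbom.get("components", [])
    ]


def sbom_uuid(sbom: dict) -> str:
    """Extract the UUID part of a 'urn:uuid:<uuid>' serial number."""
    return sbom["serialNumber"].split(":")[2]


if __name__ == "__main__":
    print(sbom_uuid(SAMPLE_SBOM))
    for line in list_components(SAMPLE_SBOM):
        print(line)
```

A real SBOM carries many more fields per component (supplier, licenses, CPE, PURL), but the overall shape stays the same: one document with a list of components.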

How EMBArk Presents Analysis Results

Let's walk through how our analyst uses EMBArk to get these insights.

1. Accessing an Individual Report

Once an analysis is finished, you can navigate to the "Report Dashboard" (URL name embark-ReportDashboard). This table lists all your completed analyses. Clicking on an entry, or an "Open Report" button, will lead you to a detailed individual report.

What happens: EMBArk fetches the details of that single analysis from its database and presents them in a user-friendly way.

Here's how EMBArk fetches the individual report data:

# Simplified snippet from embark/reporter/views.py

@permission_required("users.reporter_permission", login_url='/')
# ... decorators ...
def get_individual_report(request, analysis_id):
    # 1. Check if the analysis exists and the user is authorized.
    analysis_object = FirmwareAnalysis.objects.get(id=analysis_id)
    # ... authorization check ...

    # 2. Retrieve the Result object linked to this analysis.
    result = Result.objects.get(firmware_analysis=analysis_object)

    # 3. Convert the Result model into a dictionary for easy use.
    return_dict = model_to_dict(instance=result, exclude=['vulnerability'])

    # 4. Add additional related information (e.g., firmware name, dates).
    return_dict['firmware_name'] = analysis_object.firmware_name
    return_dict['start_date'] = analysis_object.start_date
    return_dict['end_date'] = analysis_object.end_date
    # ... more fields are added ...

    # 5. Convert JSON fields back to Python objects (e.g., 'strcpy_bin').
    return_dict['strcpy_bin'] = json.loads(return_dict['strcpy_bin'])

    # 6. Return the data as a JSON response.
    return JsonResponse(data=return_dict, status=HTTPStatus.OK)

This Python function (get_individual_report) acts as the data provider. When your browser requests an individual report, this function queries the database, gathers all the relevant details about the FirmwareAnalysis and its Result, and sends it back as a JSON object.
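
Step 5 of the view is the part that tends to surprise readers: some fields are stored as JSON text and must be parsed before they are useful. The sketch below shows that idea in plain Python, without Django. The helper name `decode_json_fields` and the sample row (including the shape of `strcpy_bin`) are assumptions made for this example, not part of EMBArk.

```python
import json


def decode_json_fields(report: dict, fields: list[str]) -> dict:
    """Return a copy of `report` with the named JSON-string fields parsed."""
    decoded = dict(report)
    for name in fields:
        raw = decoded.get(name)
        if isinstance(raw, str):
            try:
                decoded[name] = json.loads(raw)
            except json.JSONDecodeError:
                # Leave malformed values untouched rather than failing the whole report.
                pass
    return decoded


if __name__ == "__main__":
    # Hypothetical row; the real field contents depend on EMBA's output.
    row = {"firmware_name": "router.bin", "strcpy_bin": '{"httpd": 12, "busybox": 3}'}
    print(decode_json_fields(row, ["strcpy_bin"]))
```

Returning a copy keeps the original dictionary intact, which makes the helper easy to test in isolation.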

The web page then uses JavaScript to display this data:

// Simplified snippet from embark/static/scripts/individualReportDashboard.js

// 1. Get the analysis ID from the current URL.
let report_id = window.location.pathname.split("/").pop();
// 2. Fetch the individual report data from the backend.
function get_individual_report() {
    let url = window.location.origin + "/get_individual_report/" + report_id;
    return $.getJSON(url).then(function (data) {
        return data;
    });
}

// 3. Once data is received, generate charts and populate tables.
get_individual_report().then(function (returnData) {
    let critical = JSON.parse(returnData.cve_critical)[0];
    let high = JSON.parse(returnData.cve_high)[0];
    // ... parse other CVE levels ...

    // 4. Create charts (e.g., for CVE distribution).
    let cvedoughnutChart = new Chart(accumulatedCveDoughnut, {
        type: 'doughnut',
        data: {
            labels: ['CVE-Critical', 'CVE-High', 'CVE-Medium', 'CVE-Low'],
            datasets: [{
                data: [critical, high, medium, low],
                backgroundColor: ['rgb(255, 99, 99)', 'rgb(255, 172, 99)', 'rgb(255, 205, 86)', 'rgb(54, 162, 235)'],
            }]
        },
        // ... chart options ...
    });

    // 5. Populate a detail table with key-value pairs
    //    (the construction of data_to_display is omitted from this snippet).
    const table = document.getElementById("detail_body");
    for (const [key, value] of Object.entries(data_to_display)) {
        let row = table.insertRow();
        let cell1 = row.insertCell(0);
        cell1.innerHTML = key;
        let cell2 = row.insertCell(1);
        cell2.innerHTML = value;
    }
});

This JavaScript snippet (individualReportDashboard.js) fetches the data using an AJAX call to get_individual_report. Once the data arrives, it dynamically creates charts (like the CVE doughnut chart) and populates a detailed table, giving the analyst a full overview.

You might also see a link to "Open Report" which leads to the raw HTML report generated by EMBA. This is served by reporter/views.py's html_report function, which simply renders the static HTML file directly from the analysis logs.

2. Viewing the Aggregated Dashboard

For an overview of all analyses, the analyst would go to the "Main Dashboard" (URL name embark-MainDashboard). This dashboard uses aggregated data to display overall trends and statistics.

What happens: EMBArk calculates statistics across all completed analyses and presents them in a series of charts and summary cards.

Here's how EMBArk collects the aggregated data:

# Simplified snippet from embark/reporter/views.py

@permission_required("users.reporter_permission", login_url='/')
# ... decorators ...
def get_accumulated_reports(request):
    results = Result.objects.all() # 1. Get ALL Result objects from the database.
    data = {} # Dictionary to store aggregated statistics.

    for result in results:
        result_dict = model_to_dict(result) # Convert each result to a dictionary.
        # 2. Extract and aggregate data for various fields.
        # Example: Count OS distributions
        os_verified_value = result_dict.pop('os_verified')
        if 'os_verified' not in data:
            data['os_verified'] = {}
        if os_verified_value not in data['os_verified']:
            data['os_verified'][os_verified_value] = 0
        data['os_verified'][os_verified_value] += 1

        # Example: Sum up counts for integer fields (e.g., files, directories, CVEs).
        for field in ['files', 'directories', 'cve_critical', 'cve_high', 'cve_medium', 'cve_low']:
            if field not in data:
                data[field] = {'sum': 0, 'count': 0}
            data[field]['count'] += 1
            if result_dict[field] is not None:
                if field.startswith('cve_'):
                    cve_value = json.loads(result_dict[field])[0]  # Stored as a JSON list such as '["614", "17"]'
                    data[field]['sum'] += int(cve_value)
                else:
                    data[field]['sum'] += result_dict[field]

    # 3. Calculate means and other summary statistics.
    for field in data:
        if isinstance(data[field], dict) and 'sum' in data[field] and data[field]['count'] > 0:
            data[field]['mean'] = data[field]['sum'] / data[field]['count']

    data['total_firmwares'] = len(results)
    # ... logic for top_strcpy_bins, top_system_bins ...

    return JsonResponse(data=data, status=HTTPStatus.OK)

The get_accumulated_reports function iterates through all Result entries in the database, extracts specific data points (like OS, architecture, CVE counts), aggregates them, and then returns a single JSON object containing summary statistics and distributions.
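
The same aggregation can be tried out without a database. The following is a minimal sketch that operates on plain dictionaries instead of Django model instances; the function name `aggregate`, the field tuples, and the sample rows are assumptions for this example. It follows the view's convention of counting every result, even when a field is empty.

```python
import json
from collections import Counter

INT_FIELDS = ("files", "directories")
CVE_FIELDS = ("cve_critical", "cve_high")


def aggregate(results: list[dict]) -> dict:
    """Aggregate per-analysis result dictionaries into dashboard statistics."""
    data: dict = {
        "os_verified": dict(Counter(r.get("os_verified") for r in results)),
    }
    count = len(results)
    for field in INT_FIELDS + CVE_FIELDS:
        total = 0
        for result in results:
            value = result.get(field)
            if value is None:
                continue
            if field in CVE_FIELDS:
                # CVE fields hold a JSON list; the first entry is the count.
                value = int(json.loads(value)[0])
            total += value
        mean = total / count if count else 0
        data[field] = {"sum": total, "count": count, "mean": mean}
    data["total_firmwares"] = count
    return data


if __name__ == "__main__":
    sample = [
        {"os_verified": "Linux", "files": 10, "directories": 2,
         "cve_critical": '["3", "1"]', "cve_high": '["5", "2"]'},
        {"os_verified": "Linux", "files": 30, "directories": 4,
         "cve_critical": '["1", "0"]', "cve_high": None},
    ]
    print(json.dumps(aggregate(sample), indent=2))
```

Because the mean divides by the number of results rather than the number of non-empty values, an analysis with a missing field pulls the average down. That matches the view shown above, but it is worth keeping in mind when reading the dashboard.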

The frontend then consumes this data to render various charts:

// Simplified snippet from embark/static/scripts/accumulatedReports.js

// 1. Fetch aggregated data from the backend.
function get_accumulated_reports() {
    let url = window.location.origin + "/get_accumulated_reports/";
    return $.getJSON(url).then(function (data) {
        return data;
    });
}

// 2. Once data is received, update summary cards and create charts.
get_accumulated_reports().then(makeCharts);

function makeCharts(returnData) {
    if (returnData.total_firmwares !== 0) {
      document.getElementById("firmwareAnalysed").textContent = returnData.total_firmwares;
      // ... update other summary cards like totalFiles, totalDirectories ...
    }

    // 3. Create a pie chart for CVE distribution.
    let cvePieChart = new Chart(accumulatedCvePie, {
        type: 'pie',
        data: {
            labels: ['CVE-Critical', 'CVE-High', 'CVE-Medium', 'CVE-Low'],
            datasets: [{
                data: [returnData.cve_critical.sum, returnData.cve_high.sum, returnData.cve_medium.sum, returnData.cve_low.sum],
                backgroundColor: ['rgb(255, 99, 132)', 'rgb(250, 164, 5)', 'rgb(240, 252, 0)', 'rgb(54, 162, 235)'],
            }]
        },
        // ... chart options ...
    });

    // 4. Create bar charts for OS and Architecture distribution.
    let architectureBarChart = new Chart(accumulatedArchitecture, {
        type: 'bar',
        data: {
            labels: Object.keys(returnData.architecture_verified),
            datasets: [{
                label: 'Architecture Distribution',
                data: Object.values(returnData.architecture_verified),
                backgroundColor: getRandomColors(Object.keys(returnData.architecture_verified).length)
            }]
        },
        // ... chart options ...
    });

    // ... create other charts for security features (NX, PIE, RELRO) and top binaries ...
}

This accumulatedReports.js script fetches the aggregated data and then uses the Chart.js library to generate various interactive charts, providing a visual summary of the entire EMBArk dataset.
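
It can help to see the payload shape the charts rely on. The sketch below, written in Python for consistency with the backend examples, mirrors what the script does with `Object.keys`, `Object.values`, and the `.sum` entries. The helper names and the sample payload are invented for this illustration.

```python
SEVERITIES = ("cve_critical", "cve_high", "cve_medium", "cve_low")


def distribution_series(distribution: dict) -> tuple[list, list]:
    """Split a {label: count} mapping into parallel label and value lists."""
    return list(distribution.keys()), list(distribution.values())


def cve_series(payload: dict) -> list:
    """Collect per-severity sums in the order the pie chart labels expect."""
    return [payload.get(name, {}).get("sum", 0) for name in SEVERITIES]


if __name__ == "__main__":
    # Hypothetical payload in the shape returned by get_accumulated_reports.
    payload = {
        "architecture_verified": {"ARM": 4, "MIPS": 2},
        "cve_critical": {"sum": 7, "count": 6, "mean": 7 / 6},
        "cve_high": {"sum": 21, "count": 6, "mean": 3.5},
    }
    labels, values = distribution_series(payload["architecture_verified"])
    print(labels, values)
    print(cve_series(payload))
```

Falling back to 0 for a missing severity keeps the chart data aligned with its four labels even when the backend omits a field.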

3. Accessing the Software Bill of Materials (SBOM)

If EMBA generates an SBOM for a firmware, you can often access it directly from the individual report or the report dashboard.

What happens: EMBArk retrieves the SBOM data, which is stored in a structured format, and displays it.

# Simplified snippet from embark/dashboard/views.py

@permission_required("users.reporter_permission", login_url='/')
# ... decorators ...
def get_sbom_analysis(request, analysis_id):
    # 1. Get the FirmwareAnalysis and its associated Result.
    analysis_object = FirmwareAnalysis.objects.get(id=analysis_id)
    result = Result.objects.get(firmware_analysis=analysis_object)

    # 2. Check if an SBOM exists for this result.
    if result.sbom:
        # 3. Retrieve the SBOM object.
        sbom_obj = result.sbom
        # 4. Construct a dictionary with SBOM metadata and components.
        sbom_data = {
            "id": str(sbom_obj.id),
            "meta": sbom_obj.meta,
            "file": sbom_obj.file,
            "components": []
        }
        for component in sbom_obj.component.all():
            sbom_data["components"].append({
                "name": component.name,
                "version": component.version,
                "supplier": component.supplier,
                "license": component.license,
                "cpe": component.cpe,
                "purl": component.purl,
                # ... other software info fields ...
            })
        return JsonResponse(data=sbom_data, status=HTTPStatus.OK)
    return JsonResponse(data={'error': 'SBOM Not Found'}, status=HTTPStatus.NOT_FOUND)

This Python function (get_sbom_analysis) retrieves the SBOM details from the database, including all the individual software components, and formats it as a JSON response for the frontend to display.
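
Once the SBOM arrives as JSON, simple queries over the component list become possible. As a small sketch, the helper below finds components that lack a given identifier. The function name `components_missing` and the sample response are assumptions for this example; the 'NA' placeholder follows the importer code shown later in this chapter.

```python
def components_missing(sbom_data: dict, field: str, placeholder: str = "NA") -> list[str]:
    """Return names of components whose `field` is absent, empty, or the placeholder."""
    return [
        component["name"]
        for component in sbom_data.get("components", [])
        if component.get(field) in (None, "", placeholder)
    ]


if __name__ == "__main__":
    # Hypothetical response body in the shape built by get_sbom_analysis.
    response = {
        "id": "3e671687-395b-41f5-a30f-a58921a69b79",
        "components": [
            {"name": "busybox", "version": "1.31.1",
             "cpe": "cpe:2.3:a:busybox:busybox:1.31.1:*:*:*:*:*:*:*"},
            {"name": "vendor-daemon", "version": "2.0", "cpe": "NA"},
        ],
    }
    print(components_missing(response, "cpe"))
```

A list like this can point an analyst at components that may need manual identification.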

Under the Hood: From Raw Logs to Rich Reports

Let's explore the inner workings of how EMBArk transforms raw analysis output into presentable reports.

The Reporting Flow: A Simple Sequence

When an EMBA analysis finishes, here's a simplified sequence of how EMBArk generates reports:

sequenceDiagram
    participant EMBA Analyzer
    participant Importer
    participant Database (Models)
    participant Web Server (Views)
    participant Web Browser (JS + HTML)

    EMBA Analyzer->>Importer: Writes `emba_logs` (CSV, JSON, HTML)
    Importer->>Database (Models): Reads logs and saves structured data (Result, SBOM)
    Web Browser (JS + HTML)->>Web Server (Views): Requests report data (e.g., /get_individual_report)
    Web Server (Views)->>Database (Models): Queries Result, SBOM, and FirmwareAnalysis
    Database (Models)-->>Web Server (Views): Returns requested data
    Web Server (Views)-->>Web Browser (JS + HTML): Sends JSON data
    Web Browser (JS + HTML)->>Web Browser (JS + HTML): Renders charts and tables

Note: In the database, the Result model links each FirmwareAnalysis to its stored findings and, when one exists, to its SBOM.

Key Components and Code Elements

  1. embark/dashboard/models.py - The Blueprint for Results: This file defines the Django models that store the structured results of an EMBA analysis.

    # Simplified snippet from embark/dashboard/models.py
    from django.db import models
    import uuid
    from uploader.models import FirmwareAnalysis
    
    class Vulnerability(models.Model):
        cve = models.CharField(max_length=18, help_text='CVE-XXXX-XXXXXXX')
        info = models.JSONField(null=True, editable=True)
    
    class SoftwareInfo(models.Model):
        id = models.UUIDField(primary_key=True, default=uuid.uuid4)
        name = models.CharField(max_length=256)
        version = models.CharField(max_length=32)
        supplier = models.CharField(max_length=1024)
        license = models.CharField(max_length=1024)
        cpe = models.CharField(max_length=256)
        purl = models.CharField(max_length=256)
        # ... more fields for software components ...
    
    class SoftwareBillOfMaterial(models.Model):
        id = models.UUIDField(primary_key=True, default=uuid.uuid4)
        meta = models.CharField(max_length=1024)
        component = models.ManyToManyField(SoftwareInfo, blank=True)
        file = models.FilePathField(max_length=110)
    
    class Result(models.Model):
        firmware_analysis = models.OneToOneField(FirmwareAnalysis, on_delete=models.CASCADE, primary_key=True)
        os_verified = models.CharField(blank=True, null=True, max_length=256)
        architecture_verified = models.CharField(blank=True, null=True, max_length=100)
        files = models.IntegerField(default=0)
        directories = models.IntegerField(default=0)
        cve_critical = models.TextField(default='{}')
        cve_high = models.TextField(default='{}')
        # ... other security property fields like canary, relro, no_exec, pie, stripped ...
        vulnerability = models.ManyToManyField(Vulnerability, blank=True)
        sbom = models.OneToOneField(SoftwareBillOfMaterial, on_delete=models.CASCADE, null=True, blank=True)
    • Vulnerability: Stores details about individual CVEs found.
    • SoftwareInfo: Represents a single component within an SBOM.
    • SoftwareBillOfMaterial: Represents the overall SBOM, linking to many SoftwareInfo components.
    • Result: This is the main model that holds the summary of an analysis. It has a one-to-one relationship with FirmwareAnalysis (from Chapter 2: Firmware Analysis Management), and links to Vulnerability and SoftwareBillOfMaterial objects. It stores counts for files, directories, different CVE severities, and security features.
  2. embark/porter/importer.py - The Data Transformer: This script is crucial for taking the raw output files from EMBA and parsing them into the structured Result, Vulnerability, and SoftwareBillOfMaterial models. It runs in the background once an EMBA analysis is complete.

    # Simplified snippet from embark/porter/importer.py
    import csv
    import json
    import logging
    import os
    import re
    
    from django.conf import settings
    from dashboard.models import SoftwareBillOfMaterial, SoftwareInfo, Vulnerability, Result
    from uploader.models import FirmwareAnalysis
    
    logger = logging.getLogger(__name__)
    
    def result_read_in(analysis_id):
        # 1. Finds the EMBA log directory for the given analysis_id.
        csv_directory = f"{settings.EMBA_LOG_ROOT}/{analysis_id}/emba_logs/csv_logs/"
        # 2. Iterates through specific CSV files generated by EMBA.
        for file_ in os.listdir(csv_directory):
            if file_.endswith('f50_base_aggregator.csv'):
                f50_csv(os.path.join(csv_directory, file_), analysis_id)
            # ... other CSV files ...
    
        # 3. Checks for and processes the SBOM JSON file.
        sbom_file = f"{settings.EMBA_LOG_ROOT}/{analysis_id}/emba_logs/SBOM/EMBA_cyclonedx_sbom.json"
        if os.path.isfile(sbom_file):
            sbom_json(sbom_file, analysis_id)
        # ... returns the Result object ...
    
    def read_csv(path):
        # Generic function to read simple EMBA CSV files into a dictionary.
        res_dict = {}
        with open(path, mode='r', encoding='utf-8') as csv_file:
            csv_reader = csv.reader(csv_file, delimiter=';')
            for row in csv_reader:
                # ... logic to parse CSV rows into a dictionary ...
                if len(row) == 2:
                    res_dict[row[0]] = row[1]
                # ... handles more complex row structures ...
        return res_dict
    
    def f50_csv(file_path, analysis_id):
        res_dict = read_csv(path=file_path) # 1. Read the CSV into a dictionary.
        # 2. Get or create a Result object for this analysis.
        res, _created = Result.objects.get_or_create(
            firmware_analysis=FirmwareAnalysis.objects.get(id=analysis_id)
        )
        if _created: # Only fill if newly created to avoid overwriting.
            # 3. Populate Result model fields from the dictionary.
            res.emba_command = res_dict.get("emba_command", '')
            res.os_verified = res_dict.get("os_verified", '')
            res.files = int(res_dict.get("files", 0))
            res.cve_critical = json.dumps(res_dict.get("cve_critical", {'0': '0'}).popitem())
            # ... populate many other fields ...
            res.save()
        return res
    
    def sbom_json(_file_path, _analysis_id):
        json_data = read_cyclone_dx_json(_file_path) # 1. Read the JSON SBOM file.
        sbom_uuid = json_data['serialNumber'].split(":")[2]
        sbom_obj, created_sbom = SoftwareBillOfMaterial.objects.get_or_create(id=sbom_uuid)
        sbom_obj.file = _file_path
        if created_sbom:
            for component_ in json_data['components']: # 2. Iterate through components in SBOM.
                new_sitem, _add = SoftwareInfo.objects.get_or_create( # 3. Fetch or create the SoftwareInfo object.
                    id=component_['bom-ref'],
                    name=component_['name'],
                    version=component_['version'],
                    supplier=component_['supplier'] or 'NA',
                    license=json.dumps(component_['licenses']) or 'NA',
                    cpe=component_['cpe'] or 'NA',
                    purl=component_['purl'] or 'NA'
                )
                sbom_obj.component.add(new_sitem) # 4. Link component to SBOM.
        sbom_obj.save()
        res, _ = Result.objects.get_or_create(firmware_analysis=FirmwareAnalysis.objects.get(id=_analysis_id))
        res.sbom = sbom_obj # 5. Link SBOM to Result.
        res.save()
        return res
    • result_read_in: This is the main entry point after EMBA finishes. It orchestrates the reading of different log files.
    • read_csv: A helper that parses generic CSV files into a Python dictionary.
    • f50_csv: Specifically handles f50_base_aggregator.csv, extracting core statistics and security properties to populate the Result model.
    • sbom_json: Parses the EMBA-generated EMBA_cyclonedx_sbom.json file. It creates SoftwareInfo entries for each component found and then links them to a SoftwareBillOfMaterial object, which is then associated with the Result.
  3. embark/reporter/views.py & embark/dashboard/views.py - The Presenters: These Python files contain the "views" (functions that handle web requests) responsible for fetching data from the Result and related models and preparing it for display in the browser.

    • get_individual_report: (Discussed above in "How EMBArk Presents Analysis Results") Fetches data for a single firmware's detailed report.
    • get_accumulated_reports: (Discussed above in "How EMBArk Presents Analysis Results") Aggregates data across all firmware analyses for the main dashboard.
    • html_report, html_report_path, html_report_resource: These views in reporter/views.py are specifically designed to serve the static HTML reports, images, and CSS files that EMBA generates. They ensure that users can only access reports for analyses they are authorized to see.
  4. embark/dashboard/urls.py & embark/reporter/urls.py - The Map: These files define the URL patterns that map specific web addresses to the view functions.

    # Simplified snippet from embark/dashboard/urls.py
    from django.urls import path
    from dashboard import views
    
    urlpatterns = [
        path('dashboard/main/', views.main_dashboard, name='embark-MainDashboard'),
        path('dashboard/report/', views.report_dashboard, name='embark-ReportDashboard'),
        path('dashboard/individualReport/<uuid:analysis_id>', views.individual_report_dashboard, name='embark-IndividualReportDashboard'),
        path('dashboard/report/sbom/<uuid:analysis_id>', views.get_sbom_analysis, name='embark-dashboard-sbom'),
        # ... other dashboard-related paths ...
    ]
    • These URL patterns map URLs like /dashboard/main/ to main_dashboard, whose page script requests data from get_accumulated_reports, and /dashboard/individualReport/YOUR_ID to individual_report_dashboard, whose page script requests data from get_individual_report.
  5. embark/templates/ & embark/static/scripts/ - The User Interface: The HTML templates (e.g., mainDashboard.html, individualReportDashboard.html) define the structure of the dashboards, and the JavaScript files (e.g., accumulatedReports.js, individualReportDashboard.js) bring them to life by:

    • Making AJAX requests to the views.py functions to fetch data.
    • Using libraries like Chart.js to draw interactive graphs and charts from the received JSON data.
    • Dynamically populating tables and cards with detailed information.
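
To tie the importer (item 2) to the dashboards, the sketch below replays two of its steps on in-memory data: parsing semicolon-delimited key/value rows, and the round trip that stores a CVE entry as a JSON list and later reads the count back. The function names and the sample values are assumptions for this example; the real `read_csv` handles more row shapes than the two-column case shown here.

```python
import csv
import io
import json


def parse_key_value_csv(text: str) -> dict:
    """Parse semicolon-delimited 'key;value' rows, skipping other row shapes."""
    result = {}
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if len(row) == 2:
            result[row[0]] = row[1]
    return result


def encode_pair(pair_map: dict) -> str:
    """Serialise the last (key, value) pair of a dict as a JSON list, as f50_csv does."""
    # Copy first so the caller's dictionary is not emptied by popitem().
    return json.dumps(dict(pair_map).popitem())


def decode_count(stored: str) -> int:
    """Read the first list element back as an integer, as the aggregation view does."""
    return int(json.loads(stored)[0])


if __name__ == "__main__":
    print(parse_key_value_csv("os_verified;Linux\nfiles;120\n"))
    stored = encode_pair({"614": "17"})
    print(stored)
    print(decode_count(stored))
```

The round trip explains why the dashboard code indexes with `[0]`: a tuple serialised by `json.dumps` becomes a JSON list, and the first element is the value the charts sum up.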

Conclusion

Reporting & Visualization is where the value of EMBArk's firmware analysis becomes visible. By converting raw analysis logs into structured data stored in models like Result and SoftwareBillOfMaterial, and then presenting that data through dynamic dashboards and detailed reports, EMBArk lets users quickly understand security posture and track trends, whether they need to deep-dive into a single firmware or get a bird's-eye view of the entire analyzed inventory.

Next, we'll dive into the heart of the analysis engine itself. Learn how EMBArk seamlessly integrates with the EMBA backend to perform its powerful security scans in Chapter 5: EMBA Backend Integration.

