Chapter 8: Data Models (Persistence Layer)
In Chapter 7: Background Task Execution, we explored how EMBArk efficiently handles demanding tasks behind the scenes, ensuring the user interface remains responsive. But all those tasks – uploading firmware, running analyses, managing users, orchestrating workers – involve creating, reading, updating, and deleting vast amounts of information. Where does EMBArk keep all this data, and how does it ensure everything is organized and connected?
Imagine EMBArk as a super-efficient office. It has many departments (like firmware analysis, user management, and worker orchestration), and each department handles different kinds of documents:
- User IDs and passwords for the HR department.
- Firmware files and analysis checklists for the lab.
- Results reports and vulnerability findings for the security team.
- Worker schedules and task assignments for the operations team.
The Data Models (Persistence Layer) abstraction is the meticulously organized filing cabinet for this office. It defines the structure of every "document" (piece of information), stores each document consistently, and knows how to link related documents together. It is EMBArk's "memory": information about uploaded firmware, analysis results, user accounts, and worker configurations is saved, retrieved, and linked in one consistent way. Without it, EMBArk would forget everything as soon as it shut down!
To manage its data effectively, EMBArk uses two main ideas: data models and a persistence layer.
Think of a data model as a blueprint or a template for a specific type of information. Just like a blueprint for a house defines how many rooms it has, where the doors are, and what materials to use, a data model defines:
- What pieces of information are stored (e.g., a "Firmware Analysis" document might store `version`, `start_date`, and `status`).
- What kind of data each piece is (e.g., `version` is text, `start_date` is a date, and `status` is structured progress data such as a percentage).
- How different pieces of information are related (e.g., a "Firmware Analysis" document is linked to a specific "Firmware File" document and a "User" document).
In EMBArk, these blueprints are defined using Django's models.Model classes in Python. Each class represents a type of data EMBArk needs to store.
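As a rough, framework-free analogy for the blueprint idea, the sketch below uses plain Python dataclasses. It is not EMBArk's code: the class names `FirmwareFileSketch` and `FirmwareAnalysisSketch` are hypothetical, and a real Django model adds the database mapping on top of a declaration like this.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime, timezone
from typing import Optional
import uuid


@dataclass
class FirmwareFileSketch:
    """Hypothetical stand-in for a firmware file record."""
    path: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class FirmwareAnalysisSketch:
    """Hypothetical stand-in for an analysis record."""
    firmware: Optional[FirmwareFileSketch]  # relationship to another blueprint
    version: str = ""  # text
    start_date: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )  # date and time
    status: dict = field(default_factory=dict)  # structured progress data


firmware = FirmwareFileSketch(path="firmware_files/router.bin")
analysis = FirmwareAnalysisSketch(firmware=firmware, version="1.0.3")

# The blueprint itself can be inspected: field names and declared types.
blueprint = {f.name: f.type for f in fields(FirmwareAnalysisSketch)}
```

The three things a data model declares are all visible here: which fields exist, what type each one has, and which field points at another kind of record.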
The persistence layer is the system that acts like the librarian for our filing cabinet (the database). It handles all the operations needed to:
- Save new "documents" (create new data records).
- Retrieve existing "documents" (read data from the database).
- Update "documents" (change existing data).
- Delete "documents" (remove data).
When you interact with EMBArk (e.g., upload a firmware), the web application talks to the persistence layer, which then translates those requests into commands for the actual database. This layer ensures data is stored safely and can always be found later.
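With Django, these four operations are normally written as calls on model objects, and the ORM generates the SQL. To show what eventually reaches the database, the sketch below issues comparable statements by hand with Python's standard-library `sqlite3` module. The table and column names are illustrative assumptions, not EMBArk's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE firmware_analysis ("
    "id INTEGER PRIMARY KEY, version TEXT, finished INTEGER)"
)

# Create: save a new record.
cursor = conn.execute(
    "INSERT INTO firmware_analysis (version, finished) VALUES (?, ?)",
    ("1.0.3", 0),
)
analysis_id = cursor.lastrowid

# Read: retrieve the record again.
select_sql = "SELECT version, finished FROM firmware_analysis WHERE id = ?"
row_after_create = conn.execute(select_sql, (analysis_id,)).fetchone()

# Update: mark the analysis as finished.
conn.execute(
    "UPDATE firmware_analysis SET finished = 1 WHERE id = ?", (analysis_id,)
)
row_after_update = conn.execute(select_sql, (analysis_id,)).fetchone()

# Delete: remove the record.
conn.execute("DELETE FROM firmware_analysis WHERE id = ?", (analysis_id,))
remaining = conn.execute("SELECT COUNT(*) FROM firmware_analysis").fetchone()[0]
conn.close()
```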
Let's walk through a concrete example: an analyst uploads a new firmware, and EMBArk starts an analysis. This involves creating several new pieces of information and linking them together.
Here's a simplified look at the kind of data EMBArk needs to store for this scenario:
- Firmware File: The actual file uploaded.
- Firmware Analysis: The specific settings for this analysis (version, architecture, who started it).
- User: Who uploaded the firmware and initiated the analysis.
- Result: The outcome of the analysis (CVEs found, security features).
These are all distinct "documents," but they are clearly related. The data models define these relationships, and the persistence layer saves them correctly.
When an analyst uploads a firmware and starts an analysis, here's a simplified sequence of how EMBArk uses its data models and persistence layer:
sequenceDiagram
participant User
participant EMBArk Web Server
participant Data Models (Persistence Layer)
participant Database
User->>EMBArk Web Server: Uploads firmware & submits analysis
EMBArk Web Server->>Data Models (Persistence Layer): Creates new FirmwareFile record
Data Models (Persistence Layer)->>Database: Saves firmware file details
Database-->>Data Models (Persistence Layer): Confirmation
EMBArk Web Server->>Data Models (Persistence Layer): Creates new FirmwareAnalysis record (links to FirmwareFile & User)
Data Models (Persistence Layer)->>Database: Saves analysis details
Database-->>Data Models (Persistence Layer): Confirmation
Note over Data Models (Persistence Layer): EMBA runs, then Importer updates data
EMBArk Web Server->>Data Models (Persistence Layer): Creates new Result record (links to FirmwareAnalysis)
Data Models (Persistence Layer)->>Database: Saves analysis results
Database-->>Data Models (Persistence Layer): Confirmation
EMBArk Web Server-->>User: Analysis started!
The "Data Models (Persistence Layer)" participant here represents the Python code that defines the models and interacts with the database to save and retrieve data.
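The sequence above can be sketched without Django to show how the linked records build on each other. This is a minimal illustration using the standard-library `sqlite3` module; the table names, columns, and sample values are assumptions made for the example and do not reflect EMBArk's real schema.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE firmware_file (
        id TEXT PRIMARY KEY,
        path TEXT,
        user_id INTEGER REFERENCES app_user(id)
    );
    CREATE TABLE firmware_analysis (
        id TEXT PRIMARY KEY,
        firmware_id TEXT REFERENCES firmware_file(id),
        user_id INTEGER REFERENCES app_user(id),
        finished INTEGER DEFAULT 0
    );
    CREATE TABLE result (
        firmware_analysis_id TEXT PRIMARY KEY REFERENCES firmware_analysis(id),
        exploits INTEGER DEFAULT 0
    );
""")
user_id = conn.execute(
    "INSERT INTO app_user (username) VALUES ('analyst')"
).lastrowid

# Step 1: the upload creates a firmware file record.
file_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO firmware_file (id, path, user_id) VALUES (?, ?, ?)",
    (file_id, "firmware_files/router.bin", user_id),
)

# Step 2: the analysis record links to the file and to the user.
analysis_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO firmware_analysis (id, firmware_id, user_id) VALUES (?, ?, ?)",
    (analysis_id, file_id, user_id),
)

# Step 3: after the scan, a result record links back to the analysis.
conn.execute(
    "INSERT INTO result (firmware_analysis_id, exploits) VALUES (?, ?)",
    (analysis_id, 3),
)
conn.execute(
    "UPDATE firmware_analysis SET finished = 1 WHERE id = ?", (analysis_id,)
)

# Because the records are linked, one query can follow the whole chain.
summary = conn.execute("""
    SELECT u.username, f.path, a.finished, r.exploits
    FROM result r
    JOIN firmware_analysis a ON a.id = r.firmware_analysis_id
    JOIN firmware_file f ON f.id = a.firmware_id
    JOIN app_user u ON u.id = a.user_id
""").fetchone()
conn.close()
```

The final query is the payoff of storing the links: starting from a result, the database can find the analysis, the file, and the user who started it.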
Let's dive into some of EMBArk's actual data models and see how they're defined and used. EMBArk is built on Django, which provides a powerful Object-Relational Mapper (ORM) that makes interacting with the database feel like working with Python objects.
The `FirmwareFile` and `FirmwareAnalysis` models are the backbone of any analysis.
# Simplified snippet from embark/uploader/models.py
import uuid

from django.db import models
from django.utils import timezone

from users.models import User as Userclass  # Import User model

class FirmwareFile(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    file = models.FileField(upload_to='firmware_files/')  # Database column stores the file path
    upload_date = models.DateTimeField(default=timezone.now)
    user = models.ForeignKey(Userclass, on_delete=models.SET_NULL, null=True)  # Link to the User

    def __str__(self):
        return self.file.name  # Display the file name

class FirmwareAnalysis(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    user = models.ForeignKey(Userclass, on_delete=models.SET_NULL, null=True)  # Link to the User
    firmware = models.ForeignKey(FirmwareFile, on_delete=models.SET_NULL, null=True)  # Link to the FirmwareFile
    firmware_name = models.CharField(max_length=127, default="File unknown")
    version = models.CharField(max_length=127, blank=True)
    start_date = models.DateTimeField(default=timezone.now)
    finished = models.BooleanField(default=False)
    status = models.JSONField(default=dict)  # For real-time progress (Chapter 3)
    # Other analysis settings like architecture, scan_modules (Chapter 2)
    # ...

    def __str__(self):
        return f"Analysis {self.id} for {self.firmware_name}"
- `FirmwareFile`: This blueprint defines how to store information about an uploaded firmware file.
  - `id`: A unique identifier (UUID).
  - `file`: A file field. The firmware binary is written to storage under `firmware_files/`, and the database column records its path.
  - `user = models.ForeignKey(Userclass, ...)`: This is a foreign key, a crucial concept. It creates a link to the `User` model, so each `FirmwareFile` record "knows" which `User` uploaded it. If that `User` is deleted, `on_delete=models.SET_NULL` sets the link to `NULL` instead of deleting the file record.
- `FirmwareAnalysis`: This blueprint holds all the details about one specific analysis run.
  - It has a `ForeignKey` to `User` (who started the analysis).
  - It has a `ForeignKey` to `FirmwareFile` (which file is being analyzed).
  - `status = models.JSONField(default=dict)`: This field (used in Chapter 3: Real-time Progress Monitoring) stores dynamic, structured data such as the current progress percentage.
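To make the `SET_NULL` behaviour concrete, the sketch below reproduces its observable effect with the standard-library `sqlite3` module. Note the difference in mechanism: Django applies `on_delete` itself when a record is deleted through the ORM, whereas this example relies on SQLite's own `ON DELETE SET NULL` clause. The table and column names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE firmware_file (
        id INTEGER PRIMARY KEY,
        path TEXT,
        user_id INTEGER REFERENCES app_user(id) ON DELETE SET NULL
    );
""")
conn.execute("INSERT INTO app_user (id, username) VALUES (1, 'analyst')")
conn.execute(
    "INSERT INTO firmware_file (id, path, user_id) "
    "VALUES (1, 'firmware_files/router.bin', 1)"
)

owner_sql = "SELECT user_id FROM firmware_file WHERE id = 1"
owner_before = conn.execute(owner_sql).fetchone()[0]

# Deleting the user keeps the firmware file row but clears its link.
conn.execute("DELETE FROM app_user WHERE id = 1")
owner_after = conn.execute(owner_sql).fetchone()[0]
file_count = conn.execute("SELECT COUNT(*) FROM firmware_file").fetchone()[0]
conn.close()
```

The firmware record survives the deletion of its uploader, which is why the link column has to allow `NULL` (`null=True` in the model definition).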
After EMBA finishes an analysis, EMBArk stores the findings in the `Result` model and the models it links to.
# Simplified snippet from embark/dashboard/models.py
import uuid

from django.db import models

from uploader.models import FirmwareAnalysis  # Import FirmwareAnalysis model

class Vulnerability(models.Model):
    cve = models.CharField(max_length=18, help_text='CVE-XXXX-XXXXXXX')
    info = models.JSONField(null=True)

    def __str__(self):
        return self.cve

class SoftwareInfo(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    name = models.CharField(max_length=256)
    version = models.CharField(max_length=32)
    supplier = models.CharField(max_length=1024)
    # ... other fields like license, cpe, purl ...

    def __str__(self):
        return f"{self.name} {self.version}"

class SoftwareBillOfMaterial(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    meta = models.CharField(max_length=1024)
    component = models.ManyToManyField(SoftwareInfo, blank=True)  # Link to many SoftwareInfo objects

    def __str__(self):
        return f"SBOM {self.id}"

class Result(models.Model):
    firmware_analysis = models.OneToOneField(FirmwareAnalysis, on_delete=models.CASCADE, primary_key=True)  # One-to-one link
    os_verified = models.CharField(blank=True, null=True, max_length=256)
    cve_critical = models.TextField(default='{}')  # Stores JSON string
    cve_high = models.TextField(default='{}')
    exploits = models.IntegerField(default=0)
    vulnerability = models.ManyToManyField(Vulnerability, blank=True)  # Link to many Vulnerability objects
    sbom = models.OneToOneField(SoftwareBillOfMaterial, on_delete=models.CASCADE, null=True, blank=True)  # Optional one-to-one link
    # Other result fields like canary, relro, no_exec, pie, stripped (Chapter 4)
    # ...

    def __str__(self):
        return f"Results for {self.firmware_analysis.firmware_name}"
- `Vulnerability`: A simple blueprint for storing a CVE ID and its details.
- `SoftwareInfo`: A blueprint for a single software component (used in an SBOM).
- `SoftwareBillOfMaterial`: Represents an entire SBOM.
  - `component = models.ManyToManyField(SoftwareInfo, ...)`: This is a many-to-many relationship. An SBOM can have many software components, and a `SoftwareInfo` item (like "OpenSSL 1.1.1") might appear in many different SBOMs.
- `Result`: This is the main summary of the analysis.
  - `firmware_analysis = models.OneToOneField(FirmwareAnalysis, ...)`: This is a one-to-one relationship. Each `FirmwareAnalysis` can have only one `Result` summary, and each `Result` belongs to one `FirmwareAnalysis`. This links the summary findings back to the original analysis settings.
  - `vulnerability = models.ManyToManyField(Vulnerability, ...)`: A `Result` can have many `Vulnerability` entries, and a `Vulnerability` (like CVE-2023-1234) might be found in many `Result`s.
  - `sbom = models.OneToOneField(SoftwareBillOfMaterial, ...)`: Optionally links to an SBOM.
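At the database level, a many-to-many relationship is stored in a separate join table (Django creates one for each `ManyToManyField`), and a one-to-one link that doubles as the primary key allows at most one row per parent. The sketch below shows both ideas with the standard-library `sqlite3` module. Table names, column names, and the CVE IDs are placeholders chosen for the example, not the names Django generates for EMBArk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE firmware_analysis (id INTEGER PRIMARY KEY);
    -- One-to-one: the result's primary key is also its link to the analysis.
    CREATE TABLE result (
        firmware_analysis_id INTEGER PRIMARY KEY
            REFERENCES firmware_analysis(id) ON DELETE CASCADE
    );
    CREATE TABLE vulnerability (id INTEGER PRIMARY KEY, cve TEXT);
    -- Many-to-many: one row per (result, vulnerability) pair.
    CREATE TABLE result_vulnerability (
        result_id INTEGER REFERENCES result(firmware_analysis_id) ON DELETE CASCADE,
        vulnerability_id INTEGER REFERENCES vulnerability(id),
        PRIMARY KEY (result_id, vulnerability_id)
    );
""")
conn.executemany("INSERT INTO firmware_analysis (id) VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO result (firmware_analysis_id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO vulnerability (id, cve) VALUES (?, ?)",
    [(1, "CVE-2023-1234"), (2, "CVE-2023-5678")],
)
# Result 1 has two CVEs; CVE-2023-1234 shows up in both results.
conn.executemany(
    "INSERT INTO result_vulnerability (result_id, vulnerability_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1)],
)

cves_for_result_1 = [row[0] for row in conn.execute("""
    SELECT v.cve FROM vulnerability v
    JOIN result_vulnerability rv ON rv.vulnerability_id = v.id
    WHERE rv.result_id = 1 ORDER BY v.cve
""")]
results_with_first_cve = [row[0] for row in conn.execute(
    "SELECT result_id FROM result_vulnerability "
    "WHERE vulnerability_id = 1 ORDER BY result_id"
)]

# One-to-one: a second result for the same analysis is rejected.
try:
    conn.execute("INSERT INTO result (firmware_analysis_id) VALUES (1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# CASCADE: deleting an analysis also removes its result row.
conn.execute("DELETE FROM firmware_analysis WHERE id = 2")
results_left = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
conn.close()
```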
Users are fundamental to EMBArk, as seen in Chapter 1: User Authentication & Authorization.
# Simplified snippet from embark/users/models.py
from django.db import models
from django.contrib.auth.models import AbstractUser  # Django's base user model

class Team(models.Model):
    name = models.CharField(primary_key=True, max_length=150, unique=True)
    is_active = models.BooleanField(default=True)

    def __str__(self):
        return self.name

class User(AbstractUser):
    timezone = models.CharField(max_length=32, default='UTC')
    email = models.EmailField(unique=True, blank=True)
    team = models.ForeignKey(Team, on_delete=models.SET_NULL, null=True, related_name='member_of_team')  # Link to Team
    api_key = models.CharField(max_length=64, blank=True, null=True)  # User's API key

    class Meta:
        permissions = (
            ("user_permission", "Can access user menues of embark"),
            ("uploader_permission_minimal", "Can access uploader functionalities of embark"),
            # ... many more custom permissions (Chapter 1) ...
        )

    def __str__(self):
        return self.username
- `Team`: A blueprint for user teams.
- `User`: This model extends Django's built-in `AbstractUser`, adding custom fields like `timezone`, `api_key`, and a `ForeignKey` to `Team`. It also defines specific permissions, which are crucial for Chapter 1: User Authentication & Authorization.
For distributed analysis, EMBArk needs to store information about its worker nodes and how to manage them, as discussed in Chapter 6: Worker Node Orchestration.
# Simplified snippet from embark/workers/models.py
import ipaddress

from django.core.exceptions import ValidationError  # Needed by Configuration.clean()
from django.db import models

from users.models import User  # Import User model

class Configuration(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=150)
    ssh_user = models.CharField(max_length=150)
    ip_range = models.CharField(max_length=20)  # e.g., 192.168.1.0/24
    # ... other SSH key fields ...

    def clean(self):  # Custom validation
        try:
            ipaddress.ip_network(self.ip_range, strict=False)
        except ValueError as value_error:
            raise ValidationError({"ip_range": f"Invalid IP range: {value_error}"}) from value_error

    def __str__(self):
        return self.name

class Worker(models.Model):
    configurations = models.ManyToManyField(Configuration, blank=True)  # Many-to-many link
    ip_address = models.GenericIPAddressField(unique=True)
    name = models.CharField(max_length=100)
    reachable = models.BooleanField(default=False)
    status = models.CharField(max_length=1, default='U')  # e.g., 'U'nconfigured, 'C'onfigured
    analysis_id = models.UUIDField(blank=True, null=True)  # ID of current analysis running on worker

    def __str__(self):
        return f"{self.name} ({self.ip_address})"

class OrchestratorState(models.Model):
    free_workers = models.ManyToManyField(Worker, related_name='free_workers')  # Many-to-many link to free workers
    busy_workers = models.ManyToManyField(Worker, related_name='busy_workers')  # Many-to-many link to busy workers
    tasks = models.JSONField(default=list, null=True)  # The queue of tasks

    def __str__(self):
        return "Orchestrator State"
- `Configuration`: Defines how EMBArk connects to and manages workers (SSH credentials, IP ranges). It has a `ForeignKey` to `User`.
- `Worker`: Represents an individual worker machine.
  - `configurations = models.ManyToManyField(Configuration, ...)`: A `Worker` can be managed by multiple `Configuration`s (if, for example, multiple users manage the same worker with different SSH keys).
  - `analysis_id`: Stores the ID of the `FirmwareAnalysis` currently running on this worker.
- `OrchestratorState`: This model is crucial for Chapter 6: Worker Node Orchestration. It stores the state of the Orchestrator.
  - `free_workers = models.ManyToManyField(Worker, ...)`: A list of workers currently available.
  - `busy_workers = models.ManyToManyField(Worker, ...)`: A list of workers currently running an analysis.
  - `tasks = models.JSONField(...)`: Stores the queue of tasks that are waiting to be assigned to workers.
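The range check inside `Configuration.clean()` can be tried out on its own, since it only depends on the standard-library `ipaddress` module. In this sketch the helper name `normalize_ip_range` is hypothetical, and it raises a plain `ValueError` rather than Django's `ValidationError` so that it runs without Django installed.

```python
import ipaddress


def normalize_ip_range(ip_range):
    """Return the parsed network, or raise ValueError with a readable message."""
    try:
        # strict=False tolerates host bits: 192.168.1.5/24 becomes 192.168.1.0/24.
        return ipaddress.ip_network(ip_range, strict=False)
    except ValueError as value_error:
        raise ValueError(f"Invalid IP range: {value_error}") from value_error


network = normalize_ip_range("192.168.1.5/24")

try:
    normalize_ip_range("192.168.1.0/33")  # /33 is not a valid IPv4 prefix
    error_message = None
except ValueError as error:
    error_message = str(error)
```

With `strict=False`, an address inside the range is accepted and normalised to the network address; with the default `strict=True`, the same input would be rejected because its host bits are set.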
Django automatically provides an administrative interface where you can view, create, update, and delete these model records. You just need to "register" them.
# Simplified snippet from embark/dashboard/admin.py
from django.contrib import admin

from dashboard.models import Result, Vulnerability, SoftwareInfo, SoftwareBillOfMaterial

admin.site.register(Result)
admin.site.register(Vulnerability)
admin.site.register(SoftwareInfo)
admin.site.register(SoftwareBillOfMaterial)
# Similar registrations in uploader/admin.py, users/admin.py, workers/admin.py

Each `admin.site.register()` call tells Django to make that data model available in its built-in admin dashboard, allowing administrators to easily manage the "documents" in EMBArk's "filing cabinet."
Data Models and the Persistence Layer are the unsung heroes of EMBArk. They provide the structured "memory" for the entire system, defining the blueprints for all information (users, firmware, analysis results, workers) and ensuring this data is consistently saved, retrieved, and linked. By leveraging Django's powerful ORM, EMBArk can manage complex relationships between different types of information, forming the essential backbone for all its operations and reporting.
Now that we understand how EMBArk stores and organizes its data, the final piece of the puzzle is how to get EMBArk up and running in a real-world environment. In the next chapter, we'll cover Chapter 9: Deployment & Environment Setup, where you'll learn how to deploy and configure EMBArk for production use.
Generated by AI Codebase Knowledge Builder.