|              |                                                             |
| :----------- | :---------------------------------------------------------- |
| Feature Name | Node Scan                                                    |
| Start Date   | March 5th, 2026                                              |
| Category     | Architecture                                                 |
| RFC PR       | [#922](https://github.com/kubewarden/sbomscanner/pull/922)  |
| State        | **ACCEPTED**                                                 |

# Summary

[summary]: #summary

Define the architectural and functional requirements for scanning Kubernetes cluster nodes.

# Motivation

[motivation]: #motivation

We aim to develop a full-stack, SBOM-based security scanner for Kubernetes.
Because nodes are the foundation of the cluster, maintaining visibility into their
security posture is critical.

This feature provides a comprehensive overview of node-level vulnerabilities,
ensuring the safety of the infrastructure where workloads reside.

## Examples / User Stories

[examples]: #examples

- As a user, I want a comprehensive overview of node-level vulnerabilities, ensuring the safety of the infrastructure where workloads reside.
- As a user, I want to automatically scan cluster nodes for vulnerabilities on a recurring basis.
- As a user, I want to define the scan interval for my nodes.
- As a user, I want the ability to exclude specific files or directories from the scan to reduce noise or avoid sensitive paths.

# Detailed design

[design]: #detailed-design

Node scanning is implemented by deploying a `DaemonSet` that executes a worker
component on every node.

The worker will be provided with these new flags:

* `--mode` to switch between `registry` and `node` scanning
* `--node-name` to specify the name of the node to be scanned (only used in `node` scanning mode)

This approach allows for significant code reuse across different scan targets.
When `--mode=node` is set, the `--node-name` flag must be provided,
and the worker subscribes to the NATS subject `sbomscanner.nodescan.{node-name}`
to receive scan jobs specific to that node.
It sits idle most of the time and performs a scan only when requested.
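
As a rough illustration of this wiring, a `DaemonSet` could inject the node name through the downward API and pass the new flags to the worker. This is a minimal sketch: the image name, namespace, labels, and mount path are assumptions, not the actual manifest shipped by the project.

```yaml
# Illustrative only: image name, namespace, labels, and mount paths are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sbomscanner-node-worker
  namespace: sbomscanner
spec:
  selector:
    matchLabels:
      app: sbomscanner-node-worker
  template:
    metadata:
      labels:
        app: sbomscanner-node-worker
    spec:
      containers:
        - name: worker
          image: ghcr.io/kubewarden/sbomscanner/worker:latest   # assumed image name
          args:
            - --mode=node                 # node scanning mode
            - --node-name=$(NODE_NAME)    # expanded from the env var below
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName   # downward API: the node this pod runs on
          volumeMounts:
            - name: host-root
              mountPath: /host
              readOnly: true               # see the Drawbacks section
      volumes:
        - name: host-root
          hostPath:
            path: /
```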

This feature also allows nodes to be excluded from the scan (e.g. if they don't have enough resources).
This can be achieved with the `nodeSelector`: only nodes matching the selector
are considered for scanning. If it is not specified, all nodes are scanned.

To trigger a new scan, the user can set the `scanInterval` on the `NodeScanConfiguration`,
or leave `scanInterval` unset and apply a `NodeScanJob` manually
(with the `NodeScanConfiguration` already applied), as we already do for the `Registry`.

Please note that `NodeScanConfiguration` is a singleton resource,
meaning that there can be only one instance of it in the cluster.

## CRDs

For this feature we are going to add the following CRDs (an illustrative `NodeScanConfiguration` manifest is sketched after the list):

> **Collaborator:** please add example CRDs

* `NodeScanConfiguration`: Defines the global scan settings.
  * `scanInterval`: Duration between automated scans.
    If not specified, no `NodeScanJob` is started automatically.
  * `nodeSelector`: Filters which nodes are scanned.
    If not specified, all nodes are scanned.

    > **Collaborator:** add platform(s) filter

  * `skip`: A list of file/directory paths to be ignored.

    > **Collaborator:** We could use something like this for skip patterns:
    >
    > ```yaml
    > # Gitignore-style patterns to exclude from filesystem scans.
    > # Trailing "/" = directory, otherwise = file.
    > skipPatterns:
    >   - "node_modules/"        # → --skip-dirs node_modules
    >   - "**/vendor/"           # → --skip-dirs (glob expanded at scan time)
    >   - ".git/"                # → --skip-dirs .git
    >   - "*.min.js"             # → --skip-files *.min.js
    >   - "package-lock.json"    # → --skip-files package-lock.json
    > ```

* `NodeScanJob`: Represents a single execution of a node scan.

  > **Collaborator:** Let's reuse the same retention mechanism used for `ScanJob`s: keep only the latest 10 `NodeScanJob`s per node
  > (see `sbomscanner/docs/rfc/0002_scan_trigger.md`, line 314 at `e5d13da`).

  * `nodeName`: The name of the node to be scanned.

* `NodeSBOM`: Stores the Software Bill of Materials for a specific node.

* `NodeVulnerabilityReport`: Contains the results of the vulnerability analysis.
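
To make the resource shapes more concrete, here is a sketch of what a `NodeScanConfiguration` manifest could look like. The API group, version, and exact field names are assumptions for illustration, not a final schema.

```yaml
# Illustrative only: API group, version, and field names are assumptions.
apiVersion: sbomscanner.kubewarden.io/v1alpha1
kind: NodeScanConfiguration
metadata:
  name: default                 # singleton: only one instance per cluster
spec:
  scanInterval: 24h             # duration between automated scans; omit to disable automatic jobs
  nodeSelector:                 # only matching nodes are considered for scanning
    matchLabels:
      node-role.kubernetes.io/worker: ""
  skip:                         # file/directory paths ignored by the scan
    - /proc
    - /sys
    - /var/lib/kubelet/pods
```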

Here's the overview of the resources landscape:

![nodescan-crds-diagram]()

### NodeMetadata Struct

`NodeSBOM` and `NodeVulnerabilityReport` are equal to the [`SBOM`](https://github.com/kubewarden/sbomscanner/blob/main/api/storage/v1alpha1/sbom_types.go) and
[`VulnerabilityReport`](https://github.com/kubewarden/sbomscanner/blob/main/api/storage/v1alpha1/vulnerabilityreport_types.go) resources, except for [`ImageMetadata`](https://github.com/kubewarden/sbomscanner/blob/main/api/storage/v1alpha1/image_metadata.go).

In this case, we are going to use the `NodeMetadata` structure to store
information about the node.

`NodeMetadata` will have the following attributes:

* `Name` specifies the unique name of the node in the cluster.
* `Platform` specifies the OS + CPU architecture of the node. Example: `linux/amd64`, `linux/arm64`.
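
For illustration, a `NodeSBOM` object could surface `NodeMetadata` roughly as follows; the API group, version, and field layout are assumptions based on the existing storage resources linked above, not the actual schema.

```yaml
# Illustrative only: API group, version, and field layout are assumptions.
apiVersion: storage.sbomscanner.kubewarden.io/v1alpha1
kind: NodeSBOM
metadata:
  name: worker-node-1
nodeMetadata:
  name: worker-node-1          # unique name of the node in the cluster
  platform: linux/amd64        # OS + CPU architecture
spdx: {}                       # SBOM payload, analogous to the SBOM resource
```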

> **Collaborator:** Please add a section describing which reconcilers need to be introduced, along with a couple of paragraphs explaining their role in the design.

## Scan Workflow

1. The user applies a `NodeScanConfiguration` with a defined `scanInterval`, or applies a `NodeScanJob` manually (a minimal manifest is sketched after this list).

   > **Collaborator:** Let's revisit this and add the runner. That way we have two branches in the flow: one triggered by the user, and one triggered by the runner.

2. The controller creates a `NodeScanJob` for each node matching the `nodeSelector` (or all nodes if no selector is specified).
3. Each worker subscribes to the NATS subject `sbomscanner.nodescan.{node-name}` and receives the scan job for its node.
4. The worker executes the scan, generating a `NodeSBOM` and a `NodeVulnerabilityReport` for the node.
5. The results are stored in the cluster and can be accessed by the user for review and remediation.
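
For the manual path in step 1, the trigger could be as small as the following `NodeScanJob`; the API group, version, and field names are assumptions for illustration.

```yaml
# Illustrative only: API group, version, and field names are assumptions.
apiVersion: sbomscanner.kubewarden.io/v1alpha1
kind: NodeScanJob
metadata:
  name: scan-worker-node-1
spec:
  nodeName: worker-node-1   # the node to be scanned
```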

To let users easily understand the flow, here's a simple diagram:

![nodescan-profile-flow]()

Without the `NodeScanConfiguration`, users cannot run a `NodeScanJob` independently,
since the `NodeScanJob` needs the configuration defined in the `NodeScanConfiguration` to run (e.g. the `skip` list).

When a new `NodeScanJob` is created, it checks whether another `NodeScanJob` is already in progress for the same node.
If there is an active job, the new job will be marked as `Failed` with the reason `ScanAlreadyInProgress`.

![nodescanjob-check]()

## Status Conditions

`NodeScanJob`s will have status conditions to provide visibility into the scan process.

The `NodeScanJob` has status conditions very similar to [`ScanJob`](https://github.com/kubewarden/sbomscanner/blob/main/api/v1alpha1/scanjob_types.go#L36):

Status: `Scheduled` (the job is created but hasn't started doing actual work)
* `Scheduled`: The system has accepted the request and scheduled it.
* `Pending`: The job is in the queue waiting for resources or an executor to pick it up.

Status: `InProgress` (the job is actively executing)
* `InProgress`: Generic indicator that execution has started.
* `NodeScanInProgress`: Currently scanning the node's filesystem and collecting data.
* `SBOMGenerationInProgress`: Currently parsing dependencies and building the SBOM document.

Status: `Complete` (the job finished successfully)
* `Complete`: Generic success indicator.
* `NodeScanned`: The node has been successfully scanned, and the SBOM and vulnerability report are generated.

Status: `Failed` (the job encountered a terminal error)
* `Failed`: Generic failure indicator (e.g., bad user input, invalid target).
* `InternalError`: Failed due to an unexpected system crash, out-of-memory error, or infrastructure issue.
* `ScanAlreadyInProgress`: Failed because another scan job is already running for the same node.

As for the `WorkloadScan` status conditions, the mechanism works the same.
When `Scheduled` is `true`, all the other conditions are `false` and their reason is `Scheduled`.
When `Pending` is `true`, all the other conditions are `false` and their reason is `Pending`.
When `InProgress` is `true`, all the other conditions are `false` and their reason is `InProgress`.
When `Complete` is `true`, all the other conditions are also `false` and their reason is `Complete`.
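
As a hedged example of this convention, the conditions on a successfully completed `NodeScanJob` might look like the snippet below; the exact layout mirrors the `ScanJob` conditions only by assumption.

```yaml
# Illustrative only: the exact condition types and field layout are assumptions.
status:
  conditions:
    - type: Scheduled
      status: "False"
      reason: Complete
    - type: Pending
      status: "False"
      reason: Complete
    - type: InProgress
      status: "False"
      reason: Complete
    - type: Complete
      status: "True"
      reason: NodeScanned
      message: SBOM and vulnerability report generated for node worker-node-1
    - type: Failed
      status: "False"
      reason: Complete
```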

## Garbage Collection

Garbage collection is crucial to prevent resource orphaning and to maintain a clean cluster state.

> **Collaborator:** There are two distinct cleanup mechanisms to consider: the GC and the configuration cleanup.
>
> **Kubernetes garbage collection (Node deletion).** The owner reference chain is: `Node` → `NodeScanJob`, and `Node` → `NodeSBOM` → `NodeVulnerabilityReport`. When a `Node` is deleted, Kubernetes garbage collection cascades through the owner references and cleans up all node-related resources automatically.
>
> **Reconciler cleanup (`NodeScanConfiguration` disabled/deleted).** When the `NodeScanConfiguration` is disabled or removed, the reconciler must actively clean up `NodeScanJob`s and `NodeSBOM`s. `NodeVulnerabilityReport`s are cascade-deleted for free since they are owned by their respective `NodeSBOM`.
|
|
||||
| | When Deleting | Also Delete | | ||||
| |---------------------------|----------------------------| | ||||
| | `Node` | `NodeScanJob`, `NodeSBOM`, `NodeVulnerabilityReport` | | ||||
| | `NodeScanConfiguration` | `NodeScanJob`, `NodeSBOM`, `NodeVulnerabilityReport` | | ||||
| | `NodeScanJob` | nothing | | ||||
| | `NodeSBOM` | `NodeVulnerabilityReport` | | ||||
| | `NodeVulnerabilityReport` | nothing | | ||||
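
To make the cascade concrete, a `NodeSBOM` owned by its `Node` would carry a standard Kubernetes owner reference, roughly as sketched below; the UID and names are placeholders.

```yaml
# Illustrative only: UID and names are placeholders.
metadata:
  name: worker-node-1
  ownerReferences:
    - apiVersion: v1
      kind: Node
      name: worker-node-1
      uid: 5c3f2b1a-0000-0000-0000-000000000000
      controller: false
      blockOwnerDeletion: false
```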

![nodescan-gc]()

# Drawbacks

[drawbacks]: #drawbacks

Mounting the host filesystem into a container bridges the isolation boundary and
introduces significant risk. To mitigate potential host compromise, the `DaemonSet`
must mount the host root filesystem as `readOnly: true`.
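
Concretely, the mitigation boils down to declaring the host mount read-only in the worker container, as in this fragment; the volume name and mount path are the same assumptions used in the `DaemonSet` sketch earlier.

```yaml
# Illustrative only: volume name and mount path are assumptions.
volumeMounts:
  - name: host-root
    mountPath: /host
    readOnly: true            # the worker never writes to the host filesystem
volumes:
  - name: host-root
    hostPath:
      path: /
```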