docs: node scan rfc #922
Signed-off-by: Alessio Greggi <alessio.greggi@suse.com>
Pull request overview
Adds an RFC describing the proposed “Node Scan” feature, outlining how node scanning should work and integrate with SBOMscanner, including intended CRDs and status conditions.
Changes:
- Introduces a new RFC document for Node Scanning.
- Describes the DaemonSet-based architecture and reuse of the worker via scan mode flagging.
- Proposes new CRDs, a `NodeMetadata` structure, and `NodeScanJob` status conditions.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #922 +/- ##
==========================================
+ Coverage 49.95% 52.22% +2.26%
==========================================
Files 56 61 +5
Lines 4544 5147 +603
==========================================
+ Hits 2270 2688 +418
- Misses 1928 2071 +143
- Partials 346 388 +42
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Flavio Castelli <flavio@castelli.me>
flavio left a comment
This is a good start. There are details that are not covered, such as:
- The DaemonSet will stay idle most of the time.
- How the scan is initiated.
- How the SBOM is generated. We both know this is going to be based on trivy scanning the filesystem of the host, mounted into the container. Explain that. Also, we will probably need to provide a way to exclude some host directories from the scan. AFAIK there's a risk of trivy scanning the directories where the container root filesystems are unpacked, which would attribute the CVEs of the container images to the host itself.
- Alternative approach: we could find a creative way to spin the DaemonSets up and down on demand, but this is something that requires more work compared to the initial proposal. We expect the agent to be idle most of the time, consuming very few resources. We can revisit the approach later on, when we have real data.
Co-authored-by: Flavio Castelli <flavio@castelli.me> Signed-off-by: Alessio Greggi <alessio.greggi@suse.com>
You are right. The field |
## Garbage Collection

Garbage collection is crucial to prevent resource orphaning and to maintain a clean cluster state.
There are two distinct cleanup mechanisms to consider: the GC and the configuration cleanup.
Kubernetes garbage collection (Node deletion)
The owner reference chain is: Node → NodeScanJob, and Node → NodeSBOM → NodeVulnerabilityReport. When a Node is deleted, Kubernetes garbage collection cascades through the owner references and cleans up all node-related resources automatically.
Reconciler cleanup (NodeScanConfiguration disabled/deleted)
When the NodeScanConfiguration is disabled or removed, the reconciler must actively clean up NodeScanJobs and NodeSBOMs. NodeVulnerabilityReports are cascade-deleted for free since they are owned by their respective NodeSBOM.
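The owner reference chain described above can be illustrated with a small in-memory model. This is only a sketch: the `Resource` type, the `cascadeDelete` helper, and the object names are hypothetical and stand in for what the Kubernetes garbage collector does with real `ownerReferences`. It shows why deleting a Node removes everything, while deleting a NodeSBOM takes only its NodeVulnerabilityReport with it.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a simplified stand-in for a Kubernetes object with one owner.
// It is a hypothetical type used for illustration only.
type Resource struct {
	Kind  string
	Name  string
	Owner string // "Kind/Name" of the owner, empty for root objects
}

func key(kind, name string) string { return kind + "/" + name }

// cascadeDelete returns the sorted keys of every object that disappears when
// root is deleted, following owner references transitively.
func cascadeDelete(resources []Resource, root string) []string {
	deleted := map[string]bool{root: true}
	for changed := true; changed; {
		changed = false
		for _, r := range resources {
			k := key(r.Kind, r.Name)
			if !deleted[k] && deleted[r.Owner] {
				deleted[k] = true
				changed = true
			}
		}
	}
	out := make([]string, 0, len(deleted))
	for k := range deleted {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// exampleResources models the chain from the comment above:
// Node → NodeScanJob, and Node → NodeSBOM → NodeVulnerabilityReport.
func exampleResources() []Resource {
	return []Resource{
		{Kind: "Node", Name: "worker-1"},
		{Kind: "NodeScanJob", Name: "scan-1", Owner: "Node/worker-1"},
		{Kind: "NodeSBOM", Name: "worker-1", Owner: "Node/worker-1"},
		{Kind: "NodeVulnerabilityReport", Name: "worker-1", Owner: "NodeSBOM/worker-1"},
	}
}

func main() {
	fmt.Println("node deleted:", cascadeDelete(exampleResources(), "Node/worker-1"))
	fmt.Println("sbom deleted:", cascadeDelete(exampleResources(), "NodeSBOM/worker-1"))
}
```

In the reconciler cleanup case, this is the reason the reconciler only needs to delete NodeScanJobs and NodeSBOMs explicitly: the reports follow their NodeSBOM.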
    If not specified, all the nodes are scanned.
  * `skip`: A list of file/directory paths to be ignored.

* `NodeScanJob`: Represents a single execution of a node scan.
Let's reuse the same retention mechanism used for ScanJobs: keep only the latest 10 NodeScanJobs per node.
See:
sbomscanner/docs/rfc/0002_scan_trigger.md
Line 314 in e5d13da
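A minimal sketch of the per-node retention rule, assuming jobs are ordered by creation time. The `NodeScanJob` struct here is a hypothetical, reduced stand-in for the real CRD type, and `jobsToPrune` and `maxJobsPerNode` are illustrative names, not the existing ScanJob implementation.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// NodeScanJob is a reduced, hypothetical view of the CRD: only the fields
// needed to decide retention.
type NodeScanJob struct {
	Name     string
	NodeName string
	Created  time.Time
}

// maxJobsPerNode mirrors the "latest 10 per node" rule from the comment above.
const maxJobsPerNode = 10

// jobsToPrune returns the jobs that exceed the retention limit, keeping the
// `keep` most recent jobs for each node. The result is sorted by name.
func jobsToPrune(jobs []NodeScanJob, keep int) []NodeScanJob {
	if keep < 0 {
		keep = 0
	}
	byNode := map[string][]NodeScanJob{}
	for _, j := range jobs {
		byNode[j.NodeName] = append(byNode[j.NodeName], j)
	}
	var prune []NodeScanJob
	for _, nodeJobs := range byNode {
		// Newest first, so everything past index keep-1 is outdated.
		sort.Slice(nodeJobs, func(a, b int) bool {
			return nodeJobs[a].Created.After(nodeJobs[b].Created)
		})
		if len(nodeJobs) > keep {
			prune = append(prune, nodeJobs[keep:]...)
		}
	}
	sort.Slice(prune, func(a, b int) bool { return prune[a].Name < prune[b].Name })
	return prune
}

// exampleJobs builds 12 jobs for worker-1 and 3 jobs for worker-2.
func exampleJobs() []NodeScanJob {
	base := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	var jobs []NodeScanJob
	for node, count := range map[string]int{"worker-1": 12, "worker-2": 3} {
		for i := 0; i < count; i++ {
			jobs = append(jobs, NodeScanJob{
				Name:     fmt.Sprintf("%s-%02d", node, i),
				NodeName: node,
				Created:  base.Add(time.Duration(i) * time.Minute),
			})
		}
	}
	return jobs
}

func main() {
	for _, j := range jobsToPrune(exampleJobs(), maxJobsPerNode) {
		fmt.Println("would delete:", j.Name)
	}
}
```

Grouping by node before sorting matters: a node that is scanned rarely keeps its history even when another node produces many jobs.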
* `Name` specifies the unique name of the node in the cluster.
* `Platform` specifies the OS + CPU architecture of the node. Example: linux/amd64, linux/arm64.
Please add a section describing which reconcilers need to be introduced, along with a couple of paragraphs explaining their role in the design.
- `NodeScanJobReconciler`
- `NodeScanReconciler`
- `NodeScanRunner` (as a runnable)
The NodeScanRunner filters nodes using the nodeSelector and platform criteria. The platform filter can be implemented using controller-runtime indexes.
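The filtering step could look roughly like the sketch below. It uses a plain loop over an in-memory list instead of a controller-runtime index, and the `Node` struct, `selectNodes`, and the rule that an empty platform list matches every platform are assumptions made for illustration, not part of the RFC.

```go
package main

import "fmt"

// Node is a hypothetical, reduced view of a cluster node.
type Node struct {
	Name     string
	Labels   map[string]string
	Platform string // "os/arch", for example "linux/amd64"
}

// matchesSelector reports whether labels contain every key/value pair of the
// selector. An empty selector matches every node.
func matchesSelector(labels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectNodes keeps the nodes that match the selector and whose platform is
// in platforms. Assumption: an empty platforms list means "all platforms".
func selectNodes(nodes []Node, selector map[string]string, platforms []string) []Node {
	allowed := make(map[string]bool, len(platforms))
	for _, p := range platforms {
		allowed[p] = true
	}
	var out []Node
	for _, n := range nodes {
		if !matchesSelector(n.Labels, selector) {
			continue
		}
		if len(allowed) > 0 && !allowed[n.Platform] {
			continue
		}
		out = append(out, n)
	}
	return out
}

// exampleNodes returns a small mixed-architecture cluster.
func exampleNodes() []Node {
	return []Node{
		{Name: "control-1", Labels: map[string]string{"role": "control-plane"}, Platform: "linux/amd64"},
		{Name: "worker-1", Labels: map[string]string{"role": "worker"}, Platform: "linux/amd64"},
		{Name: "worker-2", Labels: map[string]string{"role": "worker"}, Platform: "linux/arm64"},
	}
}

func main() {
	selected := selectNodes(exampleNodes(), map[string]string{"role": "worker"}, []string{"linux/arm64"})
	for _, n := range selected {
		fmt.Println("scan:", n.Name, n.Platform)
	}
}
```

With an index on the platform field, the second check would become a list call with a field selector rather than an in-process comparison.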
## CRDs

For this feature we are going to add the following CRDs:
Please add example CRDs.
* `NodeScanConfiguration`: Defines the global scan settings.
  * `scanInterval`: Duration between automated scans.
    If not specified, the `NodeScanJob` doesn't start.
  * `nodeSelector`: Filter which nodes are scanned.
Add a platform(s) filter.
    If not specified, the `NodeScanJob` doesn't start.
  * `nodeSelector`: Filter which nodes are scanned.
    If not specified, all the nodes are scanned.
  * `skip`: A list of file/directory paths to be ignored.
We could use something like this for skip patterns:
# Gitignore-style patterns to exclude from filesystem scans.
# Trailing "/" = directory, otherwise = file.
skipPatterns:
  - "node_modules/"      # → --skip-dirs node_modules
  - "**/vendor/"         # → --skip-dirs (glob expanded at scan time)
  - ".git/"              # → --skip-dirs .git
  - "*.min.js"           # → --skip-files *.min.js
  - "package-lock.json"  # → --skip-files package-lock.json
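The mapping in the comments above could be implemented along these lines. This is a sketch under stated assumptions: `trivySkipArgs` is a hypothetical helper, and it passes glob patterns to trivy's `--skip-dirs` and `--skip-files` flags unchanged rather than expanding them itself.

```go
package main

import (
	"fmt"
	"strings"
)

// trivySkipArgs converts gitignore-style skip patterns into trivy CLI
// arguments: a trailing "/" marks a directory, anything else is a file.
// Assumption: glob patterns are forwarded as-is and resolved by trivy.
func trivySkipArgs(patterns []string) []string {
	var args []string
	for _, p := range patterns {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		if strings.HasSuffix(p, "/") {
			dir := strings.TrimSuffix(p, "/")
			if dir == "" {
				// A bare "/" would exclude the whole filesystem; ignore it.
				continue
			}
			args = append(args, "--skip-dirs", dir)
			continue
		}
		args = append(args, "--skip-files", p)
	}
	return args
}

func main() {
	patterns := []string{"node_modules/", "**/vendor/", ".git/", "*.min.js", "package-lock.json"}
	fmt.Println(strings.Join(trivySkipArgs(patterns), " "))
}
```

Keeping the translation in one function makes it easy to validate patterns when the `NodeScanConfiguration` is admitted, before any scan runs.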
## Scan Workflow

1. The user applies a `NodeScanConfiguration` with a defined `scanInterval` or applies a `NodeScanJob` manually.
Let's revisit this and add the runner. That way we have two branches in the flow: one triggered by the user, and one triggered by the runner.
Description
Node Scan feature RFC.
Fix #889