fix: dedupe Packages with same Application type but different filepath #10789
mvanhorn wants to merge 2 commits into
Hello @mvanhorn

Before going deeper into the current implementation, I'd like to suggest reconsidering the matching approach, because I think we already have a more reliable signal than filename heuristics.

Root cause recap (from #8993 / discussion #8863): the duplication happens because the embedded SBOM (e.g. Bitnami's SPDX) reports an application whose APPLICATION package has no We already solve the structurally identical problem for JARs in

```go
for _, app := range result.Applications {
	skippedFiles = append(skippedFiles, app.FilePath)
	for _, pkg := range app.Packages {
		// The files of those packages don't have to be analyzed.
		if pkg.FilePath != "" {
			skippedFiles = append(skippedFiles, pkg.FilePath)
		}
	}
}
```

This works for JARs because the JAR analyzer is a The key point: the SBOM package already carries the real binary path in Could we lean on that exact
Replace the filename-token heuristics with the exact pkg.FilePath link between SBOM-sourced and on-disk applications, indexed by file path so coalescing no longer walks the full application map. Same-type apps that merely share a dependency now stay separate.
Reworked as you suggested in cc338a7. The filename heuristics (pathToken, substring matching, and the no-path-evidence fallback) are gone entirely. Coalescing now keys solely on the exact Also added the negative case you raised: two same-type apps that share a common dependency (Go stdlib) but have no path link now stay separate, with a test asserting it. Applier tests pass locally.
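The path-link rule described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from cc338a7: the `App`, `Package`, and `pathKey` types, the `FromSBOM` flag, the `coalesce` and `hasPathLink` helpers, and the sample paths are all hypothetical, and the sketch simply drops the SBOM-sourced entry when a link exists, which may differ from what the real change does with its packages.

```go
package main

import "fmt"

// Package and App are hypothetical, simplified stand-ins, not the real fanal types.
type Package struct {
	Name     string
	FilePath string // file the package was found in; empty when unknown
}

type App struct {
	Type     string
	FilePath string
	FromSBOM bool // assumption: the source of each application is known
	Packages []Package
}

// pathKey indexes on-disk applications by type and exact file path.
type pathKey struct {
	Type string
	Path string
}

// coalesce drops an SBOM-sourced application when one of its packages points,
// by exact FilePath, at an on-disk application of the same type.
func coalesce(apps []App) []App {
	onDisk := make(map[pathKey]struct{})
	for _, a := range apps {
		if !a.FromSBOM {
			onDisk[pathKey{a.Type, a.FilePath}] = struct{}{}
		}
	}
	out := make([]App, 0, len(apps))
	for _, a := range apps {
		if a.FromSBOM && hasPathLink(a, onDisk) {
			continue // the scanned entry already covers this application
		}
		out = append(out, a)
	}
	return out
}

// hasPathLink reports whether any package path matches an indexed on-disk app.
// Packages without a FilePath never create a link.
func hasPathLink(a App, onDisk map[pathKey]struct{}) bool {
	for _, p := range a.Packages {
		if p.FilePath == "" {
			continue
		}
		if _, ok := onDisk[pathKey{a.Type, p.FilePath}]; ok {
			return true
		}
	}
	return false
}

func main() {
	stdlib := Package{Name: "stdlib"}
	linked := []App{
		{Type: "gobinary", FilePath: "sbom.spdx", FromSBOM: true,
			Packages: []Package{{Name: "tool", FilePath: "usr/bin/tool"}, stdlib}},
		{Type: "gobinary", FilePath: "usr/bin/tool", Packages: []Package{stdlib}},
	}
	unlinked := []App{
		{Type: "gobinary", FilePath: "sbom.spdx", FromSBOM: true, Packages: []Package{stdlib}},
		{Type: "gobinary", FilePath: "usr/bin/other", Packages: []Package{stdlib}},
	}
	fmt.Println(len(coalesce(linked)), len(coalesce(unlinked))) // prints "1 2"
}
```

In the second input the two applications share only the stdlib package and have no path link, so both survive, which corresponds to the negative case mentioned above.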
Summary
Coalesces the same logical application that arrives under two different file paths (SBOM vs on-disk scan) so its packages are no longer duplicated in the merged result.
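To illustrate why the duplication occurs, here is a small sketch of a map-based merge that uses the composite key quoted later in this description (file path, then "/type:", then type). The `App` type, the `mergeKey` and `mergeByKey` helpers, and the sample paths and package name are hypothetical; only the key format comes from this PR.

```go
package main

import "fmt"

// App is a hypothetical, trimmed-down stand-in for an application entry;
// it is not the real fanal type.
type App struct {
	Type     string
	FilePath string
	Packages []string
}

// mergeKey mirrors the composite key quoted in this PR's description.
func mergeKey(a App) string {
	return a.FilePath + "/type:" + a.Type
}

// mergeByKey keeps one entry per key, as a map-based merge would.
func mergeByKey(apps []App) map[string]App {
	merged := make(map[string]App)
	for _, a := range apps {
		merged[mergeKey(a)] = a
	}
	return merged
}

func main() {
	// The same logical application, seen once via an SBOM and once on disk.
	apps := []App{
		{Type: "gobinary", FilePath: "opt/app/sbom.spdx", Packages: []string{"example.com/lib@v1.0.0"}},
		{Type: "gobinary", FilePath: "opt/app/bin/server", Packages: []string{"example.com/lib@v1.0.0"}},
	}
	merged := mergeByKey(apps)
	fmt.Println(len(merged)) // prints 2: both entries survive, so the package is listed twice
}
```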
Why
The language-package merge in pkg/fanal/applier/docker.go keyed on app.FilePath + "/type:" + app.Type, so the same application arriving twice with two different file paths got two distinct keys and was retained twice. Issue #8993 reports that this duplicates the application's packages in the merged scan output (for example when the same app is seen once from an SBOM and once from an on-disk scan), inflating results.
Description
The language-package merge in pkg/fanal/applier/docker.go keyed on app.FilePath + "/type:" + app.Type, so the same logical application arriving twice with two different file paths (for example once from an SBOM and once from an on-disk scan) got two distinct keys and was retained twice, duplicating its packages in the merged result.
ApplyLayers now routes each application through setApplication, which detects the same-Type/different-FilePath SBOM-vs-scanned case (shouldMergeApplications) and coalesces the two entries (mergeApplications), preferring the scanned source and dropping byte-for-byte duplicate packages. Genuinely distinct applications keep their per-filepath behavior, so non-duplicate cases are unchanged, and OS-package and misconfiguration merging in the same loop are not touched.
Related issues
Packages received from SBOM + from the Analyzer interface #8993
Related PRs
Checklist