Description
Currently, scanning in ORT is package-based, and project packages are identified by "definition files" (like "pom.xml", "build.gradle", "package.json" etc.) in the directory tree. As a consequence, all files and directories below a definition file are regarded as belonging to the project that the definition file defines. Consider this example:
ROOTDIR
|
+-SUBPROJDIR_A
| |
| +-pom-a.xml
| |
| +-license-a.txt
|
+-SUBPROJDIR_B
| |
| +-pom-b.xml
| |
| +-license-b.txt
|
+-WEBDIR
| |
| +-package.json
| |
| +-license-w.txt
|
+-pom.xml
|
+-license.txt
So the project spanned by pom.xml in the root directory is considered to "own" all files below the root directory, including SUBPROJDIR_A/license-a.txt, SUBPROJDIR_B/license-b.txt and WEBDIR/license-w.txt. This means that scanner findings in those files get associated with the root project.
However, when the scanner's view shifts to the projects in the subdirectories, the project spanned by SUBPROJDIR_A/pom-a.xml is also assigned the scan result for SUBPROJDIR_A/license-a.txt (and similarly for the other subprojects).
This is historically so because ORT does not really understand the semantics of a project's directory tree. However, the result can be really confusing, as scan findings (and potential violations) might show up multiple times in the reports although they all stem from the same single file.
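To make the duplication concrete, here is a minimal sketch of the ownership rule described above, applied to the example tree. It is not ORT code: the function name owning_projects and the flat list of definition file paths are assumptions made purely for illustration.

```python
from pathlib import PurePosixPath

# Definition files from the example tree (paths relative to ROOTDIR).
DEFINITION_FILES = [
    "pom.xml",
    "SUBPROJDIR_A/pom-a.xml",
    "SUBPROJDIR_B/pom-b.xml",
    "WEBDIR/package.json",
]


def owning_projects(file_path, definition_files):
    """Return every definition file whose directory encloses file_path.

    Hypothetical helper that mimics the behavior described in this issue:
    a project "owns" all files in or below its definition file's directory.
    """
    file_dir = PurePosixPath(file_path).parent
    owners = []
    for definition_file in definition_files:
        project_dir = PurePosixPath(definition_file).parent
        # The project encloses the file if its directory is the file's
        # directory itself or one of that directory's ancestors.
        if project_dir == file_dir or project_dir in file_dir.parents:
            owners.append(definition_file)
    return owners


# The same file is reported for two projects:
print(owning_projects("SUBPROJDIR_A/license-a.txt", DEFINITION_FILES))
# ['pom.xml', 'SUBPROJDIR_A/pom-a.xml']
```

Every file in a subproject directory ends up with two owners here, which is why the same finding can appear more than once in a report.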
As a solution to this, one idea is to always associate files only with the nearest enclosing project found when walking up the directory tree towards the root. Maybe this logic should be limited to projects of the same type; however, in the example above this would result in the scanner findings from WEBDIR/license-w.txt still being associated with the root project spanned by pom.xml.
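The idea could be sketched roughly as follows. This is an illustration under assumptions, not a proposal for the actual implementation: the PROJECTS mapping, the project type labels, the function nearest_owners and its same_type_only flag are all hypothetical names. A project loses a file as soon as a more deeply nested project also encloses it; with same_type_only set, only nested projects of the same type take files away.

```python
from pathlib import PurePosixPath

# Hypothetical mapping of definition file -> project type for the example tree.
PROJECTS = {
    "pom.xml": "Maven",
    "SUBPROJDIR_A/pom-a.xml": "Maven",
    "SUBPROJDIR_B/pom-b.xml": "Maven",
    "WEBDIR/package.json": "NPM",
}


def encloses(project_dir, file_dir):
    """True if project_dir is file_dir itself or one of its ancestors."""
    return project_dir == file_dir or project_dir in file_dir.parents


def nearest_owners(file_path, projects, same_type_only=False):
    """Return the projects that own file_path under the proposed rule."""
    file_dir = PurePosixPath(file_path).parent
    enclosing = [
        definition_file
        for definition_file in projects
        if encloses(PurePosixPath(definition_file).parent, file_dir)
    ]

    owners = []
    for candidate in enclosing:
        candidate_dir = PurePosixPath(candidate).parent
        # The candidate is shadowed if another enclosing project lives
        # strictly below it (and, optionally, has the same project type).
        shadowed = any(
            candidate_dir in PurePosixPath(other).parent.parents
            and (not same_type_only or projects[other] == projects[candidate])
            for other in enclosing
        )
        if not shadowed:
            owners.append(candidate)
    return owners


print(nearest_owners("SUBPROJDIR_A/license-a.txt", PROJECTS))
# ['SUBPROJDIR_A/pom-a.xml']
print(nearest_owners("WEBDIR/license-w.txt", PROJECTS, same_type_only=True))
# ['pom.xml', 'WEBDIR/package.json']
```

The second call shows the caveat mentioned above: when the rule is limited to projects of the same type, the Maven root project keeps WEBDIR/license-w.txt because the nested project is of a different type.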
I have some hopes that the required filtering logic would be easier to implement once #2668 is merged, as it implements some similar filtering to associate provenance-based scan results to individual packages IIUC.