fix(spdx): Check for the presence of files#11910
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #11910 +/- ##
============================================
+ Coverage 58.28% 58.50% +0.22%
- Complexity 1759 1785 +26
============================================
Files 356 356
Lines 13271 13337 +66
Branches 1319 1335 +16
============================================
+ Hits 7735 7803 +68
+ Misses 5050 5049 -1
+ Partials 486 485 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| * Convert this ORT package to an SPDX package. As an ORT package can hold more metadata about its associated artifacts | ||
| * and origin than an SPDX package, the [type] is used to specify which kind of SPDX package should be created from the | ||
| * respective ORT package metadata. | ||
| * respective ORT package metadata. Also, the list of [files] contained in the package can be provided which has an |
There was a problem hiding this comment.
commit-msg: Where did you find this in the spec?
There was a problem hiding this comment.
I have added more details to the commit message.
There was a problem hiding this comment.
Thank you!
I've read the links to the spec twice, but I cannot find any evidence, that package verification code must not be set if files are empty.
Any chance this is a bug in the verification tool?
There was a problem hiding this comment.
I have to admit that from reading the spec, I would not necessarily come to the same results as the validation tool; however, the behavior of the tool also does not contradict what I read there. One could argue that the validation tool is the ultimate authority, since it implements the "intention" of the spec in an algorithm.
There was a problem hiding this comment.
@fviernau, I am trying to better understand the possible combinations of the affected properties, so that I maybe can file an issue against the validation tool. So, there are the properties hasFiles, filesAnalyzed, and packageVerificationCode. From my experiments, the validation tool seems to correlate all of them.
What is unclear to me is the exact meaning of the hasFiles property, which seems to be deprecated. The implementation in ORT does not list all files of the package here, but IIUC, only files with findings. So, packages without findings will have an empty file list. Why is this restriction made? According to Gemini, the description of the file in the SPDX 2.1 specification is just: "Contains a list of SPDXIDs of files that are contained in the package."
There was a problem hiding this comment.
I belief, if all files were listed instead of only those with findings, this would also fix the validation errors we see: then there would always be files if a package verification code was available, and the properties would be consistent (except in the corner case of an empty package).
There was a problem hiding this comment.
H, I have a hunch there could be a misunderstanding. I just found this, which has not been discussed:
- The package verification code is computed taking into account all files in the provenance, no matter of
exclusion state, see 1. - Not all files included in the verification code are actually included as SpdxFile.
For 1 I like that the code remains stable, independent of path excludes or license detections, and that it works regardless of how fileInformationEnabled is set, as mentioned by @tsteenbe .
For 2, I do not recall exactly why not all files are included. But I believe this would blow-up the size
of the report tremendously, and if implemented should have a feature toggle.
So, could the problem be that the verification tool assumes that the packageVerificationCode must match the included SpdxFiles in the report? (And assumption is wrong?)
Footnotes
There was a problem hiding this comment.
I have found the place in the source code of the SPDX library where the related validations are done. This is not sophisticated, it actually correlates the properties in a way that is enforced by this PR. Do you think it should be different? I would be willing to file an issue, but am not sure what the desired behavior should be.
| ): SpdxPackage { | ||
| val packageVerificationCode = ortResult.getPackageVerificationCode(id, type)?.let { | ||
| SpdxPackageVerificationCode(packageVerificationCodeValue = it) | ||
| val packageVerificationCode = if (files.isNotEmpty()) { |
There was a problem hiding this comment.
commit: I wonder what kind of packages these are, which do not have an spdx file, but provenance is sccanned. Can you give an example?
There was a problem hiding this comment.
This is from an internal project which uses the SPDXDocument package manager. I don't know their exact setup and what the single packages contain.
There was a problem hiding this comment.
hm, would it be possible to share a minimal reproducer?
I'm trying to understand the scenario.
There was a problem hiding this comment.
From my understanding, packages are affected for which the scanner does not return any findings. Therefore, the list of files is empty / not present, although the package was scanned.
The SPDX specification allows certain property values for packages only if actually files are present for this package, e.g. [1, 2]. The SPDX reporter used to derive such properties from the presence of a package verification code. However, there could be cases in which such a verification code exists, but there are no files with relevant findings for the package. In such cases, validation of the generated SPDX documents with the SPDX validator tool [3] failed with a message like "Relationship error: Missing required package files for XXX". To prevent this, pass the list of files to the `toSpdxPackage()` function and evaluate it when computing the value for related properties. The new parameter also slightly simplifies the code on caller side. [1]: https://spdx.github.io/spdx-spec/v2.3/package-information/#78-files-analyzed-field [2]: https://spdx.github.io/spdx-spec/v2.3/package-information/#79-package-verification-code-field [3] https://github.com/spdx/tools-java Signed-off-by: Oliver Heger <oliver.heger@bosch.com>
c1301c4 to
8577405
Compare
tsteenbe
left a comment
There was a problem hiding this comment.
@oheger-bosch If I understand this PR correctly, packageVerificationCode will not be set when no SPDX files have been generated. Since the value of filesAnalyzed is computed as packageVerificationCode != null, that will make filesAnalyzed = false. This can cause issues for ORT users who ran Scanner but intentionally omit SPDX file-level information (e.g. set SPDX reporter option fileInformationEnabled to false) yet still want to indicate that a scan was performed by keeping filesAnalyzed = true at package level.
Can you confirm that after this PR it still possible to have filesAnalyzed = true and filled packageVerificationCode butfileInformationEnabled set to false? Think this also is a code test case.
|
@tsteenbe, IIUC, you want to set |
@oheger-bosch I want to be able to set filesAnalyzed to true, provide a |
@tsteenbe, this seems to be indeed an issue with the validation tool. According to the specification, the cardinality of the package verification code is 0..1 if filesAnalyzed is true; so it is optional. However, the tool requires that it is present if filesAnalyzed is set. I tried to set it to "0", but then there is another validation error message claiming that the number of digits is incorrect. It works, however, to set it to "0000000000000000000000000000000000000000". |
The SPDX specification allows certain property values for packages only if actually files are present for this package. The SPDX reporter used to derive such properties from the presence of a package verification code. However, there could be cases in which such a verification code exists, but there are no files with relevant findings for the package. In such cases, the generated SPDX documents were invalid according to the specification.
To prevent this, pass the list of files to the
toSpdxPackage()function and evaluate it when computing the value for related properties. The new parameter also slightly simplifies the code on caller side.