Skip to content

fix(spdx): Check for the presence of files#11910

Open
oheger-bosch wants to merge 1 commit into
oss-review-toolkit:mainfrom
boschglobal:oheger-bosch/spdx_validation_problems
Open

fix(spdx): Check for the presence of files#11910
oheger-bosch wants to merge 1 commit into
oss-review-toolkit:mainfrom
boschglobal:oheger-bosch/spdx_validation_problems

Conversation

@oheger-bosch

Copy link
Copy Markdown
Member

The SPDX specification allows certain property values for packages only if actually files are present for this package. The SPDX reporter used to derive such properties from the presence of a package verification code. However, there could be cases in which such a verification code exists, but there are no files with relevant findings for the package. In such cases, the generated SPDX documents were invalid according to the specification.

To prevent this, pass the list of files to the toSpdxPackage() function and evaluate it when computing the value for related properties. The new parameter also slightly simplifies the code on caller side.

Comment thread plugins/reporters/spdx/src/test/kotlin/ExtensionsTest.kt Dismissed
@codecov

codecov Bot commented May 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.50%. Comparing base (f1a5a62) to head (8577405).
⚠️ Report is 31 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11910      +/-   ##
============================================
+ Coverage     58.28%   58.50%   +0.22%     
- Complexity     1759     1785      +26     
============================================
  Files           356      356              
  Lines         13271    13337      +66     
  Branches       1319     1335      +16     
============================================
+ Hits           7735     7803      +68     
+ Misses         5050     5049       -1     
+ Partials        486      485       -1     
Flag Coverage Δ
funTest-external-tools 14.67% <ø> (-0.05%) ⬇️
funTest-no-external-tools 30.61% <ø> (-0.02%) ⬇️
test-ubuntu-24.04 41.87% <ø> (+0.04%) ⬆️
test-windows-2025 41.85% <ø> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

* Convert this ORT package to an SPDX package. As an ORT package can hold more metadata about its associated artifacts
* and origin than an SPDX package, the [type] is used to specify which kind of SPDX package should be created from the
* respective ORT package metadata.
* respective ORT package metadata. Also, the list of [files] contained in the package can be provided which has an

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit-msg: Where did you find this in the spec?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added more details to the commit message.

@fviernau fviernau May 29, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I've read the links to the spec twice, but I cannot find any evidence, that package verification code must not be set if files are empty.

Any chance this is a bug in the verification tool?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that from reading the spec, I would not necessarily come to the same results as the validation tool; however, the behavior of the tool also does not contradict what I read there. One could argue that the validation tool is the ultimate authority, since it implements the "intention" of the spec in an algorithm.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fviernau, I am trying to better understand the possible combinations of the affected properties, so that I maybe can file an issue against the validation tool. So, there are the properties hasFiles, filesAnalyzed, and packageVerificationCode. From my experiments, the validation tool seems to correlate all of them.

What is unclear to me is the exact meaning of the hasFiles property, which seems to be deprecated. The implementation in ORT does not list all files of the package here, but IIUC, only files with findings. So, packages without findings will have an empty file list. Why is this restriction made? According to Gemini, the description of the file in the SPDX 2.1 specification is just: "Contains a list of SPDXIDs of files that are contained in the package."

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I belief, if all files were listed instead of only those with findings, this would also fix the validation errors we see: then there would always be files if a package verification code was available, and the properties would be consistent (except in the corner case of an empty package).

@fviernau fviernau Jun 9, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

H, I have a hunch there could be a misunderstanding. I just found this, which has not been discussed:

  1. The package verification code is computed taking into account all files in the provenance, no matter of
    exclusion state, see 1.
  2. Not all files included in the verification code are actually included as SpdxFile.

For 1 I like that the code remains stable, independent of path excludes or license detections, and that it works regardless of how fileInformationEnabled is set, as mentioned by @tsteenbe .

For 2, I do not recall exactly why not all files are included. But I believe this would blow-up the size
of the report tremendously, and if implemented should have a feature toggle.

So, could the problem be that the verification tool assumes that the packageVerificationCode must match the included SpdxFiles in the report? (And assumption is wrong?)

Footnotes

  1. https://github.com/oss-review-toolkit/ort/blob/9d51987d4c5ebd9d7d9eb03a679746403213f288/plugins/reporters/spdx/src/main/kotlin/Extensions.kt#L232-L240

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found the place in the source code of the SPDX library where the related validations are done. This is not sophisticated, it actually correlates the properties in a way that is enforced by this PR. Do you think it should be different? I would be willing to file an issue, but am not sure what the desired behavior should be.

@oheger-bosch oheger-bosch marked this pull request as ready for review May 29, 2026 12:03
@oheger-bosch oheger-bosch requested a review from a team as a code owner May 29, 2026 12:03
): SpdxPackage {
val packageVerificationCode = ortResult.getPackageVerificationCode(id, type)?.let {
SpdxPackageVerificationCode(packageVerificationCodeValue = it)
val packageVerificationCode = if (files.isNotEmpty()) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit: I wonder what kind of packages these are, which do not have an spdx file, but provenance is sccanned. Can you give an example?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from an internal project which uses the SPDXDocument package manager. I don't know their exact setup and what the single packages contain.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, would it be possible to share a minimal reproducer?
I'm trying to understand the scenario.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, packages are affected for which the scanner does not return any findings. Therefore, the list of files is empty / not present, although the package was scanned.

The SPDX specification allows certain property values for packages
only if actually files are present for this package, e.g. [1, 2]. The
SPDX reporter used to derive such properties from the presence of a
package verification code. However, there could be cases in which such
a verification code exists, but there are no files with relevant
findings for the package. In such cases, validation of the generated
SPDX documents with the SPDX validator tool [3] failed with a message
like "Relationship error: Missing required package files for XXX".

To prevent this, pass the list of files to the `toSpdxPackage()`
function and evaluate it when computing the value for related
properties. The new parameter also slightly simplifies the code on
caller side.

[1]: https://spdx.github.io/spdx-spec/v2.3/package-information/#78-files-analyzed-field
[2]: https://spdx.github.io/spdx-spec/v2.3/package-information/#79-package-verification-code-field
[3] https://github.com/spdx/tools-java

Signed-off-by: Oliver Heger <oliver.heger@bosch.com>
@oheger-bosch oheger-bosch force-pushed the oheger-bosch/spdx_validation_problems branch from c1301c4 to 8577405 Compare May 29, 2026 13:27

@tsteenbe tsteenbe left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oheger-bosch If I understand this PR correctly, packageVerificationCode will not be set when no SPDX files have been generated. Since the value of filesAnalyzed is computed as packageVerificationCode != null, that will make filesAnalyzed = false. This can cause issues for ORT users who ran Scanner but intentionally omit SPDX file-level information (e.g. set SPDX reporter option fileInformationEnabled to false) yet still want to indicate that a scan was performed by keeping filesAnalyzed = true at package level.

Can you confirm that after this PR it still possible to have filesAnalyzed = true and filled packageVerificationCode butfileInformationEnabled set to false? Think this also is a code test case.

@oheger-bosch

Copy link
Copy Markdown
Member Author

@tsteenbe, IIUC, you want to set filesAnalyzed to true, provide a package verification code, but no list of files, correct? Well, this combination is flagged as invalid by the validation tool; when setting filesAnalyzed to true, a file list is obviously mandatory.
As this PR has the goal to prevent such combinations that are rejected, it is not possible in the current state. Do you think there should be a mode which allows the generation of invalid SPDX files?

@tsteenbe

tsteenbe commented Jun 3, 2026

Copy link
Copy Markdown
Member

@tsteenbe, IIUC, you want to set filesAnalyzed to true, provide a package verification code, but no list of files, correct? Well, this combination is flagged as invalid by the validation tool; when setting filesAnalyzed to true, a file list is obviously mandatory. As this PR has the goal to prevent such combinations that are rejected, it is not possible in the current state. Do you think there should be a mode which allows the generation of invalid SPDX files?

@oheger-bosch I want to be able to set filesAnalyzed to true, provide a package verification code = 0 this should be valid per the package verification code algorithm in the spec.

@oheger-bosch

Copy link
Copy Markdown
Member Author

@oheger-bosch I want to be able to set filesAnalyzed to true, provide a package verification code = 0 this should be valid per the package verification code algorithm in the spec.

@tsteenbe, this seems to be indeed an issue with the validation tool. According to the specification, the cardinality of the package verification code is 0..1 if filesAnalyzed is true; so it is optional. However, the tool requires that it is present if filesAnalyzed is set. I tried to set it to "0", but then there is another validation error message claiming that the number of digits is incorrect. It works, however, to set it to "0000000000000000000000000000000000000000".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants