feat(ort-project-file): Disable scan when no sources are available#11958
feat(ort-project-file): Disable scan when no sources are available#11958MarcelBochtler wants to merge 1 commit into
Conversation
That sounds a bit as if the mechanism introduced in #11688 would be a better fit, or? |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #11958 +/- ##
=========================================
Coverage 58.44% 58.44%
Complexity 1811 1811
=========================================
Files 361 361
Lines 13504 13504
Branches 1385 1385
=========================================
Hits 7893 7893
Misses 5115 5115
Partials 496 496
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
8b022dd to
ee2a8f1
Compare
Yes, this is way simpler. Thanks :) |
| labels = labels, | ||
| isModified = isModified ?: false, | ||
| isMetadataOnly = isMetadataOnly ?: false | ||
| isMetadataOnly = isMetadataOnly ?: false, |
There was a problem hiding this comment.
I believe this can easily hide mistakes, e.g. user forgets to specify the provenance.
I would prefer instead to add an attribute sourceCodeOrigins which the user can specify. What do you think?
side note: if only one of sourceArtifact or vcs is specified, sourceCodeOrigins still contains both (which is fine imo), but now slightly inconsistent.
There was a problem hiding this comment.
I would prefer instead to add an attribute sourceCodeOrigins which the user can specify.
What should the default of this attribute be? If it's an emptyList() I'm fine with it, if it would be null this wouldn't solve the issue that I'm trying to solve with this PR.
Having dependencies without provenance is by design as described here in the minimal example. If the user is not specifying the provenance intentionally or accidentally can't be known by ORT.
Additionally, adding a sourceCodeOrigins attribute would also require us to handle something like:
- purl: "pkg:maven/org.apache.commons/commons-lang3@3.14.0"
sourceArtifact:
url: "https://repo1.maven.org/maven2/org/apache/commons/commons-lang3/3.14.0/commons-lang3-3.14.0-sources.jar"
sourceCodeOrigins: VCSWhich could only result in an error.
I'm not sure what benefit this attribute would provide.
If a user wants to prioritize VCS over sourceArtifact (or vice versa), they would only configure one of them.
There was a problem hiding this comment.
What should the default of this attribute be? If it's an
emptyList()I'm fine with it, if it would benullthis wouldn't solve the issue that I'm trying to solve with this PR.
The default value imho should be null, analog to the default value of Package for consistency.
The semantics of null means, that the value configured here should be used.
Having dependencies without provenance is by design as described #10182 (comment). If the user is not specifying the provenance intentionally or accidentally can't be known by ORT.
An attempt to run the license scanner against a package without provenance fails by design, if not
explicitly curated in a way that the scanner skips the scanning. Why should it be different for OrtProjectFile?
Additionally, adding a sourceCodeOrigins attribute would also require us to handle something like:
I believe no handling would need to be added, and ORT's behavior as-is fits: The scanner would report an issue that it could not scan any provenance, which makes sense, because it is configured to try VCS only, for which it does not know the provenance.
I'm not sure what benefit this attribute would provide.
In my view, this would fit consistently into the existing design.
If in the downloader configuration you specify [VCS, SOURCE_ARTIFACT], then it would work for you also without specifying any sourceCodeOrigins for the package, or?
There was a problem hiding this comment.
An attempt to run the license scanner against a package without provenance fails by design, if not
explicitly curated in a way that the scanner skips the scanning. Why should it be different forOrtProjectFile?
Because it was decided in #10182 and documented that this should be the minimal working example:
dependencies:
- purl: "pkg:maven/com.example/minimal@0.1.0"It was not my understanding that this minimal example will only work with the ORT analyzer.
When used with the ORT Scanner, this example cannot be used because it will create a scanner issue, if the user is not specifying any of the attributes below:
- VCS
- sourceArtifact
- isMetadataOnly
What would be your approach to make the minimum example work?
If in the downloader configuration you specify
[VCS, SOURCE_ARTIFACT], then it would work for you also without specifying anysourceCodeOriginsfor the package, or?
Not if I understand you correctly, because I don't want the scanner to try to download anything if the minimal example is being used.
There was a problem hiding this comment.
Because it was decided in #10182 and documented
Would have have a more fine grained pointer. I've skimmed through this rather large ticket and could
not find that decision.
What would be your approach to make the minimum example work?
My proposal is to enhance the minimal example for running (analyzer + scanner) as follows:
dependencies:
- purl: "pkg:maven/com.example/minimal@0.1.0"
source_code_origins: []
For just running analyzer, it could remain as-is.
I do not really understand why the scanner is run is no provenance is defined, and why the scanner should
not flag missing provenance, only for ORT project files but in all other cases.
There was a problem hiding this comment.
Just a remark: The whole purpose of this pseudo package manager is to make it really simple for users to describe their dependencies. Although it is called OrtProjectFile, it may be therefore useful to come up with a DSL that better supports this goal. From a user's point of view, the proposed source_code_origins property and its consequences may not be easy to understand. Could there be a simpler way to indicate whether the package should be handled by the scanner? Just throwing in ideas:
- There could be a
sourceproperty with possible values like a VCS URL, an artifact URL, and some reserved value likeIGNORE. - There could be a different indicator that the definition file uses a minimum syntax, maybe based on file naming conventions. In this case, different default values could be used for some properties.
There was a problem hiding this comment.
Would have have a more fine grained pointer. I've skimmed through this rather large ticket and could
not find that decision.
Yes, sorry.
The decision was made in a TSC meeting linking to a specific comment.
https://github.com/oss-review-toolkit/ort-governance/blob/main/minutes/2025-12-02.md#agenda
Overall format as outlined by #10182 (comment) is agreed (with above modifications). Only tiny stuff left for discussion in the PR.
My proposal is to enhance the minimal example for running (analyzer + scanner) as follows:
dependencies: - purl: "pkg:maven/com.example/minimal@0.1.0" source_code_origins: []
Requiring that a user sets an empty list to specify that something is absent makes this more complex than it has to be. I would like to avoid this just to guard against "users that might forget to set the provenance".
I do not really understand why the scanner is run is no provenance is defined
Because the file format is designed to support both, dependencies with provenances and dependencies without it. My PR changes exactly that. The scanner is (implicitly) skipped if no provenance is set.
and why the scanner should not flag missing provenance, only for ORT project files but in all other cases.
Because when designing this format, we also had the use-case in mind that projects like this have to be supported:
dependencies:
- purl: "pkg:maven/com.example/with-vcs@1.1.0"
description: "Package with VCS defined, we expect a detected license to be present."
vcs:
type: "Git"
url: "https ://git.example.com/with-vcs/"
revision: "main"
- purl: "pkg:maven/com.example/declared-only@1.1.0"
description: |
Package without any provenance, we only expect a declared license to be available.
declaredLicenses:
- "MIT"
- purl: "pkg:maven/com.example/minimal@1.1.0"
description: |
Package without any provenance and license information.
This might be used for self written packages that are scanned elsewhere,
or for proprietary packages where no source code is available.
declaredLicenses:
- "LicenseRef-proprietary"For other package managers a missing provenance should be flagged, because the metadata is not specified by ORT users but by maintainers of the (OSS) dependencies.
There was a problem hiding this comment.
The decision was made in a TSC meeting linking to a specific comment.
https://github.com/oss-review-toolkit/ort-governance/blob/main/minutes/2025-12-02.md#agenda
I still cannot find an indication that silencing failing provenance resoltion was considered in that TSC discussion.
Requiring that a user sets an empty list to specify that something is absent
Well, we have the DownloaderConfiguration, and allows to configure sourceCodeOrigins. And semantics is, that the configured value is used by the downloader / scanner, unless an override is specified. So, as you phrased it above is imho misleading, because it sounds as if the user has to specify something explicitly, which is emtpy implicitly anyway, but I believe this is not the case.
Because the file format is designed to support both, dependencies with provenances and dependencies without it.
Just to make clear, with my above proposal I did not mean to add any limitation at all.
To my understanding, and my intention was, to still support all kinds of use cases.
In particular, also the use case to have the scanner flag issues in case there is no provenance, and not
to make this go unnoticed by default.
I'm not 100% sure if there is a misunderstanding still, or if we simply disagree. However, I would be interrested also in input from others.
There was a problem hiding this comment.
In my understanding the "minimal working example" that was discussed in the design phase was referring to the package manager itself, not that any subsequent tools like the scanner could not create issues.
Silently ignoring the package in the scanner if no source code location was provided also has the risk that users get the false impression that the license detection did not find any issues, so I would also prefer that users have to make this decision consciously. Otherwise, it could also be that they have just forgotten to provide the source code location and will never notice because no warning of error is shown.
If source_code_origins: [] is too unintuitive, what about adding a more speaking property like skip_scanner: true or source_code_available: false which is then internally mapped to source_code_origins: []?
Just for reference, neither of the proposed solutions follows ORT's philosophy that users should provide a reason why it is ok to ignore this package in the scanner like they would have when creating an issue resolution or a curation.
A dependency declared in an OrtProject file without a sourceArtifact or VCS entry has no source code that can be retrieved. The scanner can only determine a provenance if at least one of these is provided. If neither is present, the scanner cannot determine a provenance and emits an issue for every such package. Set sourceCodeOrigins to an empty list if none of these is given to skip the scan of these packages. Signed-off-by: Marcel Bochtler <marcel.bochtler@bosch.com>
ee2a8f1 to
3483030
Compare
See commit for details