-
Notifications
You must be signed in to change notification settings - Fork 14
feat(scanner): Add submodule fetch strategy for nested repositories #2679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(scanner): Add submodule fetch strategy for nested repositories #2679
Conversation
cc97a2e
to
f068fce
Compare
@@ -85,7 +87,20 @@ class ScannerRunner( | |||
?: listOf(SourceCodeOrigin.ARTIFACT, SourceCodeOrigin.VCS) | |||
) | |||
|
|||
val workingTreeCache = DefaultWorkingTreeCache() | |||
// If the submodule fetch strategy is set to TOP_LEVEL_ONLY, for git use a plugin config that prevents that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some way to test this? Maybe with a constructor mock and a verification that the Git-specific plugin options have actually been set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oheger-bosch This additional functionality was implemented as part of DefaultWorkingTreeCache
, which is a class of ORT and not ORT Server. If it is to be tested, the test should be in ORT and not in ORT Server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about testing that the correct options are passed when doing the checkout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oheger-bosch I tried it out to add a test, but the functionality to add the right parameters to the git submodule update
command is buried deep into the VCS implementation (in this case: Git), and also just querying the plugin configuration of the Git (VCS) instance is not possible, as the configuration is a private member. And all of this is happing in ORT, and not ORT Server. Whatever, the functionality in the Git (VCS) is the same as it is already used by the Analyzer parameter submoduleFetchStrategy=TOP_LEVEL_ONLY
so it is quite sure that the appropriate git submodule update
command is used when the repository is checked out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not go so far to test the VCS logic in ORT. What I mean is, there is the line
val workingTreeCache = DefaultWorkingTreeCache().addVcsPluginConfigs(vcsPluginConfigs)
in ScannerRunner
. When using a constructor mock for DefaultWorkingTreeCache
, it should be possible to verify that addVcsPluginConfigs
was called on the object and to inspect the passed configuration.
f068fce
to
bacaadb
Compare
bacaadb
to
8023d6d
Compare
8023d6d
to
321ee16
Compare
@mnonnenmacher I extended the commit message including a statement about the remaining small risk of this solution. |
/** | ||
* Specifies how submodules are fetched when resolving provenances. Currently supported only for Git repositories. | ||
* If set to [SubmoduleFetchStrategy.FULLY_RECURSIVE] (default), all submodules are fetched recursively. If set | ||
* to [SubmoduleFetchStrategy.TOP_LEVEL_ONLY], only the top-level submodules are fetched. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: docs are missing SubmoduleFetchStrategy.DISABLED. But to me different options don't need to be explained here at all as they are already documented in the enum.
} else { | ||
emptyMap() | ||
} | ||
val canonicalVcsPluginConfigs = createCanonicalVcsPluginConfigs(vcsPluginConfigs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line belongs to the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. But IntelliJ only allowed me to check the code block as a whole.
321ee16
to
2a74519
Compare
Introduce `submoduleFetchStrategy` config to control how the Scanner fetches Git submodules. When set to `TOP_LEVEL_ONLY`, only top-level submodules are fetched, avoiding timeouts on deeply nested repos. This mirrors the behavior already available in the Analyzer and allows to resolve nested provenances even in this kind of repositories with a vast amount of nested submodules. Signed-off-by: Wolfgang Klenk <[email protected]>
If particular VCS plugin configurations are active during a scan (like `submoduleFetchStrategy=TOP_LEVEL_ONLY`), ensure that VCS plugin configurations are stored alongside nested provenance data. This prevents reuse of cache entries across scans with differing VCS plugin settings, ensuring correctness and reliability of scan results. However, there remains a small risk that other dependencies are not fully resolved in such a scenario where the WorkingTreeCache is limited to the first level of submodules. But this is rather small, because other open source dependencies typically don't use nested submodules at all. Signed-off-by: Wolfgang Klenk <[email protected]>
2a74519
to
33f456a
Compare
Introduce
submoduleFetchStrategy
config to control how the Scanner fetches Git submodules. When set toTOP_LEVEL_ONLY
, only top-level submodules are fetched, avoiding timeouts on deeply nested repos.This mirrors the behavior already available in the Analyzer and allows to resolve nested provenances even in this kind of repositories with a vast amount of nested submodules.
If activated, in the logs you will no longer see the
--recursive
flag in thegib submodule update
command then:If particular VCS plugin configurations are active during a scan (like
submoduleFetchStrategy=TOP_LEVEL_ONLY
),ensure that VCS plugin configurations are stored alongside nested provenance data. This prevents reuse of cache entries across scans with differing VCS plugin settings, ensuring correctness and reliability of scan results.
For storing the VCS plugin configuration, a canonical string like
Git/updateNestedSubmodules/false
is created and stored alogside with the nested provenance data.