Description
I have been experiencing issues with capycli's GitHub interaction in `bom findsources`. Jobs take unexpectedly long, and memory consumption is correspondingly high (though that isn't an issue in itself). I use capycli to process relatively large BOMs and, according to capycli's findings, I frequently have 400-500 third-party components from GitHub.
I tracked these issues to how findsources maps component versions to tags on GitHub. Currently, capycli first retrieves the full list of a project's tags (`get_github_info()` in `capycli.bom.findsources`) and then iterates over this list, hoping to find a match for the version provided as a parameter to `get_matching_tag()`.
There are projects like the tencentcloud sdk with tens of thousands of tags. Using the GitHub API, capycli has to retrieve these in chunks of 100 tags per call using Python's synchronous IO.
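To make the cost concrete, here is a minimal sketch of the fetch-everything-then-scan pattern described above. It is not capycli's actual code: `fetch_all_tags`, `find_matching_tag`, and `make_fake_fetcher` are hypothetical names, the comparison is a simplified stand-in for whatever `get_matching_tag()` really does, and the page fetcher is injected so the sketch runs without network access.

```python
from typing import Callable, Dict, List, Optional, Tuple

PER_PAGE = 100  # chunk size per API call, as described above

Tag = Dict[str, str]


def fetch_all_tags(fetch_page: Callable[[int], List[Tag]]) -> List[Tag]:
    """Retrieve every page of tags, one synchronous call at a time."""
    tags: List[Tag] = []
    page = 1
    while True:
        chunk = fetch_page(page)
        tags.extend(chunk)
        if len(chunk) < PER_PAGE:  # a short page means there are no more pages
            return tags
        page += 1


def find_matching_tag(tags: List[Tag], version: str) -> Optional[str]:
    """Linear scan over the full list (simplified, assumed comparison)."""
    for tag in tags:
        name = tag["name"]
        if name == version or name == "v" + version:
            return name
    return None


def make_fake_fetcher(total: int) -> Tuple[Callable[[int], List[Tag]], List[int]]:
    """Hypothetical stand-in for the GitHub tags endpoint; records each call."""
    calls: List[int] = []

    def fetch_page(page: int) -> List[Tag]:
        calls.append(page)
        start = (page - 1) * PER_PAGE
        end = min(start + PER_PAGE, total)
        return [{"name": f"v1.0.{i}"} for i in range(start, end)]

    return fetch_page, calls
```

With a fake project of 250 tags, `fetch_all_tags` issues three calls before the scan even starts, although a tag such as `v1.0.109` already appears on the second page.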
On average, `get_matching_tag()` does 109 negative comparisons for each tag it matches. This means on average in my use cases capycli has to fetch two pages worth of tags to match a component. This is amounts to retrieving tencentcloud sdk alone.
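The "two pages" figure follows from the page size: a match after 109 negative comparisons sits at position 110 in the tag list, which falls on the second page of 100. A small sketch of that arithmetic, where `pages_needed` is a hypothetical helper and 20,000 is only an illustrative stand-in for "tens of thousands" of tags:

```python
import math

PER_PAGE = 100  # tags returned per API call, as stated above


def pages_needed(tag_count: int, per_page: int = PER_PAGE) -> int:
    """Number of API calls required to retrieve tag_count tags."""
    return math.ceil(tag_count / per_page)


# 109 misses plus the one hit: the tags that actually need to be seen.
pages_to_match = pages_needed(109 + 1)

# Assumed size for a very large project; not a measured value.
pages_for_full_list = pages_needed(20_000)
```

Under these assumptions, matching needs 2 calls on average, while retrieving the full list of a 20,000-tag project needs 200.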
As far as I can tell, ...
- `get_github_info()` is only ever used twice, with both occurrences in `capycli.bom.findsources`. Both uses feed virtually directly into `get_matching_tag()`.
- `get_matching_tag()` is only ever used three times, with all occurrences in `capycli.bom.findsources`. All uses are essentially immediately `return`-ed.
Are there any uses of these methods I missed?