-
Notifications
You must be signed in to change notification settings - Fork 14
Extend packages with curations info #2483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend packages with curations info #2483
Conversation
54cfb88
to
434157d
Compare
val shortestDependencyPaths: List<ShortestDependencyPath>, | ||
|
||
/** The curations for the package. */ | ||
var curations: Set<PackageCurationData> = mutableSetOf() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data classes should be immutable. Use a val
here. Replace mutableSetOf
by emptySet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - changed.
.distinct() | ||
.map { curation -> | ||
packages.find { it.pkgId == curation[PackagesTable.id].value }?.let { pkg -> | ||
val packageCurations = PackageCurationDataTable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, for each element in the result set, you do another SELECT? So, this is the famous n + 1 SELECT problem?
I am not that familiar with the data structures in this area, but isn't it possible to add the curation data table to the join and do the required processing when iterating over the result set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. It is iterating only over curations found in previous step. Not over all of packages.
All that is a matter of getting more-or-less full objects of curation data. Including stuff like authors (external table), artifacts etc. So, if there is no need to retrieve it - I can down it to single query. All that depends on what data scope UI really needs to display with package.
So it's not n+1 issue to be precise.
As a general remark from my side (without having looked at your code): What I would expect happening with this PR merged is that the UI in its current state would "automatically" display the curated data. That's because IMO as a user you're rarely interested in uncurated data, so showing curated data should be the default. If at some point we decide to also show uncurated data, or the chain of curations that lead to the curated data similar to like the ORT Workbench does, then we need to specify the UI part for showing that additional data. |
@sschuberth You're right. That's how this endpoint works. It's getting all the packages found during ORT run, and all curations applied to packages found. So UI can display them automatically. Question is: what data UI want to display? Because curation data is quite wide and I'm not sure if every user needs that wide scope |
Your question seems to relate a bit to #2462. In general, I believe we should display all properties of a package / project (no matter whether such properties have curations or not). |
a890ee1
to
f1854ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is still missing is the part that applies the curations to the packages.
/** | ||
* A data class representing a package, and the shortest dependency path that the package is found in (relative to a | ||
* project found in a run). | ||
*/ | ||
data class PackageWithShortestDependencyPaths( | ||
data class PackageWithShortestDependencyPathsAndCurations( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid that this naming convention does not scale, especially if more attributes will be added. Semantically, this class adds some data to a package with is relevant in the context of an ORT run. So, should it then be named something like PackageRunData
or PackageInRun
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class name changed to PackageRunData.
) | ||
} | ||
|
||
ListQueryResult(data, parameters, listQueryResult.totalCount) | ||
} | ||
|
||
private fun findCurationsForPackage(packageId: Long, curations: List<ResultRow>): List<PackageCurationData> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still concerned about the additional selects produced by this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two birds with one stone. Got rid of two n+1 problems there (for shortest dependency path and curations).
Now all of data that extends packages are retrieved in two queries (one for each curations and dependency paths), then transformed to maps and then feed into packages list.
ba39418
to
5e5ef52
Compare
In general, I have problems to understand how the changes are distributed between the commits, especially the changes on the model. So, in the first commit, the What is still missing - and this is what @sschuberth is referring to in his comment -, is to modify the package data according to the curations. A curation basically defines some modifications on the attributes of a package, e.g. to change the home page URL or override the license. The packages service is now supposed to perform these modifications, so that the UI can directly display the package properties and be sure that this is curated data. |
.select(ShortestDependencyPathsTable.columns) | ||
.where { (AnalyzerJobsTable.ortRunId eq ortRunId) and (PackagesTable.id eq pkg.pkgId) } | ||
.map { ShortestDependencyPathDao.wrapRow(it).mapToModel() } | ||
val curations = mutableMapOf<Long, List<PackageCurationData>>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to use Kotlin's [groupBy](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.collections/group-by.html)
function to construct the maps for curations and shortest paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed, although not sure if it's easier to read.
5e5ef52
to
5b39794
Compare
First commit: Change |
5b39794
to
a706ec7
Compare
Something I'd like to point out here as well is that for instance the purl can be curated, and the endpoint supports filtering and sorting by purl, so if the purl in the API response will be the curated one in case there is one, this technically would need some modifications in the sorting and filtering too. |
Good point, somewhat relates to #2340, which probably should be done first. |
Do I remember correctly from one of the alignment meetings, that we agreed to merge this first, even if curated purls would not work yet? It's quite crucial for us to finally see the real data in the UI, including curations. |
Extend packages list for ORT Run by information with curations applied. Signed-off-by: Kamil Bielecki <[email protected]>
This commit extends packages list for ORT Run API endpoint with information on curations applied. resolves eclipse-apoapsis#2324 Signed-off-by: Kamil Bielecki <[email protected]>
a706ec7
to
6bd8172
Compare
Issues referenced in commit messages and issues linked to this PR are not in sync. |
We found a problem with the previous implementation, because sorting, filtering, and pagination was still applied to the uncurated data which was then inconsistent with the shown data. Therefore this had to be re-implemented to do sorting, filtering, and pagination in memory after applying the curations. This could lead to bad performance for large projects and we probably also need to adapt the database schema because of that, but we decided to first do it in memory to not delay this change even more. |
This has now been implemented in #2711. |
As UI part is not specified yet, please let me know if any additional fields are necessary to add into curation object.