
Extend packages with curations info #2483

@kamil-bielecki-bosch kamil-bielecki-bosch commented Apr 10, 2025

As the UI part is not specified yet, please let me know if any additional fields need to be added to the curation object.

@kamil-bielecki-bosch kamil-bielecki-bosch force-pushed the extend-packages-with-curations-info branch from 54cfb88 to 434157d Compare April 10, 2025 12:29
val shortestDependencyPaths: List<ShortestDependencyPath>,

/** The curations for the package. */
var curations: Set<PackageCurationData> = mutableSetOf()
Contributor

Data classes should be immutable. Use a val here. Replace mutableSetOf by emptySet.

Good point - changed.
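
A minimal sketch of the change requested in this thread, assuming a simplified class: `PackageWithCurations` and the fields of `PackageCurationData` shown here are hypothetical stand-ins, not the actual model from the PR. It only illustrates the `val` plus `emptySet()` pattern.

```kotlin
// Hypothetical stand-in for the real PackageCurationData model, which has more fields.
data class PackageCurationData(val comment: String? = null, val homepageUrl: String? = null)

// Hypothetical, simplified version of the class under review; other properties are omitted.
data class PackageWithCurations(
    val purl: String,

    /** The curations for the package. */
    val curations: Set<PackageCurationData> = emptySet()
)

fun main() {
    val plain = PackageWithCurations(purl = "pkg:maven/org.example/lib@1.0")

    // copy() creates a new instance instead of mutating the existing one.
    val curated = plain.copy(curations = setOf(PackageCurationData(comment = "Fix homepage")))

    println(plain.curations.isEmpty()) // true
    println(curated.curations.size)    // 1
}
```

With a read-only `Set` behind a `val`, the only way to change the curations is to create a new instance via `copy()`, which keeps `equals()` and `hashCode()` of the data class stable.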

.distinct()
.map { curation ->
packages.find { it.pkgId == curation[PackagesTable.id].value }?.let { pkg ->
val packageCurations = PackageCurationDataTable
Contributor

IIUC, for each element in the result set, you do another SELECT? So, this is the famous n + 1 SELECT problem?

I am not that familiar with the data structures in this area, but isn't it possible to add the curation data table to the join and do the required processing when iterating over the result set?

Author

@kamil-bielecki-bosch kamil-bielecki-bosch Apr 11, 2025

Yes and no. It iterates only over the curations found in the previous step, not over all packages.
The extra queries are there to fetch more or less complete curation data objects, including things like authors (external table), artifacts, etc. So, if there is no need to retrieve all of that, I can reduce it to a single query. It all depends on which data the UI really needs to display with a package.
So, to be precise, it's not an n+1 issue.
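
For readers unfamiliar with the term, here is a small sketch of the difference between one query per element and one batched query. It does not use the PR's Exposed tables; `FakeCurationTable` and both loader functions are hypothetical and only count how often the "database" is hit.

```kotlin
// Hypothetical in-memory stand-in for a database table; it only counts how often it is queried.
class FakeCurationTable(private val rows: List<Pair<Long, String>>) {
    var queryCount = 0
        private set

    /** Simulates one "SELECT ... WHERE package_id = ?" per call. */
    fun selectByPackageId(packageId: Long): List<String> {
        queryCount++
        return rows.filter { it.first == packageId }.map { it.second }
    }

    /** Simulates a single "SELECT ... WHERE package_id IN (...)" for all ids at once. */
    fun selectByPackageIds(packageIds: Collection<Long>): Map<Long, List<String>> {
        queryCount++
        return rows.filter { it.first in packageIds }.groupBy({ it.first }, { it.second })
    }
}

/** One query per package id: the query count grows with the size of the list. */
fun loadPerPackage(table: FakeCurationTable, packageIds: List<Long>): Map<Long, List<String>> =
    packageIds.associateWith { table.selectByPackageId(it) }

/** One query for all package ids; the grouping happens in memory. */
fun loadBatched(table: FakeCurationTable, packageIds: List<Long>): Map<Long, List<String>> {
    val byId = table.selectByPackageIds(packageIds)
    return packageIds.associateWith { byId[it].orEmpty() }
}

fun main() {
    val rows = listOf(1L to "a", 1L to "b", 3L to "c")
    val ids = listOf(1L, 2L, 3L)

    val perPackageTable = FakeCurationTable(rows)
    val batchedTable = FakeCurationTable(rows)

    val perPackage = loadPerPackage(perPackageTable, ids)
    val batched = loadBatched(batchedTable, ids)

    println(perPackage == batched)       // true
    println(perPackageTable.queryCount)  // 3
    println(batchedTable.queryCount)     // 1
}
```

Both loaders return the same map, but the per-package variant issues one query for every id in the list, whichever list that is.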

@sschuberth
Contributor

As the UI part is not specified yet, please let me know if any additional fields need to be added to the curation object.

As a general remark from my side (without having looked at your code): What I would expect to happen once this PR is merged is that the UI in its current state would "automatically" display the curated data. That's because IMO as a user you're rarely interested in uncurated data, so showing curated data should be the default.

If at some point we decide to also show uncurated data, or the chain of curations that led to the curated data, similar to what the ORT Workbench does, then we need to specify the UI part for showing that additional data.

@kamil-bielecki-bosch
Author

@sschuberth You're right. That's how this endpoint works. It gets all packages found during the ORT run and all curations applied to them, so the UI can display them automatically. The question is: what data does the UI want to display? Curation data is quite broad, and I'm not sure every user needs that wide a scope.

@sschuberth
Contributor

The question is: what data does the UI want to display? Curation data is quite broad, and I'm not sure every user needs that wide a scope.

Your question seems to relate a bit to #2462. In general, I believe we should display all properties of a package / project (no matter whether such properties have curations or not).

@kamil-bielecki-bosch kamil-bielecki-bosch force-pushed the extend-packages-with-curations-info branch 7 times, most recently from a890ee1 to f1854ac Compare April 11, 2025 21:02
Contributor

@oheger-bosch oheger-bosch left a comment

What is still missing is the part that applies the curations to the packages.

/**
* A data class representing a package, and the shortest dependency path that the package is found in (relative to a
* project found in a run).
*/
data class PackageWithShortestDependencyPaths(
data class PackageWithShortestDependencyPathsAndCurations(
Contributor

I am afraid that this naming convention does not scale, especially if more attributes are added. Semantically, this class adds some data to a package which is relevant in the context of an ORT run. So, should it then be named something like PackageRunData or PackageInRun?

Class name changed to PackageRunData.

)
}

ListQueryResult(data, parameters, listQueryResult.totalCount)
}

private fun findCurationsForPackage(packageId: Long, curations: List<ResultRow>): List<PackageCurationData> =
Contributor

I am still concerned about the additional selects produced by this function.

Two birds with one stone. Got rid of two n+1 problems there (for the shortest dependency paths and the curations).
Now all data that extends the packages is retrieved in two queries (one for curations and one for dependency paths), then transformed into maps and fed into the package list.

@kamil-bielecki-bosch kamil-bielecki-bosch force-pushed the extend-packages-with-curations-info branch 3 times, most recently from ba39418 to 5e5ef52 Compare April 14, 2025 14:34
@kamil-bielecki-bosch kamil-bielecki-bosch marked this pull request as ready for review April 15, 2025 06:39
@oheger-bosch
Contributor

In general, I have trouble understanding how the changes are distributed between the commits, especially the changes to the model. So, in the first commit, the PackageRunData is renamed and extended, but the mapping of the new attribute and the corresponding API class are introduced in the second commit.
Maybe it is more logical to do the model changes in a separate first commit.

What is still missing, and this is what @sschuberth is referring to in his comment, is modifying the package data according to the curations. A curation basically defines some modifications to the attributes of a package, e.g. to change the home page URL or override the license. The packages service is now supposed to perform these modifications, so that the UI can directly display the package properties and be sure that this is curated data.
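
A rough sketch of what "applying curations" could look like, under stated assumptions: `SimplePackage`, `SimpleCuration`, and `applyCurations` are hypothetical names, the models are reduced to two attributes, and the rule that later curations override earlier ones is an assumption made for this example, not something specified in this PR.

```kotlin
// Hypothetical, heavily simplified models; the real classes carry many more attributes.
data class SimplePackage(val purl: String, val homepageUrl: String, val declaredLicense: String)

// A null field means "this curation does not touch that attribute".
data class SimpleCuration(val homepageUrl: String? = null, val declaredLicense: String? = null)

/**
 * Apply [curations] in list order and return a new package.
 * Assumption for this sketch: later curations override earlier ones.
 */
fun SimplePackage.applyCurations(curations: List<SimpleCuration>): SimplePackage =
    curations.fold(this) { pkg, curation ->
        pkg.copy(
            homepageUrl = curation.homepageUrl ?: pkg.homepageUrl,
            declaredLicense = curation.declaredLicense ?: pkg.declaredLicense
        )
    }

fun main() {
    val pkg = SimplePackage("pkg:npm/example@1.0.0", "http://old.example.org", "NOASSERTION")
    val curations = listOf(
        SimpleCuration(homepageUrl = "https://example.org"),
        SimpleCuration(declaredLicense = "MIT")
    )

    val curated = pkg.applyCurations(curations)

    println(curated.homepageUrl)     // https://example.org
    println(curated.declaredLicense) // MIT
    println(curated.purl)            // pkg:npm/example@1.0.0
}
```

Because the result is a new `SimplePackage`, the uncurated original stays available if both versions ever need to be shown.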

.select(ShortestDependencyPathsTable.columns)
.where { (AnalyzerJobsTable.ortRunId eq ortRunId) and (PackagesTable.id eq pkg.pkgId) }
.map { ShortestDependencyPathDao.wrapRow(it).mapToModel() }
val curations = mutableMapOf<Long, List<PackageCurationData>>()
Contributor

It should be possible to use Kotlin's [groupBy](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.collections/group-by.html) function to construct the maps for curations and shortest paths.

Changed, although not sure if it's easier to read.
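
As an illustration of the `groupBy` suggestion, here is a sketch that builds both lookup maps from flat rows and attaches them to a package list. The row and view classes (`CurationRow`, `PathRow`, `PackageView`) and `attachRunData` are hypothetical and much simpler than the PR's actual result rows.

```kotlin
// Hypothetical flattened query rows: each row pairs a package id with one related value.
data class CurationRow(val packageId: Long, val comment: String)
data class PathRow(val packageId: Long, val path: String)

// Hypothetical result type combining a package id with its grouped data.
data class PackageView(val id: Long, val curations: List<String>, val paths: List<String>)

fun attachRunData(
    packageIds: List<Long>,
    curationRows: List<CurationRow>,
    pathRows: List<PathRow>
): List<PackageView> {
    // groupBy with a key selector and a value transform yields Map<Long, List<String>>.
    val curationsById = curationRows.groupBy({ it.packageId }, { it.comment })
    val pathsById = pathRows.groupBy({ it.packageId }, { it.path })

    // Packages without matching rows get empty lists instead of null.
    return packageIds.map { id ->
        PackageView(id, curationsById[id].orEmpty(), pathsById[id].orEmpty())
    }
}

fun main() {
    val views = attachRunData(
        packageIds = listOf(1L, 2L),
        curationRows = listOf(CurationRow(1L, "a"), CurationRow(1L, "b")),
        pathRows = listOf(PathRow(2L, "root -> lib"))
    )

    views.forEach(::println)
}
```

Compared with filling a `mutableMapOf` by hand, `groupBy` keeps the maps read-only and preserves the row order within each group.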

@kamil-bielecki-bosch kamil-bielecki-bosch force-pushed the extend-packages-with-curations-info branch from 5e5ef52 to 5b39794 Compare April 15, 2025 11:53
@kamil-bielecki-bosch
Author

In general, I have trouble understanding how the changes are distributed between the commits, especially the changes to the model. So, in the first commit, the PackageRunData is renamed and extended, but the mapping of the new attribute and the corresponding API class are introduced in the second commit.
Maybe it is more logical to do the model changes in a separate first commit.

First commit: Rename PackageWithShortestDependencyPaths to PackageRunData because of the new field used to return curations with the package. As the name changed, the Mappings class was adjusted accordingly, but the output API model Package was left unchanged to keep the API endpoint contract.
Second commit: Add the new, already available field to the API Package model, which extends the contract.
IMHO, moving just the name change to a separate commit is a bit of overkill.

@kamil-bielecki-bosch kamil-bielecki-bosch force-pushed the extend-packages-with-curations-info branch from 5b39794 to a706ec7 Compare April 15, 2025 12:20
@lamppu
Contributor

lamppu commented Apr 15, 2025

What is still missing, and this is what @sschuberth is referring to in his comment, is modifying the package data according to the curations. A curation basically defines some modifications to the attributes of a package, e.g. to change the home page URL or override the license. The packages service is now supposed to perform these modifications, so that the UI can directly display the package properties and be sure that this is curated data.

Something I'd like to point out here as well is that for instance the purl can be curated, and the endpoint supports filtering and sorting by purl. So if the API response contains the curated purl whenever there is one, the sorting and filtering would technically need to be modified too.

@sschuberth
Contributor

Something I'd like to point out here as well is that for instance the purl can be curated

Good point, somewhat relates to #2340, which probably should be done first.

@sschuberth
Contributor

Something I'd like to point out here as well is that for instance the purl can be curated

Good point, somewhat relates to #2340, which probably should be done first.

Do I remember correctly from one of the alignment meetings, that we agreed to merge this first, even if curated purls would not work yet?

It's quite crucial for us to finally see the real data in the UI, including curations.

Kamil Bielecki added 2 commits April 30, 2025 13:24
Extend packages list for ORT Run by information with
curations applied.

Signed-off-by: Kamil Bielecki <[email protected]>
This commit extends packages list for ORT Run API endpoint
with information on curations applied.

resolves eclipse-apoapsis#2324

Signed-off-by: Kamil Bielecki <[email protected]>
@kamil-bielecki-bosch kamil-bielecki-bosch force-pushed the extend-packages-with-curations-info branch from a706ec7 to 6bd8172 Compare April 30, 2025 11:24

Issues referenced in commit messages and issues linked to this PR are not in sync.
Please manually link this PR to the following issues: 2324

@kamil-bielecki-bosch kamil-bielecki-bosch marked this pull request as draft April 30, 2025 11:28
@mnonnenmacher
Contributor

Something I'd like to point out here as well is that for instance the purl can be curated

Good point, somewhat relates to #2340, which probably should be done first.

Do I remember correctly from one of the alignment meetings, that we agreed to merge this first, even if curated purls would not work yet?

It's quite crucial for us to finally see the real data in the UI, including curations.

We found a problem with the previous implementation: sorting, filtering, and pagination were still applied to the uncurated data, which was then inconsistent with the data shown. Therefore, this had to be re-implemented so that sorting, filtering, and pagination are done in memory after applying the curations. This could lead to bad performance for large projects, and we probably also need to adapt the database schema because of that, but we decided to do it in memory first so as not to delay this change even more.
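
A minimal sketch of the in-memory approach described above, assuming the curations have already been applied to the list. `CuratedPackage`, `Page`, `listPackages`, and its parameters are hypothetical names for this example and do not reflect the actual implementation in #2711.

```kotlin
// Hypothetical simplified model; the purl is assumed to already be the curated value.
data class CuratedPackage(val purl: String)

// Hypothetical page type: the items of one page plus the size of the filtered result.
data class Page<T>(val items: List<T>, val totalCount: Int)

fun listPackages(
    curated: List<CuratedPackage>,
    purlFilter: String? = null,
    descending: Boolean = false,
    offset: Int = 0,
    limit: Int = 20
): Page<CuratedPackage> {
    // 1. Filter on the curated value, so the result matches what is displayed.
    val filtered = curated.filter { purlFilter == null || it.purl.contains(purlFilter, ignoreCase = true) }

    // 2. Sort on the curated value.
    val sorted = if (descending) filtered.sortedByDescending { it.purl } else filtered.sortedBy { it.purl }

    // 3. Paginate last; totalCount refers to the filtered set before the page is cut out.
    return Page(sorted.drop(offset).take(limit), filtered.size)
}

fun main() {
    val packages = listOf(
        CuratedPackage("pkg:npm/c"),
        CuratedPackage("pkg:maven/b"),
        CuratedPackage("pkg:npm/a")
    )

    val page = listPackages(packages, purlFilter = "npm", offset = 1, limit = 1)

    println(page.items)      // [CuratedPackage(purl=pkg:npm/c)]
    println(page.totalCount) // 2
}
```

The whole list has to be loaded and curated before the first page can be returned, which is the performance concern for large projects mentioned in the comment.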

@mnonnenmacher mnonnenmacher self-assigned this May 12, 2025
@mnonnenmacher
Contributor

This has now been implemented in #2711.
