Version summaries enhancements and pagination #11859

GPortas · 2025-10-01T16:15:33Z

What this PR does / why we need it:

This pull request introduces pagination mechanisms for the dataset and datafile version summary/difference endpoints.

While the initial scope was limited to adding pagination, a preliminary investigation revealed significant underlying issues that would have made a direct implementation inefficient and unsustainable. The core problems identified were:

High Coupling and Poor Separation of Concerns: The existing code lacked clear architectural layering and encapsulation for these use cases, leading to tightly coupled components that are difficult to maintain and extend.
Severe Performance Bottlenecks: The implementation relied on fetching bulk data from the database and then post-processing it in Java using multiple nested loops. This approach caused significant performance degradation, especially for datasets or files with a large number of versions.
Low Test Coverage: The lack of a comprehensive test suite made it risky to extend/alter the existing functionality without introducing regressions.

Changes Made
Given the issues discovered, the decision was made to perform a comprehensive, end-to-end refactoring of these features. This ensures that the new pagination functionality is built upon a robust, performant, and maintainable foundation.

The key changes include:

Architectural Realignment: The entire workflow, from the API endpoint to the data access layer, has been refactored to align with the established Dataverse architecture using Commands and Services. This improves modularity and clarifies responsibilities within the code.
Performance Optimization with JPA Criteria: All data processing has been pushed down to the database layer. In-memory processing with loops has been replaced with specific, performant JPA Criteria-based queries. This dramatically improves response times for entities with extensive version histories.
Improved Test Coverage: New unit tests have been introduced to cover the refactored code, addressing critical logic that was previously untested. This ensures the stability of the new implementation and simplifies future development.

The intermittent 500 errors reported in #11561 have also been resolved. These errors were caused by a null pointer exception resulting from null datafiles. Below, the error trace is shown, followed by a screenshot of the error being reproduced, and then another screenshot taken after deploying this PR branch.

Reproduced error:

Fixed:

Which issue(s) this PR closes:

Sharing some thoughts...:

Let's keep our focus on making sure our codebase stays healthy and easy to work with.

Sticking to our layered architecture is key. It's what makes it way easier for everyone to jump in and make changes without breaking things.

Let's also be serious about our tests. Good unit tests prove the little pieces work, and API tests make sure the whole thing hangs together. Both are necessary.

Finally, let's all try to follow the 'campsite rule' with tech debt: leave the code a little cleaner than you found it. If you spot something you can quickly improve, do it. It's a small effort that saves us from huge headaches down the road.

Suggestions on how to test this:

Performance enhancements can be tested by running this branch on an installation that has a dataset or file with a large number of versions, calling the endpoints below without pagination, and comparing the response times with develop.

You can control pagination of the results using the following optional query parameters.

limit: The maximum number of version differences to return.
offset: The number of version differences to skip from the beginning of the list. Used for retrieving subsequent pages of results.

For example, to get the second page of results, with 2 items per page, you would use limit=2 and offset=2 (skipping the first two results).
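The limit/offset semantics described above can be sketched in plain Java. This is only an illustration, not code from this PR: the class `PageSketch` and its methods are hypothetical, the in-memory list merely stands in for a result set (the PR itself applies pagination in the database query), and rejecting negative values with `IllegalArgumentException` is an assumption about how invalid parameters might be handled.

```java
import java.util.List;

// Hypothetical helper illustrating limit/offset semantics; not part of Dataverse.
final class PageSketch {

    // Offset for a 1-based page number: page 2 with 2 items per page skips 2 results.
    static int offsetForPage(int page, int limit) {
        if (page < 1 || limit < 1) {
            throw new IllegalArgumentException("page and limit must be >= 1");
        }
        return (page - 1) * limit;
    }

    // Applies optional limit/offset to a list; null means the parameter was not provided.
    static <T> List<T> paginate(List<T> items, Integer limit, Integer offset) {
        int from = offset == null ? 0 : offset;
        if (from < 0 || (limit != null && limit < 0)) {
            // Assumption: negative values are rejected rather than silently ignored.
            throw new IllegalArgumentException("limit and offset must not be negative");
        }
        from = Math.min(from, items.size());
        int to = limit == null
                ? items.size()
                : (int) Math.min((long) from + limit, items.size());
        return items.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> versions = List.of("5.0", "4.0", "3.0", "2.0", "1.0");
        int offset = offsetForPage(2, 2);                  // second page, 2 items per page
        System.out.println(paginate(versions, 2, offset)); // prints [3.0, 2.0]
    }
}
```

Omitting both parameters returns the full list in this sketch, which mirrors the parameters being optional.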

For datasets:

curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z&limit=2&offset=2"

For files:

curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences?limit=2&offset=2"
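For callers that build these requests programmatically, the two URLs above could be assembled roughly as follows. This is a hedged sketch: `VersionEndpointUrls` and its methods are hypothetical names, and only the endpoint paths and the `limit`/`offset`/`persistentId` parameter names are taken from the curl examples. The sketch URL-encodes the persistent identifier, which the curl examples pass unencoded.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper; only paths and parameter names come from the curl examples.
final class VersionEndpointUrls {

    static URI datasetCompareSummary(String baseUrl, String persistentId, Integer limit, Integer offset) {
        String pid = URLEncoder.encode(persistentId, StandardCharsets.UTF_8);
        return URI.create(baseUrl
                + "/api/datasets/:persistentId/versions/compareSummary?persistentId=" + pid
                + paging("&", limit, offset));
    }

    static URI fileVersionDifferences(String baseUrl, long fileId, Integer limit, Integer offset) {
        return URI.create(baseUrl + "/api/files/" + fileId + "/versionDifferences"
                + paging("?", limit, offset));
    }

    // Appends only the parameters that were provided, since both are optional.
    private static String paging(String firstSeparator, Integer limit, Integer offset) {
        StringBuilder sb = new StringBuilder();
        String sep = firstSeparator;
        if (limit != null) {
            sb.append(sep).append("limit=").append(limit);
            sep = "&";
        }
        if (offset != null) {
            sb.append(sep).append("offset=").append(offset);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String base = "https://demo.dataverse.org";
        System.out.println(datasetCompareSummary(base, "doi:10.5072/FK2/BCCP9Z", 2, 2));
        System.out.println(fileVersionDifferences(base, 1234, 2, 2));
    }
}
```

An actual request would still need the `X-Dataverse-key` header shown in the curl commands.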

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No

Is there a release notes update needed for this change?:

Yes, attached.

github-actions · 2025-10-14T22:40:01Z

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11855-version-summaries-pagination

ghcr.io/gdcc/configbaker:11855-version-summaries-pagination

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

GPortas added 2 commits October 1, 2025 17:15

Stash: refactoring getCompareVersionsSummary endpoint WIP

50a321f

Refactor: getCompareVersionsSummary and related layers

eeb6e33

GPortas added 3 commits October 2, 2025 18:51

Changed: GetDatasetVersionSummariesCommand using DatasetVersionServiceBean.findVersions

98c72cb

Merge branch 'develop' of github.com:IQSS/dataverse into 11855-version-summaries-pagination

4685a51

Added: handling pagination optional params on getCompareVersionsSummary API endpoint

5919cf4

GPortas added 2 commits October 2, 2025 19:14

Added: DatasetVersionSummaryTest

ea5642e

Added: GetDatasetVersionSummariesCommandTest

b3c4fbd

GPortas force-pushed the 11855-version-summaries-pagination branch from 943da66 to b3c4fbd October 2, 2025 18:21

GPortas added 3 commits October 6, 2025 09:33

Added: pagination test cases to testSummaryDatasetVersionsDifferencesAPI IT

89b88b5

Merge branch 'develop' of github.com:IQSS/dataverse into 11855-version-summaries-pagination

a19e7c9

Added: pagination explanation docs to compareSummary datasets endpoint

93823a7

GPortas changed the title ~~Version summaries pagination~~ Version summaries enhancements and pagination Oct 6, 2025

GPortas mentioned this pull request Oct 6, 2025

Dataset compareSummary API: version summary should show tags changes #11663

Open

GPortas added 5 commits October 7, 2025 23:12

Added: findFileMetadataHistory JPACriteria-based method to DataFileServiceBean

393c7d4

Added: GetFileVersionDifferencesCommand, pending to be refactored and test-covered

7ac4588

Changed: Files API versionDifferences endpoint now using GetFileVersionDifferencesCommand

bda189f

Added: VersionedFileMetadata class

7973153

Added: handling optional pagination in findFileMetadataHistory and GetFileVersionDifferencesCommand refactored

a3b753b

Merge branch 'develop' of github.com:IQSS/dataverse into 11855-version-summaries-pagination

493aa38

GPortas added 2 commits October 12, 2025 19:08

Merge branch 'develop' of github.com:IQSS/dataverse into 11855-version-summaries-pagination

4807c53

Added: simple javadoc with TODO to FileVersionDifferenceJsonPrinter

79094a7

GPortas added 4 commits October 13, 2025 12:48

Added: pagination params to getFileVersionsList API endpoint

0f71d6b

Added: test cases producing InvalidCommandArgumentsException to GetFileVersionDifferencesCommandTest

6c1da73

Added: pagination params validation to GetDatasetVersionSummariesCommand

a3fe9c5

Refactor: AbstractPaginatedCommand

6cecbf4

Added: invalid pagination params test cases to testSummaryDatasetVersionsDifferencesAPI IT

98a03c3

GPortas added 2 commits October 13, 2025 13:31

Added: docs for pagination in versionDifferences Files API endpoint

3546279

Fixed: typo in docs for versions/compareSummary

ddfefd4

Added: release notes for #11855

80eae81

GPortas added 2 commits October 13, 2025 16:44

Refactor: FileVersionDifferenceJsonPrinter with unit tests

5217e68

Fixed: typo in javadoc

0f54454

GPortas added 2 commits October 14, 2025 20:03

Fixed: DataFileServiceBean.findFileMetadataHistory behavior

0b21a9a

Merge branch 'develop' of github.com:IQSS/dataverse into 11855-version-summaries-pagination

b2f20ac

GPortas added 2 commits October 14, 2025 23:12

Fixed: added missing contributor names to file metadata in GetFileVersionDifferencesCommand

4557600

Added: explanatory comment to GetFileVersionDifferencesCommand

ef42d87
