Version summaries enhancements and pagination #11859
Draft: GPortas wants to merge 37 commits into develop from 11855-version-summaries-pagination
+1,921
−273
…eBean.findVersions
…n-summaries-pagination
943da66 to b3c4fbd
…onDifferencesCommand
…tFileVersionDifferencesCommand refactored
…n-summaries-pagination
…leVersionDifferencesCommandTest
…ionsDifferencesAPI IT
📦 Pushed preview images as
🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.
What this PR does / why we need it:
This pull request introduces pagination mechanisms for the dataset and datafile version summary/difference endpoints.
While the initial scope was limited to adding pagination, a preliminary investigation revealed significant underlying issues that would have made a direct implementation inefficient and unsustainable. The core problems identified were:
High Coupling and Poor Separation of Concerns: The existing code lacked clear architectural layering and encapsulation for these use cases, leading to tightly coupled components that are difficult to maintain and extend.
Severe Performance Bottlenecks: The implementation relied on fetching bulk data from the database and then post-processing it in Java using multiple nested loops. This approach caused significant performance degradation, especially for datasets or files with a large number of versions.
Low Test Coverage: The lack of a comprehensive test suite made it risky to extend/alter the existing functionality without introducing regressions.
Changes Made
Given the issues discovered, the decision was made to perform a comprehensive, end-to-end refactoring of these features. This ensures that the new pagination functionality is built upon a robust, performant, and maintainable foundation.
The key changes include:
Architectural Realignment: The entire workflow, from the API endpoint to the data access layer, has been refactored to align with the established Dataverse architecture using Commands and Services. This improves modularity and clarifies responsibilities within the code.
Performance Optimization with JPA Criteria: All data processing has been pushed down to the database layer. In-memory processing with loops has been replaced with specific, performant JPA Criteria-based queries. This dramatically improves response times for entities with extensive version histories.
Improved Test Coverage: New unit tests have been introduced to cover the refactored code, addressing critical logic that was previously untested. This ensures the stability of the new implementation and simplifies future development.
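To make the "push processing down to the database" idea concrete, here is a minimal sketch. It is not the JPA Criteria code from this PR: it uses Python's built-in sqlite3 module as a stand-in, and the table name, column names, and newest-first ordering are all assumptions made for illustration only.

```python
import sqlite3


def make_demo_db(num_versions: int = 10) -> sqlite3.Connection:
    """Build an in-memory table with one row per (hypothetical) dataset version."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE version (dataset_id INTEGER, version_number INTEGER)")
    conn.executemany(
        "INSERT INTO version VALUES (?, ?)",
        [(1, n) for n in range(1, num_versions + 1)],
    )
    return conn


def page_in_memory(conn, dataset_id, limit, offset):
    """Bulk-fetch approach: load every row, then sort and slice in application code."""
    rows = conn.execute(
        "SELECT version_number FROM version WHERE dataset_id = ?", (dataset_id,)
    ).fetchall()
    numbers = sorted((row[0] for row in rows), reverse=True)
    return numbers[offset:offset + limit]


def page_in_database(conn, dataset_id, limit, offset):
    """Pushed-down approach: the database orders the rows and returns only one page."""
    rows = conn.execute(
        "SELECT version_number FROM version WHERE dataset_id = ? "
        "ORDER BY version_number DESC LIMIT ? OFFSET ?",
        (dataset_id, limit, offset),
    ).fetchall()
    return [row[0] for row in rows]
```

Both functions return the same page (for example, versions 8 and 7 for limit=2, offset=2 on ten versions), but only the second avoids transferring the full version history to the application for each request.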
The intermittent 500 errors reported in #11561 have also been resolved. These errors were caused by a null pointer exception resulting from null datafiles. Below, the error trace is shown, followed by a screenshot of the error being reproduced, and then another screenshot taken after deploying this PR branch.
Reproduced error:


Fixed:
Which issue(s) this PR closes:
Sharing some thoughts...:
Let's keep our focus on making sure our codebase stays healthy and easy to work with.
Sticking to our layered architecture is key. It's what makes it way easier for everyone to jump in and make changes without breaking things.
Let's also be serious about our tests. Good unit tests prove the little pieces work, and API tests make sure the whole thing hangs together. Both are necessary.
Finally, let's all try to follow the 'campsite rule' with tech debt: leave the code a little cleaner than you found it. If you spot something you can quickly improve, do it. It's a small effort that saves us from huge headaches down the road.
Suggestions on how to test this:
Performance enhancements can be tested by running this branch on an installation that has a dataset or file with a large number of versions, calling the endpoint below without pagination, and comparing the response time with develop.
You can control pagination of the results using the following optional query parameters.
limit: The maximum number of version differences to return.
offset: The number of version differences to skip from the beginning of the list. Used for retrieving subsequent pages of results.
For example, to get the second page of results, with 2 items per page, you would use limit=2 and offset=2 (skipping the first two results).
For datasets:
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z&limit=2&offset=2"
For files:
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences?limit=2&offset=2"
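For anyone scripting these calls, the sketch below shows one way to turn a page number into limit/offset values and build the two URLs used in the curl examples above. The function names are hypothetical helpers, not part of Dataverse, and 1-based page numbering is an assumption; only the paths and the limit/offset parameters come from this description.

```python
from urllib.parse import urlencode


def page_to_offset(page: int, limit: int) -> int:
    """Convert a 1-based page number into the number of items to skip."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must both be >= 1")
    return (page - 1) * limit


def _with_query(url: str, params: dict) -> str:
    """Append a query string, keeping ':' and '/' readable as in the curl examples."""
    query = urlencode(params, safe=":/")
    return f"{url}?{query}" if query else url


def _pagination_params(limit=None, offset=None) -> dict:
    """Include only the optional parameters that were actually supplied."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    return params


def dataset_compare_summary_url(base_url, persistent_id, limit=None, offset=None):
    """URL for the dataset version summary endpoint (hypothetical helper)."""
    params = {"persistentId": persistent_id, **_pagination_params(limit, offset)}
    return _with_query(
        f"{base_url}/api/datasets/:persistentId/versions/compareSummary", params
    )


def file_version_differences_url(base_url, file_id, limit=None, offset=None):
    """URL for the file version differences endpoint (hypothetical helper)."""
    return _with_query(
        f"{base_url}/api/files/{file_id}/versionDifferences",
        _pagination_params(limit, offset),
    )
```

Calling dataset_compare_summary_url("https://demo.dataverse.org", "doi:10.5072/FK2/BCCP9Z", limit=2, offset=page_to_offset(2, 2)) reproduces the dataset URL from the curl example; omitting limit and offset gives the unpaginated URL suggested for the performance comparison.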
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No
Is there a release notes update needed for this change?:
Yes, attached.