qqmyers
Member

@qqmyers qqmyers commented Sep 19, 2025

What this PR does / why we need it: This PR improves scaling of requests to DataCite in two ways:

  • adds retries with a delay if/when DataCite responds with error codes that indicate requests are being throttled (429) or their server is temporarily not responding (503, 504)
  • optionally checks with DataCite to see if updates are needed before sending updates

The former is fairly straightforward: rather than failing immediately, Dataverse will wait/temporarily slow requests to see if DataCite recovers/Dataverse can drop below the rate limit. If things recover, Dataverse's operations will succeed. If not, there could be a delay of ~1 minute before a final error occurs and the operation fails.
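For reviewers, the retry behavior can be pictured with a minimal, self-contained sketch. This is not the code in this PR: the class and method names (`RetrySketch`, `sendWithRetry`), the attempt count, and the doubling delay are all assumptions for illustration. Only the status codes (429, 503, 504) come from the description above.

```java
import java.util.Set;
import java.util.function.IntSupplier;

// Hypothetical sketch; names and backoff policy are assumptions, not the PR's code.
class RetrySketch {
    // Codes treated as temporary: throttling (429), server not responding (503, 504).
    static final Set<Integer> RETRYABLE = Set.of(429, 503, 504);

    static boolean isRetryable(int status) {
        return RETRYABLE.contains(status);
    }

    /**
     * Runs the request and retries while the status is retryable, up to
     * maxAttempts total calls. Returns the last status seen, so the caller
     * still gets a final error if the service never recovers.
     */
    static int sendWithRetry(IntSupplier request, int maxAttempts, long initialDelayMs) {
        long delay = initialDelayMs;
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && isRetryable(status); attempt++) {
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                // Preserve the interrupt and give up with the last status.
                Thread.currentThread().interrupt();
                return status;
            }
            delay *= 2; // assumption: doubling delay; the real policy may differ
            status = request.getAsInt();
        }
        return status;
    }
}
```

In this sketch, non-retryable errors (e.g. 404) are returned after a single call, so only throttling and temporary outages add delay.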

The latter is perhaps more controversial (there was discussion several years ago about whether this is useful): instead of always sending an update, which causes DataCite to write info, this optional change causes Dataverse to first query DataCite (a read) and only send an update if the local info differs from what DataCite has. In cases such as file DOIs, where changes are infrequent, this results in many reads and few writes instead of many writes at DataCite (and growing records, as they track all writes of new metadata), which appears to be faster. It may be generally useful, but installations not using file DOIs may not want to try it.
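The read-before-write idea can be sketched as follows. This is an illustration only, not the PR's implementation: the `Registry` interface, the in-memory stand-in for DataCite, the `checkFirst` parameter (standing in for the optional flag), and the plain string comparison are all assumptions. A real comparison would likely need to normalize the metadata before comparing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; all names here are assumptions, not the PR's code.
class CompareBeforeUpdateSketch {
    interface Registry {
        String read(String doi);                  // cheap query
        void write(String doi, String metadata);  // update that the registry must record
    }

    // In-memory stand-in for the remote service that counts reads and writes.
    static class InMemoryRegistry implements Registry {
        final Map<String, String> records = new HashMap<>();
        int reads = 0;
        int writes = 0;

        public String read(String doi) {
            reads++;
            return records.get(doi);
        }

        public void write(String doi, String metadata) {
            writes++;
            records.put(doi, metadata);
        }
    }

    /**
     * Sends an update unless checkFirst is on and the remote copy already
     * matches the local metadata. Returns true if a write was sent.
     */
    static boolean updateIfNeeded(Registry registry, String doi, String localMetadata, boolean checkFirst) {
        if (checkFirst && Objects.equals(registry.read(doi), localMetadata)) {
            return false; // remote is already current: one read, no write
        }
        registry.write(doi, localMetadata);
        return true;
    }
}
```

With the check on, an unchanged record costs one read and no write; a changed record costs one read plus one write, which is why the option pays off mainly when changes are infrequent.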

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer: QDR had trouble publishing a dataset with >10K files before this change and succeeded after.

Suggestions on how to test this: Minimally regression test (w/ and w/o flag).

Could also attempt to create/publish a dataset with file DOIs and many files using the DataCite test server and see whether the changes (i.e. with the flag on) increase the success rate/largest size that succeeds and/or improve performance. I'm not sure this is worth it given the testing/deployment at QDR.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@qqmyers qqmyers added the Size: 3 A percentage of a sprint. 2.1 hours. label Sep 19, 2025
@qqmyers qqmyers added this to the 6.9 milestone Sep 19, 2025
@coveralls

Coverage Status

coverage: 23.532% (-0.007%) from 23.539%
when pulling 96acb2b on QualitativeDataRepository:QDR-DCiteScaling
into f79a02b on IQSS:develop.
