QDR-DataCite Scaling #11832

qqmyers · 2025-09-19T14:59:03Z

What this PR does / why we need it: This PR improves scaling of requests to DataCite in two ways:

- Adds retries with a delay if/when DataCite responds with error codes that indicate requests are being throttled (429) or their server is temporarily not responding (503, 504).
- Optionally checks with DataCite to see if updates are needed before sending updates.

The former is fairly straightforward: rather than failing immediately, Dataverse will wait and temporarily slow its requests to see whether DataCite recovers or Dataverse can drop below the rate limit. If things recover, Dataverse's operations will succeed. If not, there could be a delay of ~1 minute before a final error occurs and the operation fails.
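As a rough illustration of the retry behavior described above (a language-neutral sketch in Python, not the actual Dataverse code): `call_with_retries`, `max_attempts`, and `delay_seconds` are hypothetical names, and the attempt count and delay values are assumptions rather than the values used in this PR.

```python
import time

# Status codes treated as "try again later": throttled (429) or
# temporarily unavailable (503, 504).
RETRYABLE_STATUS = {429, 503, 504}


def call_with_retries(send, max_attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable that returns an HTTP status code.
    The sleep function is injectable so the logic can be exercised without waiting.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status not in RETRYABLE_STATUS:
            return status  # success or a non-retryable error: stop here
        if attempt < max_attempts:
            # Wait a little longer after each failed attempt.
            sleep(delay_seconds * attempt)
    return status  # still throttled/unavailable after the last attempt
```

With these assumed defaults the caller sees either the first non-retryable response or, after the final attempt, the last 429/503/504, at which point the operation would fail as described.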

The latter is perhaps more controversial (there was discussion several years ago about whether this is useful): instead of always sending an update, which causes DataCite to write the info, this optional change makes Dataverse first query DataCite (a read) and send an update only if the local info differs from what DataCite has. In cases such as file DOIs, where changes are infrequent, this results in many reads and few writes instead of many writes at DataCite (and growing records, since they track all writes of new metadata), which appears to be faster. It may be generally useful, but installations not using file DOIs may not want to try it.
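The read-before-write idea can be sketched as follows. This is a simplified Python illustration, not the Dataverse implementation: `update_if_needed`, `fetch_remote`, and `send_update` are hypothetical names, metadata is modeled as a plain dict, and the DOI in the usage lines is a placeholder.

```python
def update_if_needed(doi, local_metadata, fetch_remote, send_update):
    """Send an update only when the remote record differs from the local one.

    fetch_remote(doi) performs the read and returns the remote metadata,
    or None if nothing is registered yet.
    send_update(doi, metadata) performs the write.
    Returns True if a write was sent, False if it was skipped.
    """
    remote_metadata = fetch_remote(doi)       # one read
    if remote_metadata == local_metadata:
        return False                          # nothing changed: skip the write
    send_update(doi, local_metadata)          # one write, only when needed
    return True


# Usage with an in-memory stand-in for the remote service.
store = {"10.5072/FK2/EXAMPLE": {"title": "A"}}
sent = update_if_needed(
    "10.5072/FK2/EXAMPLE",
    {"title": "A"},
    fetch_remote=store.get,
    send_update=store.__setitem__,
)
```

When most records are unchanged, as with file DOIs, almost every call ends at the comparison, which is where the reduction in writes would come from.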

Which issue(s) this PR closes:

Closes #

Special notes for your reviewer: QDR had trouble publishing a dataset with >10K files before this change and succeeded after.

Suggestions on how to test this: Minimally regression test (w/ and w/o flag).

Could also attempt to create/publish a dataset with file DOIs and many files using the DataCite test server and see whether the changes increase the success rate or the largest size that succeeds, and/or improve performance (i.e., with the flag on). I'm not sure this is worth it given the testing/deployment at QDR.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

coveralls · 2025-09-25T18:33:26Z

coverage: 23.532% (-0.007%) from 23.539%
when pulling 96acb2b on QualitativeDataRepository:QDR-DCiteScaling
into f79a02b on IQSS:develop.

qqmyers added 4 commits September 19, 2025 10:16

add retries to DataCite calls for 429, 503, 504

1deda57

optionally update DataCite DOI when needed during publish

e16b29c

catch null url for draft DOI

6c086e3

docs, release note

eda19fe

qqmyers added this to IQSS Dataverse Project Sep 19, 2025

qqmyers added the Size: 3 A percentage of a sprint. 2.1 hours. label Sep 19, 2025

qqmyers added this to the 6.9 milestone Sep 19, 2025

Merge remote-tracking branch 'IQSS/develop' into QDR-DCiteScaling

96acb2b
