Unacceptable performance deleting datasets and/or datasetversions in the prod. database - needs to be improved.  #11828

@landreev

[edit: the very last paragraph appears to point in the right direction; see the next comment also]
Investigating this harvesting performance issue in IQSS prod.:
Harvesting a new dataset is generally fast (some fraction of a sec./dataset);
Re-harvesting that same dataset is much slower (tens of times worse).
This must somehow be a function of the overall database size, since it's not observable w/ small test databases.
It is reproducible on the perf. system w/ a clone of the prod. database (even slower there).
It does not appear to be a function of the number of datasets in the collection (observable in a collection with very few harvested datasets).

I made a PR a year ago specifically to make re-harvesting cheaper. It is absolutely possible that something I did there backfired and made things worse.
One extra confusing detail, though, is that it also takes forever to delete harvested datasets when deleting a client (it appears to take exactly as long to delete these datasets as it does to re-harvest/update them). Harvested datasets are removed via a cascade in that scenario; I did not touch that part in 10836.

I don't know yet for sure if the issue is in the database update or delete, or in Solr reindexing. But, FWIW, when deleting a client, the dataset cards in the collection disappear almost instantly, and then it takes forever for the dataset objects to disappear from the db.

I'm looking into this since this is an effective blocker for a few large remote collections that need to be re-harvested.

(to be clear, I have no solid evidence yet of this being unique to harvested objects; it's just that with harvesting you often have real life cases where you need to update large numbers - hundreds or even thousands - of datasets all at once)
