Open
Description
ATM we have two cron jobs, tools/backups2datalad-update-cron (runs often) and tools/backups2datalad-update-cron-108 (runs long), since 108 contained zarrs and their backup is much more involved (see e.g. #363) than that of regular files. But now other dandisets are also starting to contain zarrs. We need to figure out a workflow that performs updates without needing any custom separation across dandisets.
I think overall we should start using a proper job system to orchestrate updates. Maybe even full-blown Celery, with Flower to monitor the status? Then the workflow could be:
- given a dandiset with a changed timestamp and no already ongoing job to update it, schedule an update_dandiset job which
  - for all zarr assets, checks that they exist, are not being uploaded, and are up to date (based on date)
    - if any are missing, schedule a job to have the zarr created/updated
    - if any are out of date, schedule a job to have the zarr updated
    - we might need a "registry" of jobs, since we can't query Celery for ongoing/planned jobs, so that we skip the dandiset if any job is still running
    - in any of the above cases, skip updating the dandiset in this round
  - if there are no zarrs, or all zarrs are found up to date [*], proceeds with the update of the dandiset as we do now
[*] Alert: race condition, unless we collect a specific commit for each zarr, so that we update the zarrs to exactly those commits and would be fine even if a zarr is being modified in the meantime.
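To make the proposed flow concrete, here is a rough sketch of the per-round decision with an in-memory job "registry". Everything in it is hypothetical and only for illustration: `ZarrState`, `JobRegistry`, `plan_update`, and the job names `create_zarr` / `update_zarr` are made up here, none of them exist in backups2datalad, and a real registry would have to be persistent and shared between workers (e.g. a database table) rather than a Python set.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, Optional, Set, Tuple


@dataclass
class ZarrState:
    """Hypothetical snapshot of one zarr asset and of its backup."""
    zarr_id: str
    remote_modified: datetime             # last modification on the archive
    backup_modified: Optional[datetime]   # None if no backup exists yet
    uploading: bool = False               # upload to the archive still in progress


@dataclass
class JobRegistry:
    """In-memory stand-in for the job 'registry' (assumption: real one is persistent)."""
    active: Set[Tuple[str, str]] = field(default_factory=set)

    def schedule(self, kind: str, target: str) -> bool:
        """Record a job; return False if the same job is already registered."""
        key = (kind, target)
        if key in self.active:
            return False
        self.active.add(key)
        return True

    def finish(self, kind: str, target: str) -> None:
        """Drop a job from the registry once it has completed."""
        self.active.discard((kind, target))

    def busy(self, target: str) -> bool:
        """True if any job (of any kind) is registered for this target."""
        return any(t == target for _, t in self.active)


def plan_update(dandiset_id: str, zarrs: Iterable[ZarrState],
                registry: JobRegistry) -> str:
    """Decide what to do with one dandiset in this round.

    Returns "skip" or "update_dandiset"; zarr jobs are scheduled as a side effect.
    """
    if registry.busy(dandiset_id):
        return "skip"  # an update of this dandiset is already ongoing

    defer = False
    for z in zarrs:
        if z.uploading or registry.busy(z.zarr_id):
            defer = True  # not ready yet, or a zarr job is still running
        elif z.backup_modified is None:
            registry.schedule("create_zarr", z.zarr_id)  # missing backup
            defer = True
        elif z.backup_modified < z.remote_modified:
            registry.schedule("update_zarr", z.zarr_id)  # out-of-date backup
            defer = True

    if defer:
        return "skip"  # retry the dandiset in a later round
    registry.schedule("update_dandiset", dandiset_id)
    return "update_dandiset"
```

The date comparison here is exactly the part subject to the race noted in [*]; with pinned per-zarr commits, the `backup_modified < remote_modified` check would presumably be replaced by comparing the backup against the recorded commit.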