There is an index in OpenSearch with the mapping between the files and the records: opendata-prod-v0.2-records-recid_mapping
It would be nice to extend that information with the last access time per file. We should also cross-check it against the EOS dump that we get at /eos/workspace/c/cernod/dumps/opendata/latest
We have requested that the access date be added to that dump, which would make the task easier.
In the meantime, it would be nice to:
- Set up a daily Celery task that goes through the files in the EOS dump.
- For the files in that dump (ignoring anything under /eos/opendada//upload), identify the ones that are not in the index.
- For those, check whether they are new files that should be added to that index (that is, whether they exist in the database). If they do not, record their names in a new index (opendata-dev-v0.2-dark-files).
- Now that the access date has been added to the dump: for files that are in the index, update the index with the last access time.
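The steps above could be sketched roughly as follows. This is only an illustration of the classification logic, not an agreed design: the dump format is not described here, so the sketch assumes the dump has already been parsed into (path, access time) pairs, that the set of indexed paths has been fetched from OpenSearch beforehand, and that `in_database` is a hypothetical callable answering whether a file exists in the database. The names `classify_dump` and `Classification` are made up for this example.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Set, Tuple

# Prefix to skip, copied verbatim from the task description above.
IGNORED_PREFIX = "/eos/opendada//upload"


class Classification(NamedTuple):
    to_index: List[str]            # in the database but missing from the mapping index
    dark: List[str]                # known neither to the index nor to the database
    atime_updates: Dict[str, str]  # already indexed: path -> last access time


def classify_dump(
    entries: Iterable[Tuple[str, str]],
    indexed: Set[str],
    in_database: Callable[[str], bool],
    ignored_prefix: str = IGNORED_PREFIX,
) -> Classification:
    """Split dump entries into the three groups the daily task has to act on.

    `entries` is assumed to be (path, access_time) pairs already parsed
    from the EOS dump; the real dump layout may differ.
    """
    to_index: List[str] = []
    dark: List[str] = []
    atime_updates: Dict[str, str] = {}

    for path, atime in entries:
        if path.startswith(ignored_prefix):
            continue  # upload area is out of scope
        if path in indexed:
            atime_updates[path] = atime  # refresh last access time
        elif in_database(path):
            to_index.append(path)  # new file: belongs in the mapping index
        else:
            dark.append(path)  # candidate for opendata-dev-v0.2-dark-files
    return Classification(to_index, dark, atime_updates)
```

A daily Celery task could call this function and then write the three groups back to OpenSearch in bulk. Keeping the classification free of any OpenSearch or database calls makes it easy to test against a small sample of the dump.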