Consider how to help with archiving NDJSON in locations that don't support symlinks #17

@mikix

For example, Amazon S3 does not support symlinks, and it's a likely place to archive exports (BCH puts ours there).

So the fact that export helpfully makes symlinks for you is maybe... not so relevant in that context.

Annoyances for someone archiving in S3:

  • By default, aws s3 follows symlinks on upload, so every linked file gets uploaded a second time and wastes space. You have to pass --no-follow-symlinks to turn that off.
  • When you restore a directory from S3 and want to work on it locally with export, SMART Fetch relies on the symlinks being there for some things, like getting the patient cohort for a crawl. So you'd have to either re-crawl the patients or recreate the symlinks (which can be done, since the folders are all in order!).

So maybe... I'm thinking we could help with both by offering an s3 subcommand (or an archive s3 subcommand, or just archive, auto-detecting S3 from the URL) that does the following:

  • Offers an upload mode to sync local files up to the remote, skipping symlinks
  • Offers a download mode that syncs remote files down locally, recreating symlinks
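
To make the "skipping symlinks" part of an upload mode concrete, here is a minimal Python sketch that walks a local folder and separates regular files (upload candidates) from symlinks. It is not SMART Fetch code: the function name split_files_and_symlinks is hypothetical, and it makes no S3 calls at all, it only decides what an uploader would send. Recording each link's target is an assumption of this sketch, made so that a download mode could recreate the links later.

```python
import os
from pathlib import Path


def split_files_and_symlinks(root):
    """Walk root and return (files, links).

    files: sorted relative POSIX paths of regular files (what an upload would send).
    links: dict mapping each symlink's relative POSIX path to its raw target.
    Hypothetical helper for illustration, not part of SMART Fetch.
    """
    root = Path(root)
    files = []
    links = {}
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in dirnames, so record them from there.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = Path(dirpath) / name
            if path.is_symlink():
                links[path.relative_to(root).as_posix()] = os.readlink(path)
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            if path.is_symlink():
                links[rel] = os.readlink(path)
            else:
                files.append(rel)
    return sorted(files), links
```

An uploader built on this would send only the paths in files, which has the same effect on storage as passing --no-follow-symlinks to aws s3.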

Or... a lower-effort, more generic approach: a command that just recreates the symlinks in a local folder. It wouldn't deal with pushing or pulling data, and thus wouldn't help with needing --no-follow-symlinks, but it has a clearer, smaller scope.
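
As a rough illustration of that smaller-scope option, the Python sketch below recreates symlinks in a local folder from a mapping of link path to target. The function name recreate_symlinks and the mapping itself are assumptions of this sketch: the issue suggests the links can be derived from the folder order, but that layout is not spelled out here, so the derivation step is deliberately left out and the mapping is taken as input.

```python
import os
from pathlib import Path


def recreate_symlinks(root, links):
    """Create each symlink in links under root; return the relative paths created.

    links maps a link's path (relative to root) to its target.
    Hypothetical helper for illustration, not part of SMART Fetch.
    """
    root = Path(root)
    created = []
    for rel, target in sorted(links.items()):
        link_path = root / rel
        # lexists is True even for a broken symlink, so nothing gets clobbered
        # and running the function twice is harmless.
        if os.path.lexists(link_path):
            continue
        link_path.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(target, link_path)
        created.append(rel)
    return created
```

Skipping paths that already exist keeps the operation safe to re-run on a folder that was only partially restored.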

Labels: enhancement