Description
The ultimate goal of mirror-clone is to provide an easy-to-use abstraction layer for developers who want to clone a software repo to their own local registry.
Developers need to implement two interfaces, SourceFS and TargetFS, in order to clone a registry.
SourceFS
SourceFS generally refers to the source software registry, for example crates.io, opam, or conda. It provides the following functionality:
- snapshot provides a file list of the current software registry.
  - For OPAM, taking a snapshot involves downloading repo and index.tar.gz, and parsing the information.
  - For conda, this involves downloading repodata.json and generating the file list.
  - For crates.io, this involves scanning the crates.io-index repo and generating the file list.
- entry provides a way to download a file from the source filesystem.
  - For most mirroring tasks, this means finding the URL and checksum that correspond to a file.
  - Index files should also be included, for example index.tar.gz.
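A minimal Rust sketch of the two SourceFS operations follows. It is synchronous and uses only the standard library; the method signatures, the SourceEntry struct, and the StaticSource toy registry are hypothetical illustrations, not mirror-clone's actual API. StaticSource pretends its index has already been downloaded and parsed, which is the work a real OPAM, conda, or crates.io source would do inside snapshot.

```rust
use std::collections::HashMap;

/// Hypothetical description of how to fetch one file from the source registry.
#[derive(Debug, Clone, PartialEq)]
pub struct SourceEntry {
    pub url: String,
    /// Expected checksum, if the registry publishes one (e.g. in an index file).
    pub checksum: Option<String>,
}

/// Hypothetical synchronous shape of the SourceFS interface.
pub trait SourceFS {
    /// Return the list of files currently in the registry.
    fn snapshot(&mut self) -> Result<Vec<String>, String>;
    /// Describe how to download one file from the snapshot.
    fn entry(&self, path: &str) -> Result<SourceEntry, String>;
}

/// Toy registry whose index is already parsed into path -> checksum.
/// A real source would fill this by downloading e.g. repodata.json.
pub struct StaticSource {
    base_url: String,
    index: HashMap<String, Option<String>>,
}

impl StaticSource {
    pub fn new(base_url: &str, index: &[(&str, Option<&str>)]) -> Self {
        StaticSource {
            base_url: base_url.trim_end_matches('/').to_string(),
            index: index
                .iter()
                .map(|(p, c)| (p.to_string(), c.map(|s| s.to_string())))
                .collect(),
        }
    }
}

impl SourceFS for StaticSource {
    fn snapshot(&mut self) -> Result<Vec<String>, String> {
        let mut files: Vec<String> = self.index.keys().cloned().collect();
        files.sort(); // deterministic order for callers
        Ok(files)
    }

    fn entry(&self, path: &str) -> Result<SourceEntry, String> {
        // Look up the checksum recorded in the index for this path.
        let checksum = self
            .index
            .get(path)
            .ok_or_else(|| format!("{path} is not in the snapshot"))?
            .clone();
        Ok(SourceEntry { url: format!("{}/{}", self.base_url, path), checksum })
    }
}
```

For example, a StaticSource built with base URL https://example.org/pkgs and the single index entry a.tar.gz yields a snapshot of just that file, and entry("a.tar.gz") returns https://example.org/pkgs/a.tar.gz together with its checksum.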
TargetFS
TargetFS generally refers to a local filesystem, but it could also be an object storage service or a key-value database.
TargetFS should be able to:
- list files
- read a file
- write a file
- get the metadata of a file
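The four TargetFS capabilities map naturally onto a trait. This is a hypothetical synchronous sketch: the trait methods, the Metadata struct, and the MemoryTarget type are illustrative assumptions. MemoryTarget keeps files in a HashMap so the example stays self-contained, where a real target would be a directory, an object storage bucket, or a key-value database.

```rust
use std::collections::HashMap;

/// Hypothetical per-file metadata a target can report.
#[derive(Debug, Clone, PartialEq)]
pub struct Metadata {
    pub size: u64,
}

/// Hypothetical synchronous shape of the TargetFS interface.
pub trait TargetFS {
    fn list(&self) -> Vec<String>;
    fn read(&self, path: &str) -> Option<Vec<u8>>;
    fn write(&mut self, path: &str, data: &[u8]);
    fn metadata(&self, path: &str) -> Option<Metadata>;
}

/// In-memory target standing in for a local directory, an object
/// storage bucket, or a key-value database.
#[derive(Default)]
pub struct MemoryTarget {
    files: HashMap<String, Vec<u8>>,
}

impl TargetFS for MemoryTarget {
    fn list(&self) -> Vec<String> {
        let mut paths: Vec<String> = self.files.keys().cloned().collect();
        paths.sort(); // stable order makes listings comparable
        paths
    }

    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.files.get(path).cloned()
    }

    fn write(&mut self, path: &str, data: &[u8]) {
        // Overwrites any existing file at the same path.
        self.files.insert(path.to_string(), data.to_vec());
    }

    fn metadata(&self, path: &str) -> Option<Metadata> {
        self.files.get(path).map(|d| Metadata { size: d.len() as u64 })
    }
}
```

The sketch reports only the size as metadata; a real target might also expose a modification time or checksum that a comparator can use.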
Mirror-Clone
mirror-clone provides utilities for mirroring a repo.
tmpfs
tmpfs stores files temporarily. When taking a snapshot, the source filesystem may download some index files. These can be saved to tmpfs and served directly when entry is called.
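Here is a sketch of tmpfs as a scratch directory under the system temporary directory, assuming index files are keyed by their relative path in the registry. The TmpFs type and its save and get methods are hypothetical. The directory is deleted when the value is dropped, matching the idea that these files only need to live for one mirroring run.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical tmpfs: holds index files downloaded during snapshot so
/// that later entry calls can serve them without downloading again.
/// Paths are assumed to be trusted, relative registry paths.
pub struct TmpFs {
    root: PathBuf,
}

impl TmpFs {
    /// Create (or reuse) a scratch directory under the system temp dir.
    pub fn new(name: &str) -> io::Result<Self> {
        let root = std::env::temp_dir().join(name);
        fs::create_dir_all(&root)?;
        Ok(TmpFs { root })
    }

    /// Save a file downloaded while taking a snapshot, e.g. index.tar.gz.
    pub fn save(&self, path: &str, data: &[u8]) -> io::Result<PathBuf> {
        let full = self.root.join(path);
        if let Some(parent) = full.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(&full, data)?;
        Ok(full)
    }

    /// Return the local copy if this file was cached during snapshot.
    pub fn get(&self, path: &str) -> Option<PathBuf> {
        let full = self.root.join(path);
        if full.is_file() { Some(full) } else { None }
    }
}

impl Drop for TmpFs {
    // Temporary files only live as long as the mirroring run.
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.root);
    }
}
```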
downloader
downloader helps download a file from a given URL.
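To show what a downloader does at the protocol level, here is a standard-library-only sketch that issues a single HTTP/1.0 GET. The download function is hypothetical and deliberately limited: plain http:// URLs only, with no TLS, redirects, chunked encoding, or IPv6 literals. A production downloader would use an HTTP client library instead.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Hypothetical minimal downloader: one HTTP/1.0 GET over plain TCP.
/// Assumes `http://host[:port]/path` URLs and expects a 200 response.
pub fn download(url: &str) -> io::Result<Vec<u8>> {
    let rest = url.strip_prefix("http://").ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "only http:// URLs are supported")
    })?;
    // Split "host[:port]/path" into the authority and the request path.
    let (authority, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    let addr = if authority.contains(':') {
        authority.to_string()
    } else {
        format!("{authority}:80")
    };

    let mut stream = TcpStream::connect(addr)?;
    // With HTTP/1.0 the server closes the connection after the body,
    // so reading to EOF yields the complete response.
    let request = format!("GET {path} HTTP/1.0\r\nHost: {authority}\r\n\r\n");
    stream.write_all(request.as_bytes())?;
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;

    // Headers end at the first blank line; everything after it is the body.
    let end = response
        .windows(4)
        .position(|w| w == b"\r\n\r\n")
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "malformed HTTP response"))?;
    let head = String::from_utf8_lossy(&response[..end]);
    let status_line = head.lines().next().unwrap_or("");
    if status_line.split_whitespace().nth(1) != Some("200") {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("unexpected status line: {status_line}"),
        ));
    }
    Ok(response[end + 4..].to_vec())
}
```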
transferrer
Transferrer transfers a file from the source filesystem to the target filesystem and automatically retries failed requests.
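Retrying can be sketched as a loop around one fetch-and-store step. In this hypothetical transfer_with_retry, the fetch and store closures stand in for SourceFS and TargetFS calls, and max_attempts and the linear backoff are assumed parameters; mirror-clone's actual retry policy is not specified here.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical transferrer step: fetch a file from the source and write it
/// to the target, retrying failed attempts with a linear backoff.
/// Returns the number of attempts that were needed.
pub fn transfer_with_retry<F, S, E>(
    path: &str,
    max_attempts: u32,
    backoff: Duration,
    mut fetch: F,
    mut store: S,
) -> Result<u32, E>
where
    F: FnMut(&str) -> Result<Vec<u8>, E>,
    S: FnMut(&str, Vec<u8>) -> Result<(), E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        // A failure in either the download or the write counts as a failed attempt.
        let result = fetch(path).and_then(|data| store(path, data));
        match result {
            Ok(()) => return Ok(attempt),
            // Out of attempts: surface the last error unchanged.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                thread::sleep(backoff * attempt); // wait longer after each failure
                attempt += 1;
            }
        }
    }
}
```

Returning the attempt count is a convenience for logging; after the final failed attempt the caller receives the last error as-is.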
comparator
Given the entries for a file on the source filesystem and on the target filesystem, a comparator decides whether the file needs to be transferred again.
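A comparator can be modeled as a predicate over the metadata on both sides. The FileMeta struct, the Comparator trait, and the ChecksumThenSize policy below are hypothetical. The policy compares checksums when both sides have one, falls back to file sizes, and otherwise transfers again, because matching copies cannot be confirmed.

```rust
/// Hypothetical metadata known about a file on either side.
#[derive(Debug, Clone, PartialEq)]
pub struct FileMeta {
    pub size: Option<u64>,
    pub checksum: Option<String>,
}

/// Decides whether a file must be (re-)transferred.
pub trait Comparator {
    fn needs_transfer(&self, source: &FileMeta, target: Option<&FileMeta>) -> bool;
}

/// Example policy: prefer checksums, fall back to sizes,
/// and transfer when nothing can be compared.
pub struct ChecksumThenSize;

impl Comparator for ChecksumThenSize {
    fn needs_transfer(&self, source: &FileMeta, target: Option<&FileMeta>) -> bool {
        let target = match target {
            None => return true, // file is missing on the target
            Some(t) => t,
        };
        if let (Some(a), Some(b)) = (&source.checksum, &target.checksum) {
            return a != b;
        }
        if let (Some(a), Some(b)) = (source.size, target.size) {
            return a != b;
        }
        true // cannot show the copies match, so transfer again
    }
}
```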
buffer layer
The buffer layer sits between the transferrer and the target filesystem.
Transaction Buffer provides a transaction-commit interface. Downloads commonly fail because of network issues, so the buffer layer commits a file to the target filesystem only after it has been downloaded successfully (or, alternatively, only after all files have been downloaded).
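The commit-only-on-success behaviour can be sketched with a staging area. TransactionBuffer, stage, and commit are hypothetical names, and the target is a plain BTreeMap standing in for TargetFS. This sketch batches commits, which corresponds to waiting until all files have been downloaded; calling commit after every stage gives the per-file variant.

```rust
use std::collections::BTreeMap;

/// Hypothetical transaction buffer: downloads are staged and only reach
/// the target on commit, so a failed download never leaves a partial
/// file behind.
#[derive(Default)]
pub struct TransactionBuffer {
    staged: BTreeMap<String, Vec<u8>>,
}

impl TransactionBuffer {
    /// Record the outcome of one download; a failed download is not staged
    /// and its error is passed back to the caller.
    pub fn stage<E>(&mut self, path: &str, download: Result<Vec<u8>, E>) -> Result<(), E> {
        let data = download?;
        self.staged.insert(path.to_string(), data);
        Ok(())
    }

    /// Move every staged file into the target and empty the buffer.
    /// Returns the number of files committed.
    pub fn commit(&mut self, target: &mut BTreeMap<String, Vec<u8>>) -> usize {
        let n = self.staged.len();
        target.append(&mut self.staged); // existing paths are overwritten
        n
    }
}
```

Staging whole files in memory only suits small examples; a real buffer would more likely stage to temporary files and move them into place on commit.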
Fuse Buffer ensures that a file is never downloaded twice by fusing it. It also records file metadata in a single cache file to speed up listing all files in the target filesystem.
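Fusing and the metadata cache can be sketched together. FuseBuffer, its methods, and the cache format (one path, a tab, and a size per line) are hypothetical, and the sketch assumes paths contain no newlines. Once a file is fused, should_download returns false for it, and list answers from the cache without touching the target.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical fuse buffer: a transferred file is "fused" and never
/// downloaded again; metadata for all fused files lives in one cache file.
#[derive(Default)]
pub struct FuseBuffer {
    fused: HashMap<String, u64>,
}

impl FuseBuffer {
    /// Load the cache file; a missing file means an empty cache.
    pub fn load(cache: &Path) -> io::Result<Self> {
        let text = match fs::read_to_string(cache) {
            Ok(t) => t,
            Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Self::default()),
            Err(e) => return Err(e),
        };
        let mut fused: HashMap<String, u64> = HashMap::new();
        for line in text.lines() {
            // Split on the last tab so paths containing tabs still parse.
            if let Some((path, size)) = line.rsplit_once('\t') {
                if let Ok(size) = size.parse::<u64>() {
                    fused.insert(path.to_string(), size);
                }
            }
        }
        Ok(FuseBuffer { fused })
    }

    /// True if the file has not been fused and still needs downloading.
    pub fn should_download(&self, path: &str) -> bool {
        !self.fused.contains_key(path)
    }

    /// Fuse a file after it has been written to the target.
    pub fn fuse(&mut self, path: &str, size: u64) {
        self.fused.insert(path.to_string(), size);
    }

    /// List fused files from the cache instead of scanning the target.
    pub fn list(&self) -> Vec<(&str, u64)> {
        let mut v: Vec<(&str, u64)> = self.fused.iter().map(|(p, s)| (p.as_str(), *s)).collect();
        v.sort();
        v
    }

    /// Persist all metadata into the single cache file.
    pub fn save(&self, cache: &Path) -> io::Result<()> {
        let mut out = String::new();
        for (path, size) in self.list() {
            out.push_str(&format!("{path}\t{size}\n"));
        }
        fs::write(cache, out)
    }
}
```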