The ultimate goal of mirror-clone is to provide an easy-to-use abstraction layer for developers who want to clone a software repo to their own local registry.
Developers need to implement two interfaces, SourceFS and TargetFS, in order to clone a registry.
SourceFS
SourceFS generally refers to the source software registry, for example crates.io, opam, or conda. It provides the following functionality:
snapshot provides a file list of the current software registry.
- For OPAM, taking a snapshot involves downloading repo and index.tar.gz and parsing the information.
- For conda, this involves downloading repodata.json and generating a file list.
- For crates.io, this involves scanning the crates.io-index repo and generating a file list.
entry provides the way to download a file from the source filesystem.
- For most mirroring tasks, this means finding the URL and checksum that correspond to a file.
- Index files should also be included, for example index.tar.gz.
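The two operations above can be sketched as a small interface. This is a minimal illustration in Python, not mirror-clone's actual API: the Entry shape, the StaticSource class, and the registry.example URL are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    """Where to fetch one file and how to verify it (hypothetical shape)."""
    url: str
    checksum: Optional[str] = None  # hex digest, if the registry publishes one


class SourceFS(ABC):
    """Hypothetical rendering of the SourceFS interface described above."""

    @abstractmethod
    def snapshot(self) -> List[str]:
        """Return the paths of all files currently in the registry."""

    @abstractmethod
    def entry(self, path: str) -> Entry:
        """Return download information for one path from the snapshot."""


class StaticSource(SourceFS):
    """Toy source backed by a fixed in-memory index instead of a real registry."""

    def __init__(self, base_url: str, index: Dict[str, str]) -> None:
        self.base_url = base_url.rstrip("/")
        self.index = index  # path -> checksum

    def snapshot(self) -> List[str]:
        # A real implementation would download and parse the registry index here.
        return sorted(self.index)

    def entry(self, path: str) -> Entry:
        # The index file itself is listed too, so it can be mirrored like any other file.
        return Entry(url=f"{self.base_url}/{path}", checksum=self.index[path])


source = StaticSource(
    "https://registry.example/",
    {"index.tar.gz": "abc", "pkg/a-1.0.tar.gz": "def"},
)
paths = source.snapshot()
index_entry = source.entry("index.tar.gz")
```

Keeping snapshot and entry separate means the file list can be computed once and the per-file lookups can then run independently of each other.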
TargetFS
TargetFS generally refers to a local filesystem. It could also be object storage or a key-value database.
TargetFS should be able to:
- list files
- read a file
- write a file
- get metadata of a file
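As a rough illustration of these four capabilities, here is an in-memory target written in Python. The class name MemoryTarget, the method names, and the Metadata fields are assumptions for this sketch; a real target would wrap a directory, an object store, or a key-value database.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Metadata:
    """Hypothetical metadata record; only the size is tracked here."""
    size: int


class MemoryTarget:
    """Toy target that keeps file contents in a dictionary."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def list_files(self) -> List[str]:
        # Sorted so that listings are deterministic.
        return sorted(self._files)

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def metadata(self, path: str) -> Metadata:
        return Metadata(size=len(self._files[path]))


target = MemoryTarget()
target.write("pkg/a-1.0.tar.gz", b"hello")
target.write("index.tar.gz", b"idx")
```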
Mirror-Clone
mirror-clone provides utilities for mirroring a repo.
tmpfs
tmpfs stores files temporarily. When taking a snapshot, the source filesystem may download some index files. These can be saved to tmpfs and served directly when entry is called.
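One way to picture this is a scratch directory that the snapshot step writes to and the entry step reads from. The sketch below is in Python and uses the standard tempfile module; the TmpFS class and its save/load methods are hypothetical names, and only flat file names (no subdirectories) are handled.

```python
import tempfile
from pathlib import Path
from typing import Optional


class TmpFS:
    """Scratch storage for index files fetched while taking a snapshot."""

    def __init__(self) -> None:
        self._dir = tempfile.TemporaryDirectory()
        self._root = Path(self._dir.name)

    def save(self, name: str, data: bytes) -> None:
        # Called during snapshot, once the index file has been downloaded.
        (self._root / name).write_bytes(data)

    def load(self, name: str) -> Optional[bytes]:
        # Called from entry: serve the cached copy if present, else None.
        path = self._root / name
        return path.read_bytes() if path.is_file() else None

    def close(self) -> None:
        # Remove the scratch directory and everything in it.
        self._dir.cleanup()


tmp = TmpFS()
tmp.save("index.tar.gz", b"index bytes")
cached = tmp.load("index.tar.gz")
missing = tmp.load("other.json")
tmp.close()
```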
downloader
downloader helps download a file from a given URL.
transferrer
The transferrer transfers a file from the source filesystem to the target filesystem. It automatically retries failed requests.
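The retry behaviour might look roughly like the following Python sketch. The transfer function, its max_attempts parameter, and the choice to retry on OSError are assumptions for illustration; a production transferrer would likely also wait between attempts.

```python
from typing import Callable, List


def transfer(
    fetch: Callable[[], bytes],
    store: Callable[[bytes], None],
    max_attempts: int = 3,
) -> int:
    """Fetch a file and store it, retrying on OSError.

    Returns the number of attempts used; re-raises the last error
    once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            store(fetch())
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


calls = {"n": 0}


def flaky_fetch() -> bytes:
    # Fails twice with a simulated network error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return b"payload"


received: List[bytes] = []
attempts = transfer(flaky_fetch, received.append)
```

ConnectionError is a subclass of OSError in Python, so the simulated failures are caught and retried.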
comparator
Given an entry on the source filesystem and the corresponding entry on the target filesystem, a comparator decides whether the file needs to be re-transferred.
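The text does not say which attributes a comparator inspects. As one possible sketch in Python, the function below compares size and, when both sides have one, checksum; FileInfo and needs_transfer are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical summary of a file on either side of the mirror."""
    size: int
    checksum: Optional[str] = None


def needs_transfer(source: FileInfo, target: Optional[FileInfo]) -> bool:
    """Return True when the target copy is missing or differs from the source."""
    if target is None:
        return True  # the file was never mirrored
    if source.size != target.size:
        return True  # truncated or outdated copy
    if source.checksum is not None and target.checksum is not None:
        return source.checksum != target.checksum
    return False  # same size and no checksums to contradict it


same = needs_transfer(FileInfo(10, "abc"), FileInfo(10, "abc"))
absent = needs_transfer(FileInfo(10, "abc"), None)
changed = needs_transfer(FileInfo(10, "abc"), FileInfo(10, "xyz"))
```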
buffer layer
The buffer layer sits between the transferrer and the target filesystem.
Transaction Buffer provides a transaction-commit interface. Downloads commonly fail because of network issues, so the buffer layer commits a file to the target filesystem only after it has been downloaded successfully (or waits until all files have been downloaded).
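A transaction-commit interface of this kind could be sketched in Python as follows. The TransactionBuffer class and its stage, rollback, and commit methods are hypothetical names chosen for the example, and files are staged in memory for simplicity.

```python
from typing import Callable, Dict


class TransactionBuffer:
    """Stages downloaded files and forwards them to the target only on commit."""

    def __init__(self, write_to_target: Callable[[str, bytes], None]) -> None:
        self._write = write_to_target
        self._staged: Dict[str, bytes] = {}

    def stage(self, path: str, data: bytes) -> None:
        # Record a successfully downloaded file; the target is not touched yet.
        self._staged[path] = data

    def rollback(self, path: str) -> None:
        # Drop a file whose download failed or was abandoned.
        self._staged.pop(path, None)

    def commit(self) -> int:
        # Write every staged file to the target and return how many were written.
        count = 0
        for path, data in sorted(self._staged.items()):
            self._write(path, data)
            count += 1
        self._staged.clear()
        return count


target_files: Dict[str, bytes] = {}
buf = TransactionBuffer(target_files.__setitem__)
buf.stage("a.tar.gz", b"1")
buf.stage("b.tar.gz", b"2")
buf.rollback("b.tar.gz")
before_commit = dict(target_files)
committed = buf.commit()
```

Because nothing reaches the target before commit, a failed download never leaves a partial file behind.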
Fuse Buffer ensures that a file is never downloaded twice by fusing it. It also records file metadata in a single cache file to speed up listing all files in the target filesystem.
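The text does not define fusing precisely. The Python sketch below assumes it means marking a path as done so that later requests for it are skipped, and it serializes the recorded metadata as JSON to stand in for the single cache file. FuseBuffer, fetch, listing, and dump_cache are hypothetical names.

```python
import json
from typing import Callable, Dict, List, Optional


class FuseBuffer:
    """Skips paths that were already downloaded and remembers their metadata."""

    def __init__(self, download: Callable[[str], bytes]) -> None:
        self._download = download
        self._sizes: Dict[str, int] = {}  # fused path -> size in bytes

    def fetch(self, path: str) -> Optional[bytes]:
        if path in self._sizes:
            return None  # already fused: do not download again
        data = self._download(path)
        self._sizes[path] = len(data)  # fuse the path
        return data

    def listing(self) -> List[str]:
        # Answered from the recorded metadata, without touching the target.
        return sorted(self._sizes)

    def dump_cache(self) -> str:
        # Contents that would be written to the single cache file.
        return json.dumps(self._sizes, sort_keys=True)


requested: List[str] = []


def fake_download(path: str) -> bytes:
    requested.append(path)
    return b"x" * 4


fuse = FuseBuffer(fake_download)
first = fuse.fetch("pkg/a.tar.gz")
second = fuse.fetch("pkg/a.tar.gz")
```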