
mirror-clone roadmap #14

Open
@skyzh

Description


The ultimate goal of mirror-clone is to provide an easy-to-use abstraction layer for developers who want to clone a software repo to their own local registry.

Developers need to implement two interfaces, SourceFS and TargetFS, in order to clone a registry.

SourceFS

SourceFS generally refers to the source software registry, for example crates.io, opam, or conda. It provides the following functionality:

  • snapshot provides a file list of the current software registry.
    • For OPAM, taking a snapshot involves downloading repo and index.tar.gz and parsing them.
    • For conda, this involves downloading repodata.json and generating the file list.
    • For crates.io, this involves scanning the crates.io-index repo and generating the file list.
  • entry provides a way to download a file from the source filesystem.
    • For most mirroring tasks, this means finding the URL and checksum that correspond to a file.
    • Index files, such as index.tar.gz, should also be included.
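A minimal Rust sketch of what the SourceFS interface could look like, using crates.io as the example. The trait `SourceFs`, the types `SourceEntry`, `IndexRecord`, and `CratesIoSource`, and all their fields are hypothetical, not mirror-clone's actual API. The sketch assumes the crates.io-index lines have already been parsed into records (each index line is JSON carrying `name`, `vers`, and `cksum`) and assumes crate files can be fetched from `https://static.crates.io/crates/{name}/{name}-{version}.crate`.

```rust
use std::collections::HashMap;

/// A downloadable file resolved from the source registry (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct SourceEntry {
    /// Path relative to the mirror root.
    pub path: String,
    /// Upstream URL the file can be fetched from.
    pub url: String,
    /// Checksum published by the registry, if any.
    pub checksum: Option<String>,
}

/// Hypothetical shape of the SourceFS interface.
pub trait SourceFs {
    /// Build the file list for the current state of the registry.
    fn snapshot(&mut self) -> Result<Vec<String>, String>;
    /// Resolve a path from the latest snapshot into a URL and checksum.
    fn entry(&self, path: &str) -> Result<SourceEntry, String>;
}

/// One already-parsed line of crates.io-index (JSON parsing omitted).
pub struct IndexRecord {
    pub name: String,
    pub vers: String,
    pub cksum: String,
}

/// Toy crates.io source built from pre-parsed index records.
pub struct CratesIoSource {
    records: Vec<IndexRecord>,
    entries: HashMap<String, SourceEntry>,
}

impl CratesIoSource {
    pub fn new(records: Vec<IndexRecord>) -> Self {
        Self { records, entries: HashMap::new() }
    }
}

impl SourceFs for CratesIoSource {
    fn snapshot(&mut self) -> Result<Vec<String>, String> {
        self.entries.clear();
        let mut paths = Vec::new();
        for r in &self.records {
            // Assumed layout: crates/{name}/{name}-{version}.crate
            let path = format!("crates/{0}/{0}-{1}.crate", r.name, r.vers);
            let url = format!("https://static.crates.io/{path}");
            let entry = SourceEntry { path: path.clone(), url, checksum: Some(r.cksum.clone()) };
            self.entries.insert(path.clone(), entry);
            paths.push(path);
        }
        Ok(paths)
    }

    fn entry(&self, path: &str) -> Result<SourceEntry, String> {
        self.entries
            .get(path)
            .cloned()
            .ok_or_else(|| format!("{path} is not in the snapshot"))
    }
}
```

Keeping the resolved entries from the last snapshot means entry is a cheap lookup rather than another pass over the index.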

TargetFS

TargetFS generally refers to a local filesystem. It could also be an object storage service or a key-value database.

TargetFS should be able to:

  • list files
  • read file
  • write file
  • get metadata of a file
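The four TargetFS operations can be sketched as a Rust trait with an in-memory implementation standing in for a local directory, object store, or key-value database. `TargetFs`, `Metadata` (which only tracks size here), and `MemoryTarget` are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeMap;

/// File metadata exposed by a target (hypothetical; only the size here).
#[derive(Debug, Clone, PartialEq)]
pub struct Metadata {
    pub size: u64,
}

/// Hypothetical shape of the TargetFS interface.
pub trait TargetFs {
    /// List all stored paths.
    fn list(&self) -> Vec<String>;
    /// Read a whole file, or None if it does not exist.
    fn read(&self, path: &str) -> Option<Vec<u8>>;
    /// Write (or overwrite) a file.
    fn write(&mut self, path: &str, data: Vec<u8>) -> Result<(), String>;
    /// Get the metadata of a file, or None if it does not exist.
    fn metadata(&self, path: &str) -> Option<Metadata>;
}

/// In-memory target; BTreeMap keeps `list` output sorted.
#[derive(Default)]
pub struct MemoryTarget {
    files: BTreeMap<String, Vec<u8>>,
}

impl TargetFs for MemoryTarget {
    fn list(&self) -> Vec<String> {
        self.files.keys().cloned().collect()
    }

    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.files.get(path).cloned()
    }

    fn write(&mut self, path: &str, data: Vec<u8>) -> Result<(), String> {
        // An in-memory write cannot fail; a real filesystem or object store could.
        self.files.insert(path.to_string(), data);
        Ok(())
    }

    fn metadata(&self, path: &str) -> Option<Metadata> {
        self.files.get(path).map(|d| Metadata { size: d.len() as u64 })
    }
}
```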

Mirror-Clone

mirror-clone provides utilities for mirroring a repo.

tmpfs

tmpfs stores files temporarily. While taking a snapshot, the source filesystem may download some index files. These can be saved to tmpfs and served directly when entry is called.
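A rough sketch of the tmpfs idea, assuming an in-memory map: index files fetched during the snapshot are stored under their mirror path, and the lookup used by entry either returns the cached bytes or falls back to an upstream URL. `TmpFs`, `Origin`, `resolve`, and the `{base_url}/{path}` URL scheme are illustrative assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory tmpfs holding files fetched during a snapshot.
#[derive(Default)]
pub struct TmpFs {
    files: HashMap<String, Vec<u8>>,
}

impl TmpFs {
    pub fn put(&mut self, path: &str, data: Vec<u8>) {
        self.files.insert(path.to_string(), data);
    }

    pub fn get(&self, path: &str) -> Option<&[u8]> {
        self.files.get(path).map(|v| v.as_slice())
    }
}

/// Where entry should take a file's bytes from.
#[derive(Debug, PartialEq)]
pub enum Origin<'a> {
    /// Already downloaded during the snapshot; serve it directly.
    Cached(&'a [u8]),
    /// Not cached; fetch it from this (assumed) upstream URL.
    Remote(String),
}

/// Prefer the tmpfs copy, otherwise build an upstream URL.
pub fn resolve<'a>(tmp: &'a TmpFs, base_url: &str, path: &str) -> Origin<'a> {
    match tmp.get(path) {
        Some(bytes) => Origin::Cached(bytes),
        None => Origin::Remote(format!("{base_url}/{path}")),
    }
}
```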

downloader

downloader helps download a file from a given URL.

transferrer

Transferrer transfers a file from the source filesystem to the target filesystem. It automatically retries failed requests.
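The retry behavior of the transferrer could look roughly like the helper below: it calls a fallible transfer closure up to `max_attempts` times, sleeping with a linearly growing backoff between attempts, and returns every collected error if all attempts fail. The name `with_retry` and the linear backoff policy are assumptions for this sketch, not mirror-clone's documented behavior.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical retry helper: `attempt` receives the 1-based attempt number.
pub fn with_retry<T, E, F>(max_attempts: u32, backoff: Duration, mut attempt: F) -> Result<T, Vec<E>>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut errors = Vec::new();
    for n in 1..=max_attempts {
        match attempt(n) {
            Ok(value) => return Ok(value),
            Err(e) => {
                errors.push(e);
                // Linear backoff between attempts; no sleep after the last one.
                if n < max_attempts {
                    thread::sleep(backoff * n);
                }
            }
        }
    }
    Err(errors)
}
```

Returning all errors rather than only the last one makes it easier to log why a file kept failing.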

comparator

Given an entry on the source filesystem and its counterpart on the target filesystem, a comparator decides whether the file needs to be transferred again.
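One possible comparator policy, sketched in Rust: transfer when the file is missing on the target, when sizes differ, or when both sides carry checksums that disagree. `FileInfo`, `needs_transfer`, and the choice to trust a matching size when a checksum is unavailable are hypothetical; a stricter comparator might re-transfer in that case instead.

```rust
/// Hypothetical per-file information available to a comparator.
#[derive(Debug, Clone, PartialEq)]
pub struct FileInfo {
    pub size: u64,
    pub checksum: Option<String>,
}

/// Decide whether `source` must be (re-)transferred over `target`.
pub fn needs_transfer(source: &FileInfo, target: Option<&FileInfo>) -> bool {
    match target {
        // Missing on the target: always transfer.
        None => true,
        Some(t) => {
            if t.size != source.size {
                return true;
            }
            match (&source.checksum, &t.checksum) {
                // Both checksums known: transfer only if they differ.
                (Some(a), Some(b)) => a != b,
                // Cannot compare checksums: assume a matching size is enough.
                _ => false,
            }
        }
    }
}
```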

buffer layer

The buffer layer sits between the transferrer and the target filesystem.

Transaction Buffer provides a transaction-commit interface. Downloads commonly fail because of network issues, so the buffer layer commits a file to the target filesystem only after it has been downloaded successfully (or waits until all files have been downloaded).
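A minimal sketch of the transaction buffer, assuming downloads are reported as `Result<Vec<u8>, String>` and using a `BTreeMap` as a stand-in for TargetFS: failed downloads are never staged, and nothing reaches the target until `commit` is called. The names `TransactionBuffer`, `stage`, and `commit` are illustrative.

```rust
use std::collections::BTreeMap;

/// Hypothetical transaction buffer staging completed downloads.
#[derive(Default)]
pub struct TransactionBuffer {
    staged: BTreeMap<String, Vec<u8>>,
}

impl TransactionBuffer {
    /// Stage a file only if its download succeeded; otherwise pass the error on.
    pub fn stage(&mut self, path: &str, download: Result<Vec<u8>, String>) -> Result<(), String> {
        let data = download?;
        self.staged.insert(path.to_string(), data);
        Ok(())
    }

    /// Move every staged file into the target and return how many were committed.
    pub fn commit(&mut self, target: &mut BTreeMap<String, Vec<u8>>) -> usize {
        let n = self.staged.len();
        // `append` moves all entries and leaves `staged` empty.
        target.append(&mut self.staged);
        n
    }
}
```

Calling `commit` once after the whole batch gives the "wait until all files have been downloaded" variant; calling it after each `stage` gives per-file commits.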

Fuse Buffer ensures that a file is never downloaded twice by fusing it. It also records file metadata in a single cache file to speed up listing all files in the target filesystem.
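The fuse buffer can be sketched as a map from path to size that is consulted before each download and persisted as one cache file, so listing the target only needs that file. `FuseBuffer`, its methods, and the cache format of one `path<TAB>size` line per file are assumptions made for illustration (that format would break on paths containing tabs or newlines).

```rust
use std::collections::BTreeMap;

/// Hypothetical fuse buffer: path -> size of every file already transferred.
#[derive(Default)]
pub struct FuseBuffer {
    done: BTreeMap<String, u64>,
}

impl FuseBuffer {
    /// True until the path has been recorded; afterwards the fuse is blown.
    pub fn should_download(&self, path: &str) -> bool {
        !self.done.contains_key(path)
    }

    /// Record a completed transfer together with its metadata (size only here).
    pub fn record(&mut self, path: &str, size: u64) {
        self.done.insert(path.to_string(), size);
    }

    /// Serialize to a single cache file: one "path<TAB>size" line per file.
    pub fn to_cache(&self) -> String {
        self.done.iter().map(|(p, s)| format!("{p}\t{s}\n")).collect()
    }

    /// Rebuild from a cache file, skipping malformed lines.
    pub fn from_cache(text: &str) -> Self {
        let done: BTreeMap<String, u64> = text
            .lines()
            .filter_map(|line| {
                let (path, size) = line.split_once('\t')?;
                Some((path.to_string(), size.parse::<u64>().ok()?))
            })
            .collect();
        Self { done }
    }

    /// List target files from the cache instead of scanning the target.
    pub fn list(&self) -> Vec<&str> {
        self.done.keys().map(|k| k.as_str()).collect()
    }
}
```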
