
external workspaces #3920

Open
@efiop


We currently support a so-called "external outputs" scenario, which is based on creating a separate external cache location for each type of output (e.g. s3, ssh, etc.). This scenario is unpolished, outright broken in some cases, and constantly misused by people who just want to import files from remote locations into their local cache/workspace or remote. People often don't realise that it effectively extends your workspace outside of your local repo and therefore needs proper isolation; otherwise you can run into conflicts with a colleague who runs dvc checkout while you are working on a file/dir in the external workspace.

This makes me think that we need to introduce proper terminology and an abstraction that clearly state what this is all about and make this powerful feature usable. The solution is to introduce the concept of "workspaces". It could look something like this:

*** DRAFT ***

  1. Define the external workspace you want to attach:
dvc workspace add myssh ssh://example.com/home/efiop

Now, unless explicitly configured otherwise, dvc will assume that you want to use ssh://example.com/home/efiop/.dvc/cache as the default cache for artifacts in that workspace, just as it does for your local repo.

  2. Use your workspace:
dvc add ssh://example.com/home/efiop/data
dvc run -o ssh://example.com/home/efiop/model.pkl ...

or, with a special workspace notation (similar to the remote notation we currently have: remote://myremote/path):

dvc add ws://myssh/data
dvc run -o ws://myssh/model.pkl

This notation is nice because it lets you redefine your workspaces pretty easily (e.g. so that each coworker uses their own home directory on the server), and the config options only need to be set once, in the config section for that workspace.
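To illustrate why the ws:// notation makes per-user redefinition cheap, here is a minimal sketch that expands a workspace URL against a name-to-URL mapping. It assumes each workspace is configured as a plain base URL (like `myssh` above); the function `resolve_ws` and the `WORKSPACES` dict are hypothetical and are not part of DVC.

```python
import posixpath
from urllib.parse import urlparse, urlunparse

# Hypothetical per-user workspace config: workspace name -> base URL.
WORKSPACES = {"myssh": "ssh://example.com/home/efiop"}


def resolve_ws(url: str, workspaces: dict[str, str]) -> str:
    """Expand ws://<name>/<path> into <workspace base URL>/<path> (sketch)."""
    parsed = urlparse(url)
    if parsed.scheme != "ws":
        return url  # already a full URL such as ssh://..., leave it untouched
    try:
        base = urlparse(workspaces[parsed.netloc])
    except KeyError:
        raise KeyError(f"unknown workspace: {parsed.netloc!r}") from None
    # Append the path after the workspace name to the workspace's base path.
    rel = parsed.path.lstrip("/")
    path = posixpath.join(base.path.rstrip("/") or "/", rel)
    return urlunparse(base._replace(path=path))


print(resolve_ws("ws://myssh/data", WORKSPACES))
```

A coworker who maps `myssh` to `ssh://example.com/home/alice` in their own config gets `ssh://example.com/home/alice/model.pkl` for `ws://myssh/model.pkl`, while the stage definition itself stays the same.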

The current cache.local/ssh/s3/gs/etc. sections will be removed, because it is wrong to pick the cache based on the URL scheme rather than on the workspace we are working in.
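To make "the cache follows the workspace, not the scheme" concrete, here is a minimal sketch of deriving a workspace's default cache location. It assumes the default is the `.dvc/cache` directory under the workspace URL, as described for `myssh` in the draft; the function name `default_cache_url` is hypothetical and not part of DVC.

```python
import posixpath
from urllib.parse import urlparse, urlunparse


def default_cache_url(workspace_url: str, cache_dir: str = ".dvc/cache") -> str:
    """Return <workspace URL>/<cache_dir>, keeping scheme and host intact (sketch)."""
    parsed = urlparse(workspace_url)
    # Treat an empty path (e.g. s3://bucket) as the root of the workspace.
    base = parsed.path.rstrip("/") or "/"
    return urlunparse(parsed._replace(path=posixpath.join(base, cache_dir)))


print(default_cache_url("ssh://example.com/home/efiop"))
```

Because the cache is derived from the workspace URL instead of being looked up by scheme, two workspaces on the same SSH host (say, two coworkers' home directories) end up with separate caches.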

CC @PeterFogh , would appreciate your thoughts on this 🙂

Labels: discussion, feature request