
dvc import compatible with GitHub App Token #8068

Open
@mikolajpabiszczak

Description


I haven't seen a proposal like this in the existing issues, and based on my use case it could solve a number of problems.

Scenario:

  • you have a Data Registry (as git repo + cloud storage, e.g., AWS S3);
  • you have an Experiment Repository containing the code that runs experiments (and the experiments use data from the Data Registry);
  • you wrap this setup with CML and authenticate using a GitHub App with access tokens.

Problem:

  • suppose you use dvc import to obtain some_data from the Data Registry (call it github.com/username/DataRegistry);
  • it will be recorded in dvc.lock as:
     deps:
       - path: some_data
         repo:
           url: git@github.com:username/DataRegistry.git
           rev_lock: af6a1feb542dc05b4d3e9c80deb50e6596876e5f
    
  • this is where the problem occurs: CML runs the pipeline on an instance, and when it tries to fetch the data from the Data Registry remote it fails, because it cannot clone the Data Registry repository (to do so, it would need to use the generated app token).

Proposition:

  • it would be nice if dvc import (or rather dvc pull?) checked for a DATA_REGISTRY_TOKEN environment variable and rewrote the repository URL on the fly when pulling data from the remote.
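A minimal sketch of the proposed "on the fly" rewrite, assuming the recorded URL is an SSH-style GitHub URL and that the token is a GitHub App installation token. GitHub accepts such tokens over HTTPS as the password for the x-access-token user. The helper name rewrite_repo_url is hypothetical and is not part of DVC; only the DATA_REGISTRY_TOKEN variable name comes from the proposal above.

```python
import os
import re
from typing import Mapping, Optional
from urllib.parse import quote

# Matches SCP-style SSH URLs such as "git@github.com:username/DataRegistry.git".
_SSH_GITHUB = re.compile(r"^git@github\.com:(?P<path>[^/]+/[^/]+?)(?:\.git)?$")


def rewrite_repo_url(url: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a token-authenticated HTTPS URL if DATA_REGISTRY_TOKEN is set.

    Hypothetical helper (not a DVC API). If the variable is unset, or the URL
    is not an SSH GitHub URL, the URL is returned unchanged.
    """
    env = os.environ if env is None else env
    token = env.get("DATA_REGISTRY_TOKEN")
    if not token:
        return url
    match = _SSH_GITHUB.match(url)
    if match is None:
        return url
    # Percent-encode the token so unusual characters cannot break the URL.
    return (
        f"https://x-access-token:{quote(token, safe='')}"
        f"@github.com/{match.group('path')}.git"
    )


if __name__ == "__main__":
    print(rewrite_repo_url(
        "git@github.com:username/DataRegistry.git",
        {"DATA_REGISTRY_TOKEN": "ghs_example"},
    ))
```

Doing the rewrite only in memory at fetch time, rather than editing the URL stored in dvc.lock, would keep the short-lived token out of the repository.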

Disclaimer: I meant to write this some months ago, when the desired behaviour was not yet in place. I took a quick look since then but did not find any mention of it.

Thanks for your effort and please ask any questions in case you need clarification!

Metadata


Assignees

No one assigned

    Labels

    feature request (Requesting a new feature), git (Related to git and git backends)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
