I haven't seen any proposal of this kind in the issues and - based on my use case - it could solve a number of problems.
Scenario:
- you have a Data Registry (as git repo + cloud storage, e.g., AWS S3);
- you have an Experiment Repository containing the code that runs experiments (and the experiments use data from the Data Registry);
- you wrap all of this with CML and use a GitHub App with access tokens.
Problem:
- suppose you use `dvc import` to obtain `some_data` from the Data Registry (call it `github.com/username/DataRegistry`); it will be recorded in `dvc.lock` as:

      deps:
      - path: some_data
        repo:
          url: git@github.com:username/DataRegistry.git
          rev_lock: af6a1feb542dc05b4d3e9c80deb50e6596876e5f
- now the problem occurs: CML runs this pipeline on an instance, and when it tries to get the data from the Data Registry remote it fails, because it cannot clone the Data Registry repository (to do so, it would need to use the generated app token).
Proposition:
- it would be nice if `dvc import` (or actually `dvc pull`?) checked for a `DATA_REGISTRY_TOKEN` env variable and updated the URL "on the fly" when pulling data from the remote.
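
To make the proposal more concrete, here is a rough sketch of the kind of URL rewrite I have in mind. It is only an illustration: `with_token` is a hypothetical helper and not an existing DVC function, and the `x-access-token` username is the form GitHub accepts for app installation tokens over HTTPS. How DVC would actually hook this in is left open.

```python
import os
import re

# Matches scp-style SSH URLs such as git@github.com:owner/repo.git
_SSH_URL = re.compile(r"^git@(?P<host>[^:/]+):(?P<path>.+)$")


def with_token(url, env=None, var="DATA_REGISTRY_TOKEN"):
    """Return an HTTPS URL carrying the token, or the URL unchanged.

    Hypothetical helper for illustration only; not part of DVC.
    """
    env = os.environ if env is None else env
    token = env.get(var)
    if not token:
        return url  # no token configured: keep the URL as recorded

    match = _SSH_URL.match(url)
    if match:
        host, path = match.group("host"), match.group("path")
    elif url.startswith("https://"):
        host, _, path = url[len("https://"):].partition("/")
        host = host.rsplit("@", 1)[-1]  # drop any credentials already present
    else:
        return url  # unknown scheme: do not touch it

    return f"https://x-access-token:{token}@{host}/{path}"
```

With the token set, the SSH URL recorded for the Data Registry would be turned into an HTTPS URL that the generated app token can authenticate; without it, nothing changes, so existing setups that rely on SSH keys would keep working. The rewrite would only happen in memory, so the token never ends up in `dvc.lock`.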
Disclaimer: I meant to write this up some months ago, and at that time the desired behaviour was not in place. I have taken a quick look since, but did not find any mention of it.
Thanks for your effort and please ask any questions in case you need clarification!