repro pipelines/train/dvc.yaml --downstream: Also looks for other dvc pipelines in the repository #9381

Open
@nono1515

Bug Report

Description

When running dvc repro path/to/dvc.yaml with the --downstream argument, DVC looks for all dvc.yaml files in the workspace and also executes stages from those other files if they depend, directly or transitively, on stages in the given pipeline. This is inconsistent with dvc repro path/to/dvc.yaml, which only executes stages in the given dvc pipeline.

Reproduce

Let's say you have two pipelines with the following stages, outputs and dependencies:

  • pipelines/train/dvc.yaml
    • A, outs: data/A
    • B, deps: data/A, outs: data/B
  • pipelines/test/dvc.yaml
    • C, deps: data/B, outs: data/C

such that

$ dvc dag pipelines/train/dvc.yaml    
+----------------------------+ 
| pipelines/train/dvc.yaml:A | 
+----------------------------+ 
               *               
               *               
               *               
+----------------------------+ 
| pipelines/train/dvc.yaml:B | 
+----------------------------+ 

and

$ dvc dag pipelines/test/dvc.yaml 
+----------------------------+ 
| pipelines/train/dvc.yaml:A | 
+----------------------------+ 
               *               
               *               
               *               
+----------------------------+ 
| pipelines/train/dvc.yaml:B | 
+----------------------------+ 
               *               
               *               
               *               
+---------------------------+  
| pipelines/test/dvc.yaml:C |  
+---------------------------+  
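For anyone who wants to recreate this layout, the following is a minimal sketch that writes the two dvc.yaml files into a scratch directory. The stage commands and the use of wdir: ../.. (so that data/A, data/B and data/C resolve from the repository root) are assumptions made for illustration; the report does not say what the stages actually run or how the paths are resolved.

```python
from pathlib import Path
import tempfile

# Hypothetical stage definitions: the cmd values and `wdir: ../..` are
# assumptions, chosen only so that the deps/outs match the issue's description.
TRAIN_YAML = """\
stages:
  A:
    wdir: ../..
    cmd: mkdir -p data && echo A > data/A
    outs:
      - data/A
  B:
    wdir: ../..
    cmd: cat data/A > data/B
    deps:
      - data/A
    outs:
      - data/B
"""

TEST_YAML = """\
stages:
  C:
    wdir: ../..
    cmd: cat data/B > data/C
    deps:
      - data/B
    outs:
      - data/C
"""


def write_pipelines(root):
    """Write both pipeline files under `root` and return their paths."""
    files = {
        root / "pipelines" / "train" / "dvc.yaml": TRAIN_YAML,
        root / "pipelines" / "test" / "dvc.yaml": TEST_YAML,
    }
    for path, text in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    return sorted(files)


if __name__ == "__main__":
    repo = Path(tempfile.mkdtemp())
    for path in write_pipelines(repo):
        print(path.relative_to(repo))
```

After running git init and dvc init in that directory, the two dvc repro invocations below can be compared.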

Running dvc repro pipelines/train/dvc.yaml executes A and B.
Running dvc repro pipelines/train/dvc.yaml --downstream executes A, B and C.

Expected

dvc repro pipelines/train/dvc.yaml --downstream should only run A and B, since C is not in the given pipeline. This would be consistent with dvc repro pipelines/train/dvc.yaml.
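The difference between the reported and the expected behavior can be sketched on a toy graph. This is not DVC's implementation, only a model of the two semantics: the DEPS mapping mirrors the DAG from this report, and the helper names (pipeline_file, stages_in, downstream) are hypothetical.

```python
from collections import deque

# Stage address -> addresses of the stages it depends on (edges from the report).
DEPS = {
    "pipelines/train/dvc.yaml:A": [],
    "pipelines/train/dvc.yaml:B": ["pipelines/train/dvc.yaml:A"],
    "pipelines/test/dvc.yaml:C": ["pipelines/train/dvc.yaml:B"],
}


def pipeline_file(stage):
    """Return the dvc.yaml path part of a 'path:stage' address."""
    return stage.rsplit(":", 1)[0]


def stages_in(path, deps):
    """All stages defined in the given dvc.yaml file."""
    return [s for s in deps if pipeline_file(s) == path]


def downstream(targets, deps):
    """Targets plus every stage depending on them, directly or transitively."""
    children = {stage: [] for stage in deps}
    for stage, parents in deps.items():
        for parent in parents:
            children[parent].append(stage)
    order, seen, queue = list(targets), set(targets), deque(targets)
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


train = "pipelines/train/dvc.yaml"
targets = stages_in(train, DEPS)

# Reported behavior: the traversal crosses into other dvc.yaml files.
reported = downstream(targets, DEPS)
# Expected behavior: same traversal, restricted to the given file.
expected = [s for s in reported if pipeline_file(s) == train]

print([s.rsplit(":", 1)[1] for s in reported])  # ['A', 'B', 'C']
print([s.rsplit(":", 1)[1] for s in expected])  # ['A', 'B']
```

Filtering after the traversal is just one way to express the expectation; pruning cross-file edges during the traversal would give the same result on this graph.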

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.55.0 (pip)
-------------------------
Platform: Python 3.8.16 on Linux-5.15.108-1-MANJARO-x86_64-with-glibc2.34
Subprojects:
        dvc_data = 0.47.2
        dvc_objects = 0.21.2
        dvc_render = 0.3.1
        dvc_task = 0.2.1
        scmrepo = 1.0.2
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        ssh (sshfs = 2023.4.1)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/d66f9faf43af22dff31dd5850172cab3

Additional Information (if any):

Labels

  • A: pipelines (Related to the pipelines feature)
  • feature request (Requesting a new feature)
  • p3-nice-to-have (It should be done this or next sprint)
