140-stage DVC pipeline hard to work with

I have been advised by @daavoo to create this issue to better track the challenges I'm facing with DVC. I've filed it as a `Feature Request` because what we are looking for may not be possible with the current DVC.

Here is the [original post](https://discuss.dvc.org/t/140-stage-dvc-pipeline-getting-hard-to-work-with/1684/2) I created in the DVC forum detailing our needs. 

Basically, our current pipeline is becoming quite big, with 30 models and 140 DVC stage instances. We have 3 different stages: `create_dataset`, `train_model`, and `compute_metrics`. Consequently, we use `foreach` definitions of the 3 stages inside the `dvc.yaml` file to reduce code duplication. Still, the `params.yaml` file is 1300 lines long, which is hard to work with.

Also, all the stage instances are named `stage_name@number` (e.g., `train_model@0`). The names carry no useful information, which makes selective `dvc repro -s` (which we use a lot) hard to work with. For instance, a typical command is `dvc repro -s create_dataset@0 create_dataset@1 create_dataset@2 train_model@0 compute_metrics@0`, which reproduces all the stage instances of a given model. To find out which stage instances belong to the model we want to reproduce, we have to look through `dvc.lock`, which is super tedious (it is about 3k lines long).

Ideally we are looking for a way to:
1. split the `params.yaml` into smaller ones, each belonging to a given model
2. have better stage instance names, so they are easier to tell apart

I think point 2. is doable by declaring the stage instances as follows in the `params.yaml` file:
```yaml
create_dataset_list:
  model_1_trainset:
    script: create_dataset.py
    dataset_yaml: trainset.yaml
    folder_images: trainset_images
    params: trainset_params.py
    output: trainset.h5
```
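
For illustration, here is a minimal sketch of how the matching `dvc.yaml` stage could iterate over that dictionary. It assumes DVC's `foreach` accepts a dictionary and names each instance after its key, so the entry above would become `create_dataset@model_1_trainset` instead of `create_dataset@0`. The command-line flags passed to the script are hypothetical and not taken from our project.

```yaml
# dvc.yaml (sketch): iterate over the dictionary declared in params.yaml
stages:
  create_dataset:
    # When foreach receives a dict, each key is used as the instance name
    # and ${item} refers to the value stored under that key.
    foreach: ${create_dataset_list}
    do:
      # Hypothetical CLI flags; adapt them to the real create_dataset.py.
      cmd: >-
        python ${item.script}
        --dataset ${item.dataset_yaml}
        --images ${item.folder_images}
        --params ${item.params}
        --output ${item.output}
      deps:
        - ${item.script}
        - ${item.dataset_yaml}
        - ${item.folder_images}
        - ${item.params}
      outs:
        - ${item.output}
```

With names like this, a selective run could target `dvc repro -s create_dataset@model_1_trainset` without looking anything up in `dvc.lock`.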

However, as for point 1, I don't have any ideas, since importing YAML files into other ones is not possible AFAIK.
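
One direction that might be worth checking is the top-level `vars` key of `dvc.yaml`, which, as far as I understand, can list several parameter files to load for templating. The sketch below is untested: the `params/model_*.yaml` file names are hypothetical, and whether entries under the same top-level key (here `create_dataset_list`) are merged across files without a conflict would need to be verified against the DVC version in use.

```yaml
# dvc.yaml (sketch): load templating values from one small file per model.
# Each hypothetical file would define its own entries, for example:
#   create_dataset_list:
#     model_1_trainset: {...}
vars:
  - params/model_1.yaml   # hypothetical path
  - params/model_2.yaml   # hypothetical path

stages:
  create_dataset:
    foreach: ${create_dataset_list}
    do:
      # Hypothetical CLI flag; adapt it to the real script.
      cmd: python ${item.script} --output ${item.output}
      outs:
        - ${item.output}
```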

I have attached a minimal example to better show how our project is organized around DVC.

[minimal_dvc.zip](https://github.com/iterative/dvc/files/12251465/minimal_dvc.zip)


140-stage DVC pipeline hard to work with #9795
