dvc status: shows that params have changed when they haven't #9518

Open
@gonzalojaimovitch

Bug Report

dvc status shows that params have changed when they haven't

Description

After successfully reproducing a pipeline in which every stage uses the params field, running "dvc status" reports changed deps, tagged "new", for the parameter file of every stage. Running "dvc repro" again with no changes does not re-execute the pipeline; each stage only prints a message that it is using the cached version. However, when a collaborator pulls the same commit, the pipeline runs one more time (assuming the collaborator doesn't have any cached data). I don't think this is the expected behaviour. As far as I know, dvc status should report something like "pipeline up to date", and nothing should be executed.

Reproduce

  1. dvc repro
  2. dvc status

Expected

I expect dvc to print a message saying that everything is up to date. I also expect the same result when a collaborator clones the project, pulls the data from the remotes, and runs dvc status.

Environment information

Output of dvc status:

$ dvc status
data_extraction_tags:
        changed deps:
                new:                config/Data_Extraction.yaml
data_extraction_image:
        changed deps:
                new:                config/Data_Extraction.yaml
data_merging:
        changed deps:
                new:                config/Data_Merging.yaml

Additional Information (if any):

This is the dvc.yaml I am using in the project:

stages: 
    data_extraction_tags:
        cmd: python -m src.data_extraction.data_extraction --env=./config/.env --config=./config/Data_Extraction.yaml --query_name_param=query_tags_name --query_param=query_tags --query_scheme_param=scheme_tags
        deps:
            - ./src/data_extraction/data_extraction.py
        params:
            - ./config/Data_Extraction.yaml:
                - pathData
                - query_tags_name
                - query_tags
                - scheme_tags
        vars:
            - ./config/Data_Extraction.yaml:query_tags_name
            - ./config/Data_Extraction.yaml:pathData
        outs:
            - ${pathData}/${query_tags_name}.parquet.gzip
            
    data_extraction_image:
        cmd: python -m src.data_extraction.data_extraction --env=./config/.env --config=./config/Data_Extraction.yaml --query_name_param=query_images_name --query_param=query_images --query_scheme_param=scheme_images
        deps:
            - ./src/data_extraction/data_extraction.py
        params:
            - ./config/Data_Extraction.yaml:
                - pathData
                - query_images_name
                - query_images
                - scheme_images
        vars:
            - ./config/Data_Extraction.yaml:query_images_name
            - ./config/Data_Extraction.yaml:pathData
        outs:
            - ${pathData}/${query_images_name}.parquet.gzip

    data_merging:
        cmd: python -m src.data_preparation.data_merging --config=./config/Data_Merging.yaml
        deps:
            - ./src/data_preparation/data_merging.py
            - ${pathDataRaw}
        params:
            - ./config/Data_Merging.yaml:
        vars:
            - ./config/Data_Merging.yaml:pathDataRaw
            - ./config/Data_Merging.yaml:pathDataMerged
        outs:
            - ${pathDataMerged}
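
One detail visible in the listings above: the params entries in dvc.yaml are written with a leading ./ (for example ./config/Data_Extraction.yaml), while dvc status prints the same files without it (config/Data_Extraction.yaml). The sketch below is plain Python with no DVC calls. It only illustrates that the two spellings differ as raw strings yet normalize to the same path. Whether this is related to how DVC compares params paths is an assumption and is not confirmed here; the normalize helper is hypothetical and not part of DVC.

```python
import posixpath


def normalize(path: str) -> str:
    """Hypothetical helper (not a DVC API): collapse redundant
    segments such as a leading './' in a POSIX-style path."""
    return posixpath.normpath(path)


# Paths as written in the params sections of the dvc.yaml above.
declared = ["./config/Data_Extraction.yaml", "./config/Data_Merging.yaml"]

# Paths as printed in the `dvc status` output above.
reported = ["config/Data_Extraction.yaml", "config/Data_Merging.yaml"]

for d, r in zip(declared, reported):
    raw_equal = d == r                               # plain string comparison
    normalized_equal = normalize(d) == normalize(r)  # comparison after normalization
    print(f"{d!r} vs {r!r}: raw equal={raw_equal}, normalized equal={normalized_equal}")
```

If this mismatch does turn out to matter, writing the params paths without the leading ./ might be worth trying as an experiment, though that is untested here.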

Labels: bug, p3-nice-to-have
