
dvc status --json can output non-json #10242

@gregstarr

Bug Report

Description

When there are large files to hash whose hashes are not cached yet, dvc status --json still prints an informational message about hashing them, so the output is not valid JSON. I believe the point of dvc status --json is to let you pipe the output to a file and read it easily with another program, so extra messages make this inconvenient.

I accidentally erased the output I had, but I think this is the message that gets printed: https://github.com/iterative/dvc-data/blob/300a3e072e5baba50f7ac5f91240891c0e30d030/src/dvc_data/hashfile/hash.py#L174
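Until the command emits pure JSON, a consumer can defensively skip leading noise before parsing. The Python sketch below is a hypothetical workaround, not part of DVC; parse_status_json is an illustrative name. It assumes the stray message is printed before the JSON document (hashing would happen before the status is reported) and that the JSON document runs to the end of the output.

```python
import json


def parse_status_json(text: str):
    """Return the JSON document at the end of `text`, skipping leading log lines.

    Hypothetical workaround: assumes stray messages are printed *before* the
    JSON document and that the document extends to the end of the output.
    """
    stripped = text.strip()
    # Fast path: the output is already pure JSON.
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        pass

    decoder = json.JSONDecoder()
    # Try each '{' or '[' as a candidate start of the JSON document.
    for idx, ch in enumerate(stripped):
        if ch not in "{[":
            continue
        try:
            obj, end = decoder.raw_decode(stripped, idx)
        except json.JSONDecodeError:
            continue
        # Accept only a document that runs all the way to the end of the output.
        if end == len(stripped):
            return obj
    raise ValueError("no JSON document found at the end of the output")
```

For example, after redirecting dvc status --json to a file, passing the file's contents to parse_status_json would return the parsed status even if a hashing message precedes it. Requiring the document to reach the end of the output avoids accidentally matching a brace or bracket inside the log message.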

Reproduce

  1. Add a large data file as a stage dependency.
  2. Run dvc status --json for the first time.

Expected

dvc status --json outputs only valid JSON.
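The expected behavior can be checked by requiring that stdout parses as exactly one JSON value with a strict json.loads. The sketch below is a hypothetical harness, not part of DVC's test suite; is_single_json_document and dvc_status_json are illustrative names. It assumes the dvc executable is on PATH and that it is run in a repository set up as in the reproduction steps above.

```python
import json
import subprocess


def is_single_json_document(text: str) -> bool:
    """Return True if `text` (ignoring surrounding whitespace) is exactly one JSON value."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def dvc_status_json(repo_dir: str) -> str:
    """Run `dvc status --json` in `repo_dir` and return its raw stdout.

    Assumes the `dvc` executable is on PATH. stderr is captured separately,
    so only what would reach a pipe or redirected file is returned.
    """
    result = subprocess.run(
        ["dvc", "status", "--json"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling is_single_json_document(dvc_status_json(".")) on a fresh checkout with a large, not-yet-hashed dependency should return True once the bug is fixed. Only stdout is checked because that is what ends up in a file when the output is piped.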

Environment information

Output of dvc doctor:

DVC version: 3.33.4 (choco)
---------------------------
Platform: Python 3.11.6 on Windows-10-10.0.19045-SP0
Subprojects:
        dvc_data = 2.24.0
        dvc_objects = 2.0.1
        dvc_render = 1.0.0
        dvc_task = 0.3.0
        scmrepo = 1.6.0
Supports:
        azure (adlfs = 2023.12.0, knack = 0.11.0, azure-identity = 1.15.0),
        gdrive (pydrive2 = 1.19.0),
        gs (gcsfs = 2023.12.2.post1),
        http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
        oss (ossfs = 2023.12.0),
        s3 (s3fs = 2023.12.2, boto3 = 1.33.13),
        ssh (sshfs = 2023.10.0)
Config:
        Global: C:\Users\starrgw1\AppData\Local\iterative\dvc
        System: C:\ProgramData\iterative\dvc

Metadata

Assignees: No one assigned

Labels: A: cli (Related to the CLI), bug (Did we break something?), p3-nice-to-have (It should be done this or next sprint), ui (user interface / interaction)

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
