
dvc.api.read: Fails to parse Windows-path for repo argument if script is not on Proj Dir #10127

Open
@Eve-ning


Bug Report

❗Temporary Fix at the bottom ❗

Description

dvc.api.read fails to read files when the repo argument is a Windows path AND the script being executed is not located directly in the project directory.

An error would be raised:

dvc.exceptions.PathMissingError: The path `<PARENT_DIR>/<FILE>` does not exist in the target repository `<PARENT_DIR>/<FILE>` neither as a DVC output nor as a Git-tracked file.

Reproduce

  1. I followed the tutorial: https://dvc.org/doc/start/data-management/data-versioning?tab=Windows-Cmd-
     dvc init
     dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
     dvc add data/data.xml
  2. Add the Python script src/test.py.

It must be in a subfolder; it works fine if test.py is directly in the project directory.

On Windows, the project directory should now look like this:

dvctest/
  .dvc/
  data/
    .gitignore 
    data.xml 
    data.xml.dvc
  src/
    test.py 
  venv/  
  .dvcignore

test.py

from pathlib import Path

import dvc.api

PROJ_DIR = Path(__file__).parents[1]  # project directory (parent of src/)
with dvc.api.open("data/data.xml",
                  repo=PROJ_DIR.as_posix(),
                  mode="rb") as f:
    print(f.read())

test.py (alternative version, also triggers the issue)

from pathlib import Path

import dvc.api

PROJ_DIR = Path(__file__).parents[1]  # project directory (parent of src/)
dvc.api.read("data/data.xml",
             repo=PROJ_DIR.as_posix(),
             mode="rb")

Both of them raise the error:

dvc.exceptions.PathMissingError: The path 'src/data.xml' does not exist in the target repository 'src/data.xml' neither as a DVC output nor as a Git-tracked file.

Expected

It should work regardless of where the script is placed. Currently it only works when test.py sits directly in the project directory:

dvctest/
  .dvc/
  data/
    .gitignore 
    data.xml 
    data.xml.dvc
  test.py  <---- Putting it here works
  venv/  
  .dvcignore

With this layout, Path(__file__).parents[1] in test.py has to become Path(__file__).parents[0] (equivalently, Path(__file__).parent).
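As a sanity check on whether the repo value itself differs between the two layouts, the paths can be compared with pathlib's PureWindowsPath, which applies Windows path rules on any OS. The C:\Users\me\dvctest location is a hypothetical placeholder.

```python
from pathlib import PureWindowsPath

# Hypothetical script locations for the two layouts
nested = PureWindowsPath(r"C:\Users\me\dvctest\src\test.py")  # failing layout
flat = PureWindowsPath(r"C:\Users\me\dvctest\test.py")        # working layout

# In the flat layout, parents[0] and .parent are the same directory
assert flat.parents[0] == flat.parent

# Repo strings each layout passes to dvc.api (forward slashes via as_posix)
nested_repo = nested.parents[1].as_posix()
flat_repo = flat.parent.as_posix()
print(nested_repo, flat_repo)  # C:/Users/me/dvctest C:/Users/me/dvctest
```

Under these assumptions, both layouts pass the identical string C:/Users/me/dvctest as repo, which suggests the failure depends on where the script lives or is run from rather than on the repo string itself.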

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.30.3 (pip)
-------------------------
Platform: Python 3.11.6 on Windows-10-10.0.22621-SP0
Subprojects:
        dvc_data = 2.22.3
        dvc_objects = 1.3.0
        dvc_render = 0.6.0
        dvc_task = 0.3.0
        scmrepo = 1.5.0
Supports:
        http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3)
Config:
        Global: C:\Users\JOHN.CHANGRQ\AppData\Local\iterative\dvc
        System: C:\ProgramData\iterative\dvc
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\50d38bc72608938a5da29ea637ac44ee

❗ Temporary Fix

Prepend file:// to the repo argument:

import dvc.api

dvc.api.read(
    path="path/to/file.txt",
    repo="file://C:/.../my-proj",  # previously "C:/.../my-proj"
)

I'm not sure why this doesn't fix the minimal reproducible example above, but it worked for my project.
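If the workaround is needed in a script, the file:// string can be built from the project path instead of being hard-coded. This is a minimal sketch, assuming the form reported to work (file:// followed by the drive-letter path with forward slashes) is what DVC accepts; to_file_repo_url is a hypothetical helper name and the path is a placeholder. Since the workaround reportedly did not fix the minimal example, treat this as untested.

```python
from pathlib import PurePath, PureWindowsPath


def to_file_repo_url(proj_dir: PurePath) -> str:
    """Hypothetical helper: prepend file:// to a local repo path.

    Produces the form reported to work (file://C:/.../my-proj),
    using forward slashes via as_posix().
    """
    return "file://" + proj_dir.as_posix()


# Hypothetical project location; in a script this would be
# Path(__file__).parents[1], as in test.py.
url = to_file_repo_url(PureWindowsPath(r"C:\Users\me\my-proj"))
print(url)  # file://C:/Users/me/my-proj
```

The result would then be passed as repo=url to dvc.api.read or dvc.api.open. Note that PurePath.as_uri() yields file:///C:/Users/me/my-proj (three slashes), which is a different string from the one reported here.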

Metadata


Assignees: No one assigned

Labels:
  A: api (Related to the dvc.api)
  P: windows (Related to the Platform: Windows)
  bug (Did we break something?)
  p3-nice-to-have (It should be done this or next sprint)

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
