Bug Report
❗Temporary Fix at the bottom ❗
Description
`dvc.api.read` fails to read files if the `repo` argument is a Windows path AND the script's execution path is not the project directory.
The following error is raised:
dvc.exceptions.PathMissingError: The path `<PARENT_DIR>/<FILE>` does not exist in the target repository `<PARENT_DIR>/<FILE>` neither as a DVC output nor as a Git-tracked file.
Reproduce
- I followed the tutorial: https://dvc.org/doc/start/data-management/data-versioning?tab=Windows-Cmd-
dvc init
dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
dvc add data/data.xml
- Add the Python script `src/test.py`. It must be in a subfolder; it works fine if `test.py` is in the project root.
The project directory on Windows should now look like this:
dvctest/
    .dvc/
    data/
        .gitignore
        data.xml
        data.xml.dvc
    src/
        test.py
    venv/
    .dvcignore
`src/test.py`:
from pathlib import Path

import dvc.api

PROJ_DIR = Path(__file__).parents[1]  # project root (parent of src/)

with dvc.api.open("data/data.xml", repo=PROJ_DIR.as_posix(), mode="rb") as f:
    print(f.read())
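To see exactly which string ends up in `repo`, the path computation can be checked on its own. This sketch uses `pathlib.PureWindowsPath` so it runs on any OS; the location `C:\Users\me\dvctest\src\test.py` is a hypothetical example, not the reporter's actual path.

```python
from pathlib import PureWindowsPath

# Hypothetical location of src/test.py on Windows (illustrative only).
script = PureWindowsPath(r"C:\Users\me\dvctest\src\test.py")

# parents[0] is the src\ folder; parents[1] is the project root dvctest\.
proj_dir = script.parents[1]

# as_posix() keeps the drive letter and switches to forward slashes,
# so this is the kind of string test.py passes as repo=.
repo_arg = proj_dir.as_posix()
print(repo_arg)  # C:/Users/me/dvctest
```

So the `repo` value is a drive-letter path with forward slashes, which is the Windows-specific input the report points to.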
`src/test.py` (this variant also causes the issue):
from pathlib import Path

import dvc.api

PROJ_DIR = Path(__file__).parents[1]  # project root (parent of src/)

dvc.api.read("data/data.xml", repo=PROJ_DIR.as_posix(), mode="rb")
Both of them raise the error:
dvc.exceptions.PathMissingError: The path 'src/data.xml' does not exist in the target repository 'src/data.xml' neither as a DVC output nor as a Git-tracked file.
Expected
It should work regardless of where the script is placed. Currently it only works when `test.py` sits in the project root:
dvctest/
    .dvc/
    data/
        .gitignore
        data.xml
        data.xml.dvc
    test.py   <---- putting it here works
    venv/
    .dvcignore
In `test.py`, `Path(__file__).parents[1]` should then be changed to `Path(__file__).parents[0]` or `Path(__file__).parent`.
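Moving the script also means hard-coding a different `parents[...]` index. One way to avoid that is to search upward for the `.dvc` directory. The helper `find_project_root` below is a hypothetical sketch, not part of the DVC API, and it does not address the `PathMissingError` itself; it only makes `PROJ_DIR` independent of where the script lives.

```python
from pathlib import Path


def find_project_root(start: Path) -> Path:
    """Return the nearest directory at or above `start` that contains `.dvc/`.

    Hypothetical helper (not a DVC function). Raises FileNotFoundError
    if no `.dvc` directory exists in `start` or any of its parents.
    """
    start = start.resolve()
    # Check the start directory itself first, then walk up toward the root.
    for candidate in (start, *start.parents):
        if (candidate / ".dvc").is_dir():
            return candidate
    raise FileNotFoundError(f"No .dvc directory found at or above {start}")
```

In `test.py` this would replace the hard-coded index: `PROJ_DIR = find_project_root(Path(__file__).parent)`, which yields the same directory whether the script is in the project root or in `src/`.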
Environment information
Output of `dvc doctor`:
$ dvc doctor
DVC version: 3.30.3 (pip)
-------------------------
Platform: Python 3.11.6 on Windows-10-10.0.22621-SP0
Subprojects:
dvc_data = 2.22.3
dvc_objects = 1.3.0
dvc_render = 0.6.0
dvc_task = 0.3.0
scmrepo = 1.5.0
Supports:
http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3)
Config:
Global: C:\Users\JOHN.CHANGRQ\AppData\Local\iterative\dvc
System: C:\ProgramData\iterative\dvc
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\50d38bc72608938a5da29ea637ac44ee
❗ Temporary Fix
For the `repo` argument, prepend `file://`:
dvc.api.read(
path="path/to/file.txt",
repo="file://C:/.../my-proj", # previously C:/.../my-proj
)
I'm not sure why this doesn't fix the minimal reproducible example above, but it worked for my project.
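The workaround can be built from the same `PROJ_DIR` used in `test.py`. This sketch mirrors the reporter's `file://C:/...` format (two slashes); `as_file_repo` is a hypothetical helper name, and the project path is illustrative. Note that `PurePath.as_uri()` would instead produce the three-slash form `file:///C:/...`; whether DVC treats that form the same way is not tested here.

```python
from pathlib import PurePath, PureWindowsPath


def as_file_repo(proj_dir: PurePath) -> str:
    """Prefix a project path with file:// in the reporter's format.

    Hypothetical helper: returns e.g. "file://C:/Users/me/my-proj",
    not the "file:///C:/..." form that PurePath.as_uri() returns.
    """
    return "file://" + proj_dir.as_posix()


# Hypothetical Windows project directory, for illustration only.
repo = as_file_repo(PureWindowsPath(r"C:\Users\me\my-proj"))
print(repo)  # file://C:/Users/me/my-proj
```

It would be used as `dvc.api.read("data/data.xml", repo=as_file_repo(PROJ_DIR), mode="rb")`, keeping in mind that the reporter saw this help in their own project but not in the minimal example.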