Bug Report
DVC 3.56: import ignores cache
Description
I have a local DVC data repo with JSON annotation files, each added individually, and a large data storage directory with thousands of image files, added as a whole folder.
I use symlinks for the cache.
But importing the image data storage into another repo doesn't create symlinks right away. DVC downloads the files first, then links them.
Reproduce
Local data repo
Config
cache.type=symlink
core.autostage=true
Local storage dirs:
annotations/
master_annotation.json
train_annotation.json
test_annotations.json
data_storage/
image_0
image_1
...
image_N
Commands in the local data repo:
dvc add ./annotations/*
dvc add ./data_storage
Project repo
Config
cache.type=symlink
cache.dir=path/to/local/data/repo/.dvc/cache
core.autostage=true
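For reference, the project repo settings above could be applied from the command line roughly as follows. This is a minimal sketch: the function name configure_project_repo is hypothetical, and the cache path is passed as an argument rather than guessed. It only uses the standard `dvc config` subcommand and must be run from inside the project repo.

```shell
# Hypothetical helper: apply the project repo settings listed in this report.
# $1 is the path to the data repo's .dvc/cache directory.
configure_project_repo() {
    cache_dir=$1
    # Link workspace files to the cache with symlinks instead of copying them.
    dvc config cache.type symlink
    # Reuse the data repo's cache so that no second copy should be needed.
    dvc config cache.dir "$cache_dir"
    # Automatically git-stage the .dvc files that DVC creates or updates.
    dvc config core.autostage true
}
```

It would be called as `configure_project_repo path/to/local/data/repo/.dvc/cache`.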
commands:
dvc import path/to/local/data/repo data_storage
This command starts downloading, i.e. copying the files from the cache.
dvc import path/to/local/data/repo data_storage --no-download
This creates the data_storage.dvc file in the project repo, but
dvc checkout data_storage.dvc
or
dvc checkout data_storage.dvc --relink
starts downloading the files again.
Expected
I think DVC should create symlinks to the files without downloading the originals.
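One way to check which behavior occurred is to count how many entries in the imported directory are symlinks and how many are regular copies. The sketch below assumes that DVC links individual files rather than the directory itself; the helper name count_links is hypothetical and uses only find, wc and tr.

```shell
# Hypothetical helper: report how many entries under a directory are
# symlinks and how many are regular files (real copies).
count_links() {
    dir=$1
    # find does not follow symlinks by default, so a link to a cached file
    # is counted under -type l only, never under -type f.
    links=$(find "$dir" -type l | wc -l | tr -d ' ')
    copies=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "symlinks=$links copies=$copies"
}
```

Running `count_links data_storage` in the project repo after the import would, under the expected behavior, report every image as a symlink and no copies.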
Environment information
Output of dvc doctor
in local data repo:
-------------------------
Platform: Python 3.12.7 on Linux-5.15.0-86-generic-x86_64-with-glibc2.31
Subprojects:
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.19.0),
gdrive (pydrive2 = 1.21.1),
gs (gcsfs = 2024.10.0),
hdfs (fsspec = 2024.10.0, pyarrow = 18.0.0),
http (aiohttp = 3.10.10, aiohttp-retry = 2.9.0),
https (aiohttp = 3.10.10, aiohttp-retry = 2.9.0),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.10.0, boto3 = 1.35.36),
ssh (sshfs = 2024.9.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.10.0)
Config:
Global: /home/user/.config/dvc
System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: nfs on ip-addr:/storage/
Caches: local
Remotes: None
Workspace directory: nfs on ip-addr:/storage/
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/76de345055c7e5635fd954ee44e5d4e2
Output of dvc doctor
in project repo:
DVC version: 3.56.0 (deb)
-------------------------
Platform: Python 3.12.7 on Linux-5.15.0-86-generic-x86_64-with-glibc2.31
Subprojects:
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.19.0),
gdrive (pydrive2 = 1.21.1),
gs (gcsfs = 2024.10.0),
hdfs (fsspec = 2024.10.0, pyarrow = 18.0.0),
http (aiohttp = 3.10.10, aiohttp-retry = 2.9.0),
https (aiohttp = 3.10.10, aiohttp-retry = 2.9.0),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.10.0, boto3 = 1.35.36),
ssh (sshfs = 2024.9.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.10.0)
Config:
Global: /home/user/.config/dvc
System: /etc/xdg/dvc
Cache types: symlink
Cache directory: nfs on ip-addr:/storage/
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/sda2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/f967073321531b0cc07fba234dd73d7b