Import ignore cache #10657

Open
@konstantin-frolov

Bug Report

DVC 3.56: import ignores the cache

Description

I have a local DVC repo with JSON annotation files, each added individually, and a large data storage directory with thousands of image files, added as a whole folder.
I use symlinks for the cache.
But importing from this repo into another project does not create symlinks for the image data storage right away. DVC downloads the files first, then links them.

Reproduce

Local data repo

Config

cache.type=symlink
core.autostage=true

Local storage dirs:

annotations/
     master_annotation.json
     train_annotation.json
     test_annotations.json
data_storage/
     image_0
     image_1
     ...
     image_N

Commands run in the local data repo:

dvc add ./annotations/*
dvc add ./data_storage

Project repo

Config

cache.type=symlink
cache.dir=path/to/local/data/repo/.dvc/cache
core.autostage=true

Commands:

dvc import path/to/local/data/repo data_storage

This command starts downloading (copying) the files from the cache.

dvc import path/to/local/data/repo data_storage --no-download

This creates the data_storage.dvc file in the project repo, but

dvc checkout data_storage.dvc

or

dvc checkout data_storage.dvc --relink

starts downloading the files again.

Expected

I think DVC should create symlinks to the files in the cache without downloading the originals.
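
One way to check whether the workspace ends up with symlinks or with copied files is to count both kinds under the imported directory. This is a minimal sketch, assuming a POSIX shell with find; the helper name count_links is hypothetical and not part of DVC.

```shell
# count_links DIR: print "<symlinks> <regular files>" found under DIR.
# Hypothetical helper for illustration, not part of DVC.
count_links() {
    dir="$1"
    # find does not follow symlinks by default, so a link into the cache
    # is reported as type l and never as type f
    links=$(find "$dir" -type l | wc -l | tr -d ' ')
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "$links $files"
}

# Example (run in the project repo after the import):
#   count_links data_storage
```

If linking worked as expected, the second number should be 0 and the first should match the number of imported files; a non-zero second number would suggest that files were copied into the workspace instead.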

Environment information

Output of dvc doctor in local data repo:

-------------------------
Platform: Python 3.12.7 on Linux-5.15.0-86-generic-x86_64-with-glibc2.31
Subprojects:

Supports:
        azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.19.0),
        gdrive (pydrive2 = 1.21.1),
        gs (gcsfs = 2024.10.0),
        hdfs (fsspec = 2024.10.0, pyarrow = 18.0.0),
        http (aiohttp = 3.10.10, aiohttp-retry = 2.9.0),
        https (aiohttp = 3.10.10, aiohttp-retry = 2.9.0),
        oss (ossfs = 2023.12.0),
        s3 (s3fs = 2024.10.0, boto3 = 1.35.36),
        ssh (sshfs = 2024.9.0),
        webdav (webdav4 = 0.10.0),
        webdavs (webdav4 = 0.10.0),
        webhdfs (fsspec = 2024.10.0)
Config:
        Global: /home/user/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: nfs on ip-addr:/storage/
Caches: local
Remotes: None
Workspace directory: nfs on ip-addr:/storage/
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/76de345055c7e5635fd954ee44e5d4e2

Output of dvc doctor in project repo:

DVC version: 3.56.0 (deb)
-------------------------
Platform: Python 3.12.7 on Linux-5.15.0-86-generic-x86_64-with-glibc2.31
Subprojects:

Supports:
        azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.19.0),
        gdrive (pydrive2 = 1.21.1),
        gs (gcsfs = 2024.10.0),
        hdfs (fsspec = 2024.10.0, pyarrow = 18.0.0),
        http (aiohttp = 3.10.10, aiohttp-retry = 2.9.0),
        https (aiohttp = 3.10.10, aiohttp-retry = 2.9.0),
        oss (ossfs = 2023.12.0),
        s3 (s3fs = 2024.10.0, boto3 = 1.35.36),
        ssh (sshfs = 2024.9.0),
        webdav (webdav4 = 0.10.0),
        webdavs (webdav4 = 0.10.0),
        webhdfs (fsspec = 2024.10.0)
Config:
        Global: /home/user/.config/dvc
        System: /etc/xdg/dvc
Cache types: symlink
Cache directory: nfs on ip-addr:/storage/
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/sda2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/f967073321531b0cc07fba234dd73d7b

Labels

A: data-sync (Related to dvc get/fetch/import/pull/push)
bug (Did we break something?)
p1-important (Important, aka current backlog of things to do)