Shared cache not working for 'import-url' command  #10576

Open
@vandiaArch

Bug Report

Issue name

import-url: does not set up and use the shared cache configured for the project.

Description

Importing an Azure file or folder with import-url uses the shared cache configured for the project only on the initial import, and only if the --no-download flag is not passed.

If the import is done without downloading (with the --no-download flag), or if the files that were initially downloaded and linked are later lost (totally or partially removed), subsequent attempts to download the files with dvc pull do not use the configured shared cache and instead copy the files into the local project folder.

Reproduce

  1. dvc import-url --no-download --version-aware azure://azureTest/fileTest
  2. dvc pull fileTest.dvc

Also

  1. dvc import-url --version-aware azure://azureTest/fileTest
  2. Remove the downloaded file.
  3. dvc pull fileTest.dvc
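
Both sequences above can be scripted for repeated testing. The following is only a sketch: the `reproduce` helper and its `run` parameter are hypothetical names introduced here, the commands are the ones listed in the steps, and it assumes the import target is a single file (not a folder) whose name is the last component of the URL.

```python
import subprocess
from pathlib import Path, PurePosixPath


def reproduce(url, workdir, no_download, run=subprocess.run):
    """Run one of the two reproduction sequences from this report.

    `run` is injectable so the command sequence can be inspected
    without DVC or Azure credentials being available.
    """
    workdir = Path(workdir)
    name = PurePosixPath(url).name  # "fileTest" for the URL in this report

    import_cmd = ["dvc", "import-url"]
    if no_download:
        import_cmd.append("--no-download")
    import_cmd += ["--version-aware", url]
    run(import_cmd, cwd=workdir, check=True)

    if not no_download:
        # Second sequence: remove the file that the import downloaded.
        (workdir / name).unlink(missing_ok=True)

    run(["dvc", "pull", f"{name}.dvc"], cwd=workdir, check=True)
    return workdir / name
```

Calling `reproduce("azure://azureTest/fileTest", ".", no_download=True)` from the project root runs the first sequence; passing `no_download=False` runs the second.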

Expected

Files go to the cache location and a symlink is created in the project folder, like this:

4 lrwxrwxrwx 1 user user 86 Oct 1 09:57 tinytestvideo.mp4 -> /mnt/samba/Server/Project/DVC_CACHE/files/md5/95/1ea15426585e424e9f9dfd6e1e76d3

However, this only happens if the --no-download flag is not present, and only the first time the file or folder is imported.

If the --no-download flag is present, or if the file is downloaded with dvc pull after the initial import, the file is copied to the local project folder and not to the shared cache.
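
To tell the expected and actual outcomes apart without reading `ls -l` output, a small check like the one below may help. It is a sketch, not part of DVC: `link_status` is a hypothetical helper, and the cache directory passed to it is assumed to be the one configured for the project.

```python
import os
from pathlib import Path


def link_status(workspace_file, cache_dir):
    """Classify how a workspace file relates to the shared cache."""
    path = Path(workspace_file)
    if not path.is_symlink():
        # exists() follows symlinks, so is_symlink() is checked first.
        return "regular-file" if path.exists() else "missing"
    # Resolve both sides so relative links and mount aliases compare equal.
    target = Path(os.path.realpath(path))
    cache = Path(os.path.realpath(cache_dir))
    if cache in target.parents:
        return "symlink-into-cache"
    return "symlink-elsewhere"
```

With the setup in this report, `link_status("tinytestvideo.mp4", "/mnt/samba/Server/Project/DVC_CACHE/")` would be expected to return `"symlink-into-cache"`; the behavior described above corresponds to `"regular-file"`.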

Environment information

Ubuntu 22.04.4 LTS server

The git and DVC project is cloned into the user's personal folder, with the cache configured as follows:

[cache]
dir = /mnt/samba/Server/Project/DVC_CACHE/
shared = group
type = "symlink,hardlink"

where dir is a network folder mounted with Samba.
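
For scripted checks it can be handy to read these settings programmatically. The sketch below uses Python's standard `configparser` as a stand-in; it is not DVC's own config loader, and `read_cache_settings` is a hypothetical helper. It assumes the `type` value may be wrapped in double quotes, as it is above.

```python
import configparser

CONFIG_TEXT = """
[cache]
dir = /mnt/samba/Server/Project/DVC_CACHE/
shared = group
type = "symlink,hardlink"
"""


def read_cache_settings(text):
    """Extract the cache directory and the ordered link types."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    cache = parser["cache"]
    # Drop the surrounding quotes, then split the comma-separated list.
    link_types = [t.strip() for t in cache["type"].strip('"').split(",")]
    return {
        "dir": cache["dir"],
        "shared": cache.get("shared"),
        "link_types": link_types,
    }


settings = read_cache_settings(CONFIG_TEXT)
```

Here `settings["link_types"]` keeps the configured order, with `symlink` listed before `hardlink`.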

Output of dvc doctor:

DVC version: 3.53.1 (pip)
Platform: Python 3.11.7 on Linux-6.8.0-40-generic-x86_64-with-glibc2.35
Subprojects:
    dvc_data = 3.15.2
    dvc_objects = 5.1.0
    dvc_render = 1.0.2
    dvc_task = 0.4.0
    scmrepo = 3.3.7
Supports:
    azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.17.1),
    http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3)
Config:
    Global: /home/USER REDACTED/.config/dvc
    System: /etc/xdg/dvc
Cache types: symlink
Cache directory: cifs on REDACTED
Caches: local
Remotes: azure, azure
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/8d7d2888c4cd9c330c3311ea57232c70

Additional Information (if any):

Metadata

Assignees

No one assigned

    Labels

    A: data-sync: Related to dvc get/fetch/import/pull/push
    bug: Did we break something?
    help wanted
    p1-important: Important, aka current backlog of things to do