fetch: fails when using url.insteadOf in git config #9535

Open
@sjawhar

Bug Report

Description

I'm using a DVC-imported asset in a project. In one environment, I use url.insteadOf to replace the URL of the repo from which the asset is imported. In my particular case, I'm replacing an SSH URL with a local path. However, the clone of that remote repo fails right here:

dvc/dvc/scm.py

Lines 160 to 162 in 6ace5ed

git = Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
if "shallow_branch" not in kwargs:
    fetch_all_exps(git, url, progress=pbar.update_git)

The first call, Git.clone(), succeeds because the URL is properly replaced. However, the url passed to fetch_all_exps is NOT the replaced one (the replaced URL is what gets stored in the cloned repo's config as the URL of the remote), so the fetch fails. Another potentially relevant section is scmrepo.git.backend.dulwich.iter_remote_refs():

https://github.com/iterative/scmrepo/blob/f70e0323746f22833581c26efbfcccb285ddb845/src/scmrepo/git/backend/dulwich/__init__.py#L492-L500

It's possible this behavior should be handled by the upstream packages (scmrepo or dulwich), but I'm starting the discussion here.
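As a rough illustration of the rewrite that would have to be applied to url before the fetch contacts the remote, here is a minimal Python sketch of git's documented url.<base>.insteadOf rule, where the longest matching prefix wins. The function name apply_instead_of and the rules mapping are hypothetical and are not part of dvc, scmrepo, or dulwich; the SSH URL is a placeholder.

```python
from typing import Dict, List


def apply_instead_of(url: str, rules: Dict[str, List[str]]) -> str:
    """Rewrite url the way git's url.<base>.insteadOf setting does.

    rules maps each replacement base to the list of prefixes it replaces,
    e.g. {"/tmp/remote": ["git@example.com:org/repo.git"]}.
    (Hypothetical helper, for illustration only.)
    """
    best_base, best_prefix = None, ""
    for base, prefixes in rules.items():
        for prefix in prefixes:
            # git picks the longest insteadOf prefix that matches the URL
            if url.startswith(prefix) and len(prefix) > len(best_prefix):
                best_base, best_prefix = base, prefix
    if best_base is None:
        return url  # no rule applies; leave the URL untouched
    return best_base + url[len(best_prefix):]


# Placeholder URL standing in for ${SSH_URL} from the steps below.
rules = {"/tmp/remote": ["git@example.com:org/repo.git"]}
print(apply_instead_of("git@example.com:org/repo.git", rules))
```

Whether such a rewrite belongs in dvc, scmrepo, or dulwich is exactly the open question here; the sketch only shows what "the replaced URL" means.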

Reproduce

  1. dvc import an asset from any project using an SSH URL (or just make a dummy .dvc file)
  2. Clone that same repo to e.g. /tmp/remote
  3. git config --global url./tmp/remote.insteadOf ${SSH_URL}
  4. SSH_AUTH_SOCK= dvc pull asset.dvc (clearing SSH_AUTH_SOCK is just an example for when you're using ssh-agent; the point is to run this in an environment without credentials for SSH access)
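
Before running step 4, it can help to confirm that the rule from step 3 is actually registered in the global git config. The following is a small sketch, assuming git is on PATH; the helper names parse_instead_of and read_instead_of_rules are made up for illustration.

```python
import subprocess
from typing import List, Tuple


def parse_instead_of(output: str) -> List[Tuple[str, str]]:
    """Parse `git config --null --get-regexp` output into (base, prefix) pairs."""
    pairs = []
    for entry in output.split("\0"):
        if not entry:
            continue
        # With --null, key and value are separated by a newline.
        key, _, value = entry.partition("\n")
        # key looks like "url.<base>.insteadof"; strip the fixed prefix and suffix.
        base = key[len("url."):-len(".insteadof")]
        pairs.append((base, value))
    return pairs


def read_instead_of_rules() -> List[Tuple[str, str]]:
    """Return all global url.<base>.insteadOf rules as (base, prefix) pairs."""
    result = subprocess.run(
        ["git", "config", "--global", "--null", "--get-regexp",
         r"^url\..*\.insteadof$"],
        capture_output=True, text=True, check=False,  # exit code 1 means no match
    )
    return parse_instead_of(result.stdout)
```

Calling read_instead_of_rules() after step 3 should return a list containing a ("/tmp/remote", <your SSH URL>) pair; an empty list means the rule was not written.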

Expected

The pull should succeed!

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.58.2 (pip)
-------------------------
Platform: Python 3.8.16 on Linux-6.2.6-76060206-generic-x86_64-with-glibc2.2.5
Subprojects:
        dvc_data = 0.51.0
        dvc_objects = 0.22.0
        dvc_render = 0.3.1
        dvc_task = 0.2.1
        scmrepo = 1.0.2
Supports:
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.3.0, boto3 = 1.24.59),
        ssh (sshfs = 2023.4.1)
Config:
        Global: /home/kernel/.config/dvc
        System: /etc/xdg/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: s3, ssh
Workspace directory: overlay on overlay
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/a6c21da1aef04b4fdc4a48db8508fea3

Additional Information (if any):
The stack trace is very long, but I think I've pointed out the relevant sections above.

Relatedly, another way to solve my problem would be to allow remote://${remote_name} as the repo.url in .dvc files, since the URL could then be overridden by DVC config from the very beginning. I'm open to anything that solves this problem, including opening a PR myself with the preferred approach :)

Labels: bug (Did we break something?), git (Related to git and git backends)
