gc: requires loading parameters files #7585

Open
@itcarroll

Bug Report

Description

When "dvc.yaml" references a parameters file in order to reproduce a foreach stage, dvc gc --all-commits fails if that parameters file does not exist in a commit containing this version of "dvc.yaml". The use case is not too convoluted: a development version of a parameters file is left uncommitted but is referenced in "dvc.yaml" during development, where it is easy to commit the reference by accident. @pared suggested this may be a bug if I could reproduce it.

Reproduce

Initialize git and dvc. Add the following three files:

# dvc.yaml
vars:
- config.yaml.dev
stages:
  echo:
    foreach: ${country}
    do:
      cmd: echo ${item} > ${item}.txt
      outs:
      - ${item}.txt

# config.yaml.dev
country:
- fr

# .gitignore
config.yaml.dev

Run dvc repro then git add . && git commit -m "only commit".
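
For convenience, the three files from the steps above can be written out with a short script. This is only a sketch: the helper name write_repro_files and the temporary directory are illustrative choices, not part of DVC, and the file contents are copied verbatim from the listing above.

```python
import tempfile
from pathlib import Path

# File contents copied from the reproduction steps in this report.
FILES = {
    "dvc.yaml": (
        "vars:\n"
        "- config.yaml.dev\n"
        "stages:\n"
        "  echo:\n"
        "    foreach: ${country}\n"
        "    do:\n"
        "      cmd: echo ${item} > ${item}.txt\n"
        "      outs:\n"
        "      - ${item}.txt\n"
    ),
    "config.yaml.dev": "country:\n- fr\n",
    ".gitignore": "config.yaml.dev\n",
}


def write_repro_files(target: Path) -> list[Path]:
    """Write the three reproduction files into `target` and return their paths.

    Hypothetical helper for this report; it does not call git or dvc.
    """
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in FILES.items():
        path = target / name
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    repo_dir = Path(tempfile.mkdtemp(prefix="dvc-gc-repro-"))
    for path in write_repro_files(repo_dir):
        print(path)
```

Git and DVC still need to be initialized in that directory, and dvc repro, the commit, and dvc gc --all-commits -v are run by hand as described in the steps.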

To generate the following error, run dvc gc --all-commits -v and answer "y" at the prompt.

% dvc gc --all-commits -v
2022-04-18 14:23:59,329 WARNING: This will remove all cache except items used in the workspace and all git commits of the current repo.
Are you sure you want to proceed? [y/n]: y
2022-04-18 14:24:01,871 ERROR: failed to parse 'vars' in 'dvc.yaml': 'config.yaml.dev' does not exist: 'config.yaml.dev' does not exist
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/__init__.py", line 147, in __init__
    self.context.load_from_vars(*args, default=DEFAULT_PARAMS_FILE)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/context.py", line 456, in load_from_vars
    self.merge_from(fs, item, wdir)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/context.py", line 408, in merge_from
    ctx = Context.load_from(fs, abspath, select_keys)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/context.py", line 364, in load_from
    raise ParamsLoadError(f"'{file}' does not exist")
dvc.parsing.context.ParamsLoadError: 'config.yaml.dev' does not exist

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/cli/__init__.py", line 89, in main
    ret = cmd.do_run()
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/commands/gc.py", line 51, in run
    self.repo.gc(
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/gc.py", line 61, in gc
    for obj_ids in repo.used_objs(
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/__init__.py", line 420, in used_objs
    for odb, objs in self.index.used_objs(
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/index.py", line 218, in used_objs
    for stage, filter_info in pairs:
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/index.py", line 212, in <genexpr>
    self.stage_collector.collect_granular(
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 401, in collect_granular
    return [StageInfo(stage) for stage in self.repo.index]
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 401, in <listcomp>
    return [StageInfo(stage) for stage in self.repo.index]
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/index.py", line 92, in __iter__
    yield from self.stages
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/index.py", line 74, in stages
    return self.stage_collector.collect_repo(onerror=onerror)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 497, in collect_repo
    return list(self._collect_repo(onerror))
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 480, in _collect_repo
    new_stages = self.load_file(file_path)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 304, in load_file
    return self.load_all(path)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 284, in load_all
    return [stages[key] for key in keys]
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 284, in <listcomp>
    return [stages[key] for key in keys]
  File "/usr/local/Cellar/python@3.10/3.10.2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/_collections_abc.py", line 878, in __iter__
    yield from self._mapping
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/stage/loader.py", line 152, in __iter__
    return iter(self.resolver.get_keys())
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/stage/loader.py", line 39, in resolver
    return DataResolver(self.repo, wdir, self.data)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/__init__.py", line 149, in __init__
    format_and_raise(exc, "'vars'", self.relpath)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/__init__.py", line 80, in format_and_raise
    _reraise_err(ResolveError, message, from_exc=exc)
  File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/__init__.py", line 88, in _reraise_err
    raise err from from_exc
dvc.parsing.ResolveError: failed to parse 'vars' in 'dvc.yaml': 'config.yaml.dev' does not exist
------------------------------------------------------------
2022-04-18 14:24:01,881 DEBUG: Analytics is enabled.
2022-04-18 14:24:01,984 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/rb/ncv61h4j0c38v61pc_365rx40000gp/T/tmpqm3_769t']'
2022-04-18 14:24:01,986 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/rb/ncv61h4j0c38v61pc_365rx40000gp/T/tmpqm3_769t']'

Expected

It is not reasonable for dvc gc --all-commits to require a parameters file referenced from a committed dvc.yaml when all it should need is dvc.lock. This creates the situation above, where accidentally committing a reference to a temporary parameters file permanently breaks garbage collection with --all-commits.
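
In the meantime, one way to avoid the accidental commit is a check, run before committing, that flags files listed under vars that Git does not track. The following is a minimal sketch and not DVC functionality. The function names are hypothetical, the YAML handling is a naive line scan that only understands plain "- file" and "- file:keys" entries under a top-level vars key, and the default tracking test assumes the git executable is on PATH.

```python
import subprocess
from pathlib import Path
from typing import Callable


def top_level_vars_files(dvc_yaml_text: str) -> list[str]:
    """Return file names from simple list entries under a top-level 'vars:' key.

    Naive line-based scan (an assumption of this sketch, not a YAML parser):
    dict entries such as '- key: value' and nested items are skipped.
    """
    files = []
    in_vars = False
    item_indent = None
    for line in dvc_yaml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not line[0].isspace() and not line.startswith("-"):
            # Any unindented key opens or closes the vars section.
            in_vars = line.rstrip() == "vars:"
            item_indent = None
            continue
        if not in_vars or not stripped.startswith("- "):
            continue
        indent = len(line) - len(line.lstrip())
        if item_indent is None:
            item_indent = indent
        if indent != item_indent:
            continue  # nested list item, not a vars entry
        entry = stripped[2:].strip().strip("'\"")
        if ": " in entry or entry.endswith(":") or entry.startswith("{"):
            continue  # inline variable definition, not a file
        files.append(entry.split(":", 1)[0])
    return files


def git_tracks(repo: Path, rel_path: str) -> bool:
    """True if `rel_path` is in the Git index of `repo` (staged or committed)."""
    result = subprocess.run(
        ["git", "ls-files", "--error-unmatch", "--", rel_path],
        cwd=repo,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def untracked_vars_files(
    repo: Path, is_tracked: Callable[[Path, str], bool] = git_tracks
) -> list[str]:
    """List vars files referenced by repo/dvc.yaml that Git does not track."""
    text = (repo / "dvc.yaml").read_text(encoding="utf-8")
    return [name for name in top_level_vars_files(text) if not is_tracked(repo, name)]
```

Run against the repository from the reproduction steps, untracked_vars_files(Path(".")) would be expected to report config.yaml.dev, since that file is listed in .gitignore. The is_tracked parameter exists only so the check can be exercised without a Git repository.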

Environment information

Output of dvc doctor:

DVC version: 2.10.1 (pip)
---------------------------------
Platform: Python 3.10.2 on macOS-11.6.5-x86_64-i386-64bit
Supports:
	webhdfs (fsspec = 2022.3.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git

Labels: A: gc (Related to garbage collection), bug (Did we break something?)