Description
Bug Report
Description
When "dvc.yaml" references a parameter file in order to reproduce a foreach stage, then dvc gc --all-commits
will error if the parameter file does not exist in a commit containing this "dvc.yaml" version. The use case is not too convoluted: it involves a development version of a parameters file that is not committed, but used in "dvc.yaml" during development where it is easy to accidentally commit. @pared Suggested this may be a bug if I could reproduce.
Reproduceo
Initialize git and dvc. Add the following three files:
# dvc.yaml
vars:
- config.yaml.dev
stages:
echo:
foreach: ${country}
do:
cmd: echo ${item} > ${item}.txt
outs:
- ${item}.txt
# config.yaml.dev
country:
- fr
# .gitignore
config.yaml.dev
Run dvc repro
then git add . && git commit -m "only commit"
.
To generate the following error, run dvc gc --all-commits -v
and answer "y" at the prompt.
% dvc gc --all-commits -v
2022-04-18 14:23:59,329 WARNING: This will remove all cache except items used in the workspace and all git commits of the current repo.
Are you sure you want to proceed? [y/n]: y
2022-04-18 14:24:01,871 ERROR: failed to parse 'vars' in 'dvc.yaml': 'config.yaml.dev' does not exist: 'config.yaml.dev' does not exist
------------------------------------------------------------
Traceback (most recent call last):
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/__init__.py", line 147, in __init__
self.context.load_from_vars(*args, default=DEFAULT_PARAMS_FILE)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/context.py", line 456, in load_from_vars
self.merge_from(fs, item, wdir)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/context.py", line 408, in merge_from
ctx = Context.load_from(fs, abspath, select_keys)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/context.py", line 364, in load_from
raise ParamsLoadError(f"'{file}' does not exist")
dvc.parsing.context.ParamsLoadError: 'config.yaml.dev' does not exist
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/cli/__init__.py", line 89, in main
ret = cmd.do_run()
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/cli/command.py", line 22, in do_run
return self.run()
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/commands/gc.py", line 51, in run
self.repo.gc(
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/__init__.py", line 48, in wrapper
return f(repo, *args, **kwargs)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/gc.py", line 61, in gc
for obj_ids in repo.used_objs(
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/__init__.py", line 420, in used_objs
for odb, objs in self.index.used_objs(
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/index.py", line 218, in used_objs
for stage, filter_info in pairs:
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/index.py", line 212, in <genexpr>
self.stage_collector.collect_granular(
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 401, in collect_granular
return [StageInfo(stage) for stage in self.repo.index]
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 401, in <listcomp>
return [StageInfo(stage) for stage in self.repo.index]
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/index.py", line 92, in __iter__
yield from self.stages
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/funcy/objects.py", line 28, in __get__
res = instance.__dict__[self.fget.__name__] = self.fget(instance)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/index.py", line 74, in stages
return self.stage_collector.collect_repo(onerror=onerror)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 497, in collect_repo
return list(self._collect_repo(onerror))
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 480, in _collect_repo
new_stages = self.load_file(file_path)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 304, in load_file
return self.load_all(path)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 284, in load_all
return [stages[key] for key in keys]
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/repo/stage.py", line 284, in <listcomp>
return [stages[key] for key in keys]
File "/usr/local/Cellar/[email protected]/3.10.2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/_collections_abc.py", line 878, in __iter__
yield from self._mapping
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/stage/loader.py", line 152, in __iter__
return iter(self.resolver.get_keys())
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/funcy/objects.py", line 28, in __get__
res = instance.__dict__[self.fget.__name__] = self.fget(instance)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/stage/loader.py", line 39, in resolver
return DataResolver(self.repo, wdir, self.data)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/__init__.py", line 149, in __init__
format_and_raise(exc, "'vars'", self.relpath)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/__init__.py", line 80, in format_and_raise
_reraise_err(ResolveError, message, from_exc=exc)
File "/Users/icarroll/Library/Caches/pypoetry/virtualenvs/tmp-HwFH0Yo_-py3.10/lib/python3.10/site-packages/dvc/parsing/__init__.py", line 88, in _reraise_err
raise err from from_exc
dvc.parsing.ResolveError: failed to parse 'vars' in 'dvc.yaml': 'config.yaml.dev' does not exist
------------------------------------------------------------
2022-04-18 14:24:01,881 DEBUG: Analytics is enabled.
2022-04-18 14:24:01,984 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/rb/ncv61h4j0c38v61pc_365rx40000gp/T/tmpqm3_769t']'
2022-04-18 14:24:01,986 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/rb/ncv61h4j0c38v61pc_365rx40000gp/T/tmpqm3_769t']'
Expected
It is not reasonable for dvc gc --all-commits
to require a parameter file referenced from a committed dvc.yaml
, when all it should need is a dvc.lock
. It introduces the situation above where an accidental commit of a reference to a temporary parameters file permanently breaks the ability to garbage collect with --all-commits
.
Environment information
Output of dvc doctor
:
DVC version: 2.10.1 (pip)
---------------------------------
Platform: Python 3.10.2 on macOS-11.6.5-x86_64-i386-64bit
Supports:
webhdfs (fsspec = 2022.3.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git