fix ssh fsspec: make put atomic

I am having problems with an SSH remote. We use DVC only for dataset versioning.
We have little control over the remote's configuration, and when I push from the local cache to the remote, files often arrive corrupted. Running `dvc push` again does not fix the issue.
`dvc status -c` does not notice the corruption, even though the remote holds corrupted files and the local cache holds intact ones.
The problem goes unnoticed when working alone: DVC does not detect the corruption on the remote, considers the remote and the local cache in sync, and `dvc pull` takes the files from the local cache.
The problem shows up when a third party tries to pull the dataset.
- When verification is turned off (in .dvc/config), DVC happily pulls the corrupted files.
- When it is turned on, `dvc fetch` reports that the corrupted files are being fetched to the local cache, but they actually are not, as subsequent runs of `dvc status -c` reveal (all corrupted files from the remote appear as deleted). `dvc checkout` (and `dvc pull`) then fails, I think because of the mismatch between the remote and the local cache.
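
The fix named in the title, an atomic put, is commonly done by writing the upload under a temporary name and renaming it once the transfer has finished, so a file that exists under its final name is never partial. Below is a minimal sketch of that pattern on a local filesystem. It is not the actual DVC or sshfs code: `atomic_put` and the temporary-name scheme are hypothetical, and it is only an assumption that interrupted uploads are what corrupts the files on our remote.

```python
import os
import shutil
import uuid


def atomic_put(src: str, dst: str) -> None:
    """Copy src to dst so that dst is either absent or fully written."""
    dst_dir = os.path.dirname(dst) or "."
    os.makedirs(dst_dir, exist_ok=True)

    # Hypothetical temp-name scheme. The temp file lives next to the
    # destination so the final rename stays on one filesystem.
    tmp = os.path.join(
        dst_dir, f".{os.path.basename(dst)}.{uuid.uuid4().hex}.tmp"
    )
    try:
        shutil.copyfile(src, tmp)
        # On POSIX, renaming within one filesystem is atomic: readers see
        # either the old state or the complete new file, never a partial one.
        os.replace(tmp, dst)
    except BaseException:
        # Clean up the partial temp file so it cannot be mistaken for data.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

With this pattern, an interrupted transfer leaves at most a stray temporary file behind, so a later push would still see the final name as missing and upload it again.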

We use `dvc repro` instead of adding files manually, so as far as I understand, fixing the problem by manually untracking and re-adding the files is off the table.

Shouldn't there be an option to check for corruption when pushing files to the remote? Something like a `--verify` option for `dvc push`.
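
To illustrate what such a check could do, here is a rough sketch that recomputes the MD5 of every object and compares it with the hash encoded in its path. It assumes the DVC 3.x layout `files/md5/<first 2 hex chars>/<remaining 30>`, a plain MD5 over the file bytes, and that the remote is reachable as a local path (for example through a mount). `find_corrupted` and `file_md5` are hypothetical helpers, not part of DVC.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(md5_root):
    """Return sorted paths under md5_root whose content does not match
    the hash encoded in their directory and file name."""
    corrupted = []
    for dirpath, _dirnames, filenames in os.walk(md5_root):
        prefix = os.path.basename(dirpath)
        for name in filenames:
            # Directory objects carry a ".dir" suffix after the hash.
            stem = name[:-4] if name.endswith(".dir") else name
            expected = prefix + stem
            if len(expected) != 32:
                continue  # not an object file in the assumed layout
            path = os.path.join(dirpath, name)
            if file_md5(path) != expected:
                corrupted.append(path)
    return sorted(corrupted)
```

Running `find_corrupted` against the remote's `files/md5` directory after a push would list the objects that need to be uploaded again.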

Right now, verification only happens during pulling, at which point the corrupted files can no longer be fixed automatically by re-pulling or re-pushing. Also, the error we get when pulling from the corrupted remote (without an intact local cache) with verification turned on is not very helpful: it just asks whether the cache is up to date. The cache is up to date; the remote is what is causing problems.
We are currently testing a workaround: running rsync after each push. Since there are no DVC hooks (like there are git hooks), I do not see an elegant way of automating this.
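
For reference, the workaround can be wrapped in a small script that is run in place of `dvc push`. This is only a sketch: `push_and_resync` and `build_commands` are hypothetical names, the paths in the usage note are placeholders, and it assumes the remote directory mirrors the layout of the local cache so that rsync can compare the two directly.

```python
import subprocess


def build_commands(local_cache, remote_target):
    """Return the commands to run: dvc push, then a checksum-based rsync."""
    return [
        ["dvc", "push"],
        [
            "rsync",
            "-a",          # archive mode: recurse and preserve metadata
            "--checksum",  # compare file contents, not just size and mtime
            local_cache.rstrip("/") + "/",
            remote_target.rstrip("/") + "/",
        ],
    ]


def push_and_resync(local_cache, remote_target, run=subprocess.run):
    """Run the commands in order, stopping at the first failure."""
    for cmd in build_commands(local_cache, remote_target):
        run(cmd, check=True)
```

A call would look like `push_and_resync(".dvc/cache", "user@host:/path/to/remote")`, with both arguments replaced by the real locations. Note that this syncs the whole local cache, not only the objects that were just pushed.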

fix ssh fsspec: make put atomic #10498
