Broken snapshotter state on SIGINT #1440

After the daemon receives SIGINT, it cannot be restarted into a working state without some extra effort. This is because SIGINT [triggers the cleanup flag](https://github.com/containerd/stargz-snapshotter/blob/2713db58dc69454d4ed5cffdd277560e5c52aa25/cmd/containerd-stargz-grpc/main.go#L270), which [wipes the content store](https://github.com/containerd/stargz-snapshotter/blob/2713db58dc69454d4ed5cffdd277560e5c52aa25/snapshot/snapshot.go#L413). However, the metadata is never updated to reflect this, so it still refers to snapshots that no longer exist. This leaves me with two questions.

First, looking through previous issues, I found that https://github.com/containerd/stargz-snapshotter/issues/703#issuecomment-1078560008 notes that the snapshotter removes its snapshot directories on cleanup. Could I ask for an explanation of this behavior? I would imagine that one would not want to re-pull images after restarting the daemon.

Second, could this broken state be fixed by updating the metadata before [line 662](https://github.com/containerd/stargz-snapshotter/blob/2713db58dc69454d4ed5cffdd277560e5c52aa25/snapshot/snapshot.go#L662) in `Close()` in snapshot.go? I tried to do it myself but couldn't figure out a way to get the snapshot keys. However, since cleanup wipes the container store clean regardless, as a proof of concept, I called `os.Remove()` right before line 662, like so:

```go
    if err := os.Remove(filepath.Join(o.root, "metadata.db")); err != nil {
        log.G(ctx).WithError(err).Warn("failed to wipe metadata")
    }
    return o.ms.Close()
```

This lets the daemon start in a mostly working state, but you must still manually remove and re-pull the image, because the image is still in the containerd content store. This could probably be fixed by interacting with the db natively via the metadata package, which is why I consider deleting the file a pretty inelegant solution IMO.

EDIT: I found that `Walk` gets all the keys, so calling it and then calling `Remove` on each key removes the snapshots from the metadata. However, this still leaves the image in the containerd content store, so we have the same issue: the image cannot be re-pulled unless you remove it manually and pull again.
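
To make the `Walk` then `Remove` idea concrete, here is a minimal, self-contained sketch of the pattern. Everything in it is hypothetical: `keyStore`, `memStore`, and `removeAll` are illustrative names and not the stargz-snapshotter or containerd API, and the in-memory store exists only so the example runs on its own. The sketch collects all keys first and removes them afterwards, so the store is not modified while it is being walked.

```go
package main

import (
	"errors"
	"fmt"
)

// keyStore is a hypothetical stand-in for the snapshot metadata store.
// It only models "visit every key" and "delete one key".
type keyStore interface {
	Walk(fn func(key string) error) error
	Remove(key string) error
}

// memStore is an in-memory keyStore used only to make the sketch runnable.
type memStore struct {
	keys map[string]struct{}
}

func newMemStore(keys ...string) *memStore {
	m := &memStore{keys: make(map[string]struct{})}
	for _, k := range keys {
		m.keys[k] = struct{}{}
	}
	return m
}

// Walk calls fn once for every key currently in the store.
func (m *memStore) Walk(fn func(key string) error) error {
	for k := range m.keys {
		if err := fn(k); err != nil {
			return err
		}
	}
	return nil
}

// Remove deletes a single key and fails if the key is unknown.
func (m *memStore) Remove(key string) error {
	if _, ok := m.keys[key]; !ok {
		return fmt.Errorf("snapshot %q not found", key)
	}
	delete(m.keys, key)
	return nil
}

// removeAll gathers every key via Walk, then removes each one.
// Failures are collected so one bad key does not stop the cleanup.
func removeAll(s keyStore) error {
	var keys []string
	if err := s.Walk(func(key string) error {
		keys = append(keys, key)
		return nil
	}); err != nil {
		return fmt.Errorf("walk snapshots: %w", err)
	}

	var errs []error
	for _, k := range keys {
		if err := s.Remove(k); err != nil {
			errs = append(errs, fmt.Errorf("remove %q: %w", k, err))
		}
	}
	return errors.Join(errs...) // nil when every removal succeeded
}

func main() {
	s := newMemStore("sha256:aaa", "sha256:bbb")
	if err := removeAll(s); err != nil {
		fmt.Println("cleanup failed:", err)
		return
	}
	fmt.Println("remaining keys:", len(s.keys))
}
```

In the real snapshotter, removal order may also matter (for example, if a snapshot with children cannot be removed before its children), which this sketch does not model.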

I can open a PR with these changes if people want this, as it removes the need to set `allow_invalid_mounts_on_restart=true` to start the daemon to remove the snapshot. It's still imperfect for the above reason but it would be a nice QOL change IMO.
