Commit 700cb92

fix: properly reload persistent snapshotter data and restart services (#767)
Issue #, if available: re-verified #412

Through extensive e2e test debugging, I noticed that the soci and stargz snapshotters weren't persisting data as expected. After debugging, I found some context in these two PRs:
- awslabs/soci-snapshotter#881
- containerd/stargz-snapshotter#1526

Unfortunately, neither of them is deployed yet, so I've implemented a hacky workaround for now. After this change, an image/container can be pulled/run, the VM can be restarted, and then the container can be started again.

*Description of changes:*
- Redo how BuildKit/Stargz/SOCI are related to containerd using [systemd's `PartOf`](https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#PartOf=). This ensures that all of these services are restarted when containerd is restarted; the lack of this has caused errors in the past.
- Create some missing directories whose absence might cause errors in cloud-init
- Ensure that `SIGTERM` is used to kill the snapshotter services for now

*Testing done:*
- manual testing

- [x] I've reviewed the guidance in CONTRIBUTING.md

#### License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

---------

Signed-off-by: Justin Alvarez <[email protected]>
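
The `PartOf=` relationship described in the commit message can be sketched as a drop-in file. This is a minimal illustration, assuming a scratch directory instead of `/usr/local/lib/systemd/system` so it runs without root or systemd; the helper `write_partof_dropin` and the variable `DROPIN_ROOT` are hypothetical names and not part of the commit.

```shell
#!/bin/sh
# Sketch only: writes the drop-in into a scratch directory so it can be
# inspected without root or a running systemd. DROPIN_ROOT and
# write_partof_dropin are hypothetical names, not part of the commit.
DROPIN_ROOT="$(mktemp -d)"

write_partof_dropin() {
  unit="$1"   # e.g. stargz-snapshotter.service
  dir="$DROPIN_ROOT/$unit.d"
  mkdir -p "$dir"
  # PartOf= propagates a stop or restart of containerd.service to this unit;
  # KillSignal=SIGTERM mirrors the setting the commit applies to the snapshotters.
  printf '[Unit]\nPartOf=containerd.service\n\n[Service]\nKillSignal=SIGTERM\n' > "$dir/finch.conf"
  printf '%s\n' "$dir/finch.conf"
}

conf="$(write_partof_dropin stargz-snapshotter.service)"
cat "$conf"
```

`PartOf=` is one-way: stopping or restarting `containerd.service` is propagated to the dependent unit, but not the reverse.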
1 parent 673c2a5 commit 700cb92

3 files changed, +51 -10 lines changed

finch.windows.yaml

+25 -5
@@ -62,7 +62,7 @@ provision:

 # https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L144-L146
 # XDG_DATA_HOME & ~/.local/share: https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L51
-mkdir ~/.local/share/containerd
+mkdir -p ~/.local/share/containerd
 sudo mount --bind /mnt/lima-finch/containerd ~/.local/share/containerd

 # https://github.com/containerd/nerdctl/blob/main/docs/dir.md#dataroot
@@ -78,13 +78,33 @@ provision:
 sudo mount --bind /mnt/lima-finch/cni-config ~/.config/cni

 # https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L148-L150
-sudo mkdir -p /mnt/lima-finch/cni
+sudo mkdir -p /mnt/lima-finch/cni /var/lib/cni
 sudo mount --bind /mnt/lima-finch/cni /var/lib/cni
 mkdir -p ~/.local/share/cni
-sudo mount --bind /mnt/lima-finch/cni ~/.local/share/cni
+sudo mount --bind /mnt/lima-finch/cni ~/.local/share/cni
+
+# https://github.com/containerd/stargz-snapshotter/blob/94b12086ace4119e86d2db0d6343d7c734b56671/cmd/containerd-stargz-grpc/main.go#L67C2-L67C2
+sudo mkdir -p /mnt/lima-finch/containerd-stargz-grpc/snapshotter/snapshots
+sudo mount --bind /mnt/lima-finch/containerd-stargz-grpc /var/lib/containerd-stargz-grpc
+
+# https://github.com/awslabs/soci-snapshotter/blob/335515f746f50c964ed48159257e1aeba04805b6/cmd/soci-snapshotter-grpc/main.go#L84
+sudo mkdir -p /mnt/lima-finch/soci-snapshotter-grpc/snapshotter/snapshots /var/lib/soci-snapshotter-grpc
+sudo mount --bind /mnt/lima-finch/soci-snapshotter-grpc /var/lib/soci-snapshotter-grpc
+
+# Make sure stargz and buildkit are restarted with containerd
+sudo mkdir -p /usr/local/lib/systemd/system/buildkit.service.d/
+printf '[Unit]\nPartOf=containerd.service\n' | sudo tee /usr/local/lib/systemd/system/buildkit.service.d/finch.conf
+sudo mkdir -p /usr/local/lib/systemd/system/stargz-snapshotter.service.d/
+printf '[Unit]\nPartOf=containerd.service\n\n[Service]\nKillSignal=SIGTERM\n' | sudo tee /usr/local/lib/systemd/system/stargz-snapshotter.service.d/finch.conf
+
+# Add a new services that syncs the filesystem before shutdown
+printf '[Unit]\nDescription=Sync containerd on shutdown\nDefaultDependencies=no\nBefore=shutdown.target reboot.target halt.target kexec.target\n\n[Service]\nType=oneshot\nExecStart=/bin/bash -c "sync /var/lib/containerd"\n\n[Install]\nWantedBy=halt.target reboot.target shutdown.target kexec.target\n' | sudo tee /usr/local/lib/systemd/system/finch-sync-on-shutdown.service
+sudo systemctl enable --now finch-sync-on-shutdown.service
+
+# Add a new service that cleans up lingering CNI networks on boot
+printf '[Unit]\nDescription=Delete hanging data on boot\nDefaultDependencies=no\nBefore=basic.target\n\n[Service]\nType=oneshot\nExecStart=/bin/bash -c "sudo rm /var/lib/cni/networks/bridge/**; sudo rm /var/lib/cni/results/bridge-finch-*"\n\n[Install]\nWantedBy=basic.target\n' | sudo tee /usr/local/lib/systemd/system/finch-cleanup-on-boot.service
+sudo systemctl enable --now finch-cleanup-on-boot.service

-# Make sure buildkit is restarted with containerd, so it uses the correct UUID
-sudo systemctl add-requires buildkit.service containerd.service
 sudo systemctl restart containerd.service

 env:
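
The `printf` one-liners in this diff are hard to read in escaped form. As a reading aid, the sketch below prints the same `finch-sync-on-shutdown.service` string to standard output instead of piping it to `sudo tee`, so the resulting unit file can be inspected without touching the system. The function name `render_sync_unit` is hypothetical.

```shell
#!/bin/sh
# Reading aid: same format string as in the diff, printed to stdout rather
# than written under /usr/local/lib/systemd/system. render_sync_unit is a
# hypothetical helper name.
render_sync_unit() {
  printf '[Unit]\nDescription=Sync containerd on shutdown\nDefaultDependencies=no\nBefore=shutdown.target reboot.target halt.target kexec.target\n\n[Service]\nType=oneshot\nExecStart=/bin/bash -c "sync /var/lib/containerd"\n\n[Install]\nWantedBy=halt.target reboot.target shutdown.target kexec.target\n'
}

unit_text="$(render_sync_unit)"
printf '%s\n' "$unit_text"
```

The output has three sections (`[Unit]`, `[Service]`, `[Install]`), with `Type=oneshot` so the `sync` command runs once and exits.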

finch.yaml

+24 -4
@@ -169,7 +169,7 @@ provision:

 # https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L144-L146
 # XDG_DATA_HOME & ~/.local/share: https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L51
-mkdir ~/.local/share/containerd
+mkdir -p ~/.local/share/containerd
 sudo mount --bind /mnt/lima-finch/containerd ~/.local/share/containerd

 # https://github.com/containerd/nerdctl/blob/main/docs/dir.md#dataroot
@@ -185,13 +185,33 @@ provision:
 sudo mount --bind /mnt/lima-finch/cni-config ~/.config/cni

 # https://github.com/containerd/nerdctl/blob/cffdf87ff4d648a5344eea1406bb95ca3ad7eaa4/extras/rootless/containerd-rootless.sh#L148-L150
-sudo mkdir -p /mnt/lima-finch/cni
+sudo mkdir -p /mnt/lima-finch/cni /var/lib/cni
 sudo mount --bind /mnt/lima-finch/cni /var/lib/cni
 mkdir -p ~/.local/share/cni
 sudo mount --bind /mnt/lima-finch/cni ~/.local/share/cni

-# Make sure buildkit is restarted with containerd, so it uses the correct UUID
-sudo systemctl add-requires buildkit.service containerd.service
+# https://github.com/containerd/stargz-snapshotter/blob/94b12086ace4119e86d2db0d6343d7c734b56671/cmd/containerd-stargz-grpc/main.go#L67C2-L67C2
+sudo mkdir -p /mnt/lima-finch/containerd-stargz-grpc/snapshotter/snapshots
+sudo mount --bind /mnt/lima-finch/containerd-stargz-grpc /var/lib/containerd-stargz-grpc
+
+# https://github.com/awslabs/soci-snapshotter/blob/335515f746f50c964ed48159257e1aeba04805b6/cmd/soci-snapshotter-grpc/main.go#L84
+sudo mkdir -p /mnt/lima-finch/soci-snapshotter-grpc/snapshotter/snapshots /var/lib/soci-snapshotter-grpc
+sudo mount --bind /mnt/lima-finch/soci-snapshotter-grpc /var/lib/soci-snapshotter-grpc
+
+# Make sure stargz and buildkit are restarted with containerd
+sudo mkdir -p /usr/local/lib/systemd/system/buildkit.service.d/
+printf '[Unit]\nPartOf=containerd.service\n' | sudo tee /usr/local/lib/systemd/system/buildkit.service.d/finch.conf
+sudo mkdir -p /usr/local/lib/systemd/system/stargz-snapshotter.service.d/
+printf '[Unit]\nPartOf=containerd.service\n\n[Service]\nKillSignal=SIGTERM\n' | sudo tee /usr/local/lib/systemd/system/stargz-snapshotter.service.d/finch.conf
+
+# Add a new services that syncs the filesystem before shutdown
+printf '[Unit]\nDescription=Sync containerd on shutdown\nDefaultDependencies=no\nBefore=shutdown.target reboot.target halt.target kexec.target\n\n[Service]\nType=oneshot\nExecStart=/bin/bash -c "sync /var/lib/containerd"\n\n[Install]\nWantedBy=halt.target reboot.target shutdown.target kexec.target\n' | sudo tee /usr/local/lib/systemd/system/finch-sync-on-shutdown.service
+sudo systemctl enable --now finch-sync-on-shutdown.service
+
+# Add a new service that cleans up lingering CNI networks on boot
+printf '[Unit]\nDescription=Delete hanging data on boot\nDefaultDependencies=no\nBefore=basic.target\n\n[Service]\nType=oneshot\nExecStart=/bin/bash -c "sudo rm /var/lib/cni/networks/bridge/**; sudo rm /var/lib/cni/results/bridge-finch-*"\n\n[Install]\nWantedBy=basic.target\n' | sudo tee /usr/local/lib/systemd/system/finch-cleanup-on-boot.service
+sudo systemctl enable --now finch-cleanup-on-boot.service
+
 sudo systemctl restart containerd.service

 # Probe scripts to check readiness.
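
Each persisted directory in this diff follows the same two-step pattern: create the backing directory on the persistent disk, then bind-mount it over the path the service expects. The sketch below prints the commands for one pair rather than running them, since `mount --bind` needs root. It is simplified: the diff also pre-creates a `snapshotter/snapshots` subdirectory for the snapshotters. `plan_persist` is a hypothetical helper, not something defined in Finch.

```shell
#!/bin/sh
# Dry-run sketch: prints the mkdir/mount pair used for a persisted
# directory. plan_persist is a hypothetical helper name.
plan_persist() {
  src="$1"   # backing directory on the persistent disk
  dst="$2"   # path the service reads and writes
  # -p makes the step safe to repeat and creates missing parents.
  printf 'sudo mkdir -p %s %s\n' "$src" "$dst"
  printf 'sudo mount --bind %s %s\n' "$src" "$dst"
}

plan="$(plan_persist /mnt/lima-finch/soci-snapshotter-grpc /var/lib/soci-snapshotter-grpc)"
printf '%s\n' "$plan"
```

Plain `mkdir` fails when the directory already exists or a parent is missing, which is presumably why the diff switches `mkdir ~/.local/share/containerd` to `mkdir -p`.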

pkg/config/lima_config_applier.go

+2 -1
@@ -39,7 +39,8 @@ if [ ! -f /usr/local/bin/soci ]; then
 ln -s /usr/local/lib/systemd/system/soci-snapshotter.service /etc/systemd/system/multi-user.target.wants/
 restorecon -v /usr/local/lib/systemd/system/soci-snapshotter.service
 systemctl daemon-reload
-sudo systemctl add-requires soci-snapshotter.service containerd.service
+sudo mkdir -p /usr/local/lib/systemd/system/soci-snapshotter.service.d/
+printf '[Unit]\nPartOf=containerd.service\n\n[Service]\nKillSignal=SIGTERM\n' | sudo tee /usr/local/lib/systemd/system/soci-snapshotter.service.d/finch.conf
 systemctl enable --now soci-snapshotter
 fi

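
One way to confirm that a drop-in like the one above took effect is to inspect the unit's `PartOf` property. The sketch below is an assumption-laden illustration: it parses a line of the form `PartOf=unit1 unit2 ...`, as printed by `systemctl show -p PartOf <unit>`, and uses a hard-coded sample so it runs without systemd. `is_part_of_containerd` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: checks whether a "PartOf=..." property line names
# containerd.service. is_part_of_containerd is a hypothetical helper name.
is_part_of_containerd() {
  # Strip the "PartOf=" prefix and pad with spaces so whole unit names match.
  case " ${1#PartOf=} " in
    *" containerd.service "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sample input hard-coded so the sketch runs anywhere; inside the VM it
# could come from: systemctl show -p PartOf soci-snapshotter.service
sample='PartOf=containerd.service'
if is_part_of_containerd "$sample"; then
  echo "a restart of containerd.service propagates to this unit"
fi
```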
