cgroup: preserve cgroup2 superblock options on dump #3036
When collecting cgroups for dump, criu opens the unified cgroup2
hierarchy with fsopen("cgroup2") followed by FSCONFIG_CMD_CREATE (and,
without fsopen support, with mount("none", ..., "cgroup2", 0, NULL)).
Neither carries any superblock option. Because cgroup2 has a single
superblock shared by every mount, this reconfigures the live host mount
and drops options that criu did not set, such as nsdelegate,
memory_recursiveprot and memory_hugetlb_accounting. The detached-mount
teardown only removes criu's own mount instance and does not restore the
superblock, so the host cgroup2 mount is left altered after a checkpoint.
Read the options of the existing cgroup2 mount from /proc/self/mountinfo
and replay the known superblock flags before FSCONFIG_CMD_CREATE (and
pass them as mount data in the fsopen-less path), so the shared
superblock keeps its options and the host mount is left intact.
Addresses the host-side mutation reported in checkpoint-restore#3029. Preserving these
options across checkpoint/restore is a separate change.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Rocker Zhang <zhang.rocker.liyuan@gmail.com>
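The mountinfo step described in the commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the patch: the function names are hypothetical, and the allowlist holds only the three flags named above, so the real list may differ. It relies on the documented mountinfo layout, where the filesystem type, the source and the superblock options follow the ` - ` separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flags named in the commit message; the patch's real allowlist may differ. */
static const char *const cg2_sb_flags[] = {
	"nsdelegate",
	"memory_recursiveprot",
	"memory_hugetlb_accounting",
};

/* Return 1 if the option of length len is on the allowlist. */
static int cg2_flag_known(const char *opt, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(cg2_sb_flags) / sizeof(cg2_sb_flags[0]); i++)
		if (strlen(cg2_sb_flags[i]) == len && !strncmp(cg2_sb_flags[i], opt, len))
			return 1;
	return 0;
}

/*
 * Hypothetical helper: parse one mountinfo line.
 * Returns 1 and fills out with the comma-joined allowlisted flags if the
 * line describes a cgroup2 mount, 0 for any other filesystem type, and
 * -1 if the line is malformed or out is too small.
 */
int cg2_sb_opts_from_line(const char *line, char *out, size_t outlen)
{
	const char *sep, *fstype, *src, *opts;
	size_t used = 0;

	assert(line && out && outlen > 0);
	out[0] = '\0';

	/* Spaces inside paths are escaped as \040, so " - " is unambiguous. */
	sep = strstr(line, " - ");
	if (!sep)
		return -1;
	fstype = sep + 3;
	src = strchr(fstype, ' ');
	if (!src)
		return -1;
	if ((size_t)(src - fstype) != 7 || strncmp(fstype, "cgroup2", 7))
		return 0;
	opts = strchr(src + 1, ' ');
	if (!opts)
		return -1;
	opts++;

	/* Walk the comma-separated superblock options, keeping known flags. */
	while (*opts && *opts != '\n') {
		size_t len = strcspn(opts, ",\n");

		if (cg2_flag_known(opts, len)) {
			/* Room for an optional comma, the flag and the NUL. */
			if (used + len + (used ? 1 : 0) + 1 > outlen)
				return -1;
			if (used)
				out[used++] = ',';
			memcpy(out + used, opts, len);
			used += len;
			out[used] = '\0';
		}
		opts += len;
		if (*opts == ',')
			opts++;
	}
	return 1;
}
```

The comma-joined result has the form that the fsopen-less path would pass as mount data. Options outside the allowlist, such as `rw`, are dropped.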
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## criu-dev #3036 +/- ##
============================================
+ Coverage 57.04% 57.10% +0.06%
============================================
Files 154 154
Lines 40534 40566 +32
Branches 8882 8892 +10
============================================
+ Hits 23123 23167 +44
+ Misses 17057 17045 -12
Partials 354 354
@Snorch could you please review this PR?
The problem reproduces:
Parsing mountinfo is generally slow (especially the host's mountinfo). Preferably we want to avoid doing this one more time (in an ideal world we should be switching to listmount+statmount everywhere). What about an alternative approach: We will have sb opts in
Note: Need to verify from other statmount fields that it's a cgroup2 mount and not something else.
Yes it only works when cgroup-v2 is mounted into
criu reconfigures the host cgroup2 mount during checkpoint, dropping its superblock options.
When collecting cgroups for dump, `__new_open_cgroupfs()` opens the unified cgroup2 hierarchy with `fsopen("cgroup2")` and then `cr_fsconfig(FSCONFIG_CMD_CREATE)` (or, without fsopen support, `mount("none", ..., "cgroup2", 0, NULL)`), neither of which carries any superblock option. cgroup2 has a single superblock shared by every mount, so this reconfigures the live host mount and drops options criu did not set, such as `nsdelegate`, `memory_recursiveprot` and `memory_hugetlb_accounting`. The detached-mount teardown only removes criu's own mount instance, so the host cgroup2 is left altered after a checkpoint (#3029).
The fix reads the options of the existing cgroup2 mount from `/proc/self/mountinfo` and replays the known superblock flags before `FSCONFIG_CMD_CREATE` (and passes them as mount data in the fsopen-less path), leaving the host mount intact.
I reproduced this in a throwaway VM (its own kernel, so the mutated superblock is the VM's): the current sequence strips `nsdelegate,memory_recursiveprot` from an existing cgroup2 mount, and the replay preserves them. Builds on x86_64 and aarch64.
Two things I would like direction on. The fix replays an allowlist of known superblock flags; an alternative is to clone the existing mount with `open_tree(OPEN_TREE_CLONE)` and avoid reconfiguring the superblock at all, which is a larger change. Separately, preserving these options across checkpoint/restore (rather than only avoiding the host mutation) would need a new field in `cg_controller_entry`, which is an image-format change I have left out of this PR.
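To make the ordering of the replay concrete, here is a minimal sketch in which every flag is set on the filesystem context before the create command is issued. It is not the patch's code: `cg2_fsctx_ops`, `cg2_replay_and_create` and the tracing hooks are hypothetical names, and the real `fsconfig()` calls are replaced by callbacks so the sequence can be exercised without privileges and without touching a live cgroup2 superblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical hooks standing in for the real calls (assumption):
 *   set_flag(ctx, "nsdelegate") ~ fsconfig(fd, FSCONFIG_SET_FLAG, "nsdelegate", NULL, 0)
 *   create(ctx)                 ~ fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0)
 */
struct cg2_fsctx_ops {
	int (*set_flag)(void *ctx, const char *flag);
	int (*create)(void *ctx);
};

/*
 * Replay a comma-separated flag list, then create the superblock.
 * Returns -1 as soon as any step fails, so create is never reached
 * with a partially configured context.
 */
int cg2_replay_and_create(const char *opts, const struct cg2_fsctx_ops *ops, void *ctx)
{
	char flag[64];

	assert(opts && ops && ops->set_flag && ops->create);

	while (*opts) {
		size_t len = strcspn(opts, ",");

		if (len >= sizeof(flag))
			return -1;
		if (len) {
			memcpy(flag, opts, len);
			flag[len] = '\0';
			if (ops->set_flag(ctx, flag) < 0)
				return -1;
		}
		opts += len;
		if (*opts == ',')
			opts++;
	}
	return ops->create(ctx);
}

/* Tracing hooks: record the call sequence instead of touching the kernel. */
struct cg2_trace {
	char log[256];
};

static int trace_append(struct cg2_trace *t, const char *prefix, const char *s)
{
	size_t used = strlen(t->log);
	int n = snprintf(t->log + used, sizeof(t->log) - used, "%s%s;", prefix, s);

	return (n < 0 || (size_t)n >= sizeof(t->log) - used) ? -1 : 0;
}

static int trace_set_flag(void *ctx, const char *flag)
{
	return trace_append(ctx, "flag:", flag);
}

static int trace_create(void *ctx)
{
	return trace_append(ctx, "create", "");
}

const struct cg2_fsctx_ops cg2_trace_ops = { trace_set_flag, trace_create };
```

With the tracing hooks, the input `nsdelegate,memory_recursiveprot` records two flag calls followed by one create call. An empty list goes straight to create, which corresponds to the current behaviour that loses the options.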