page-xfer: write zero pages when parent hole is missing#2997
page-xfer: write zero pages when parent hole is missing#2997rst0git wants to merge 2 commits intocheckpoint-restore:criu-devfrom
Conversation
During incremental dump with pre-dumps, pages allocated between pre-dumps may have their soft-dirty bit clear (the kernel zero-fills new mmap'd pages without setting it). CRIU classifies these as PE_PARENT holes, but since the page is newly allocated, no parent has it. Previously this caused restore to fail with "Missing in parent pagemap" because write_pagemap_loc() wrote a broken PE_PARENT entry referencing data that does not exist in any parent image. Write zero-filled PE_PRESENT pages instead, which is correct because new mmap'd pages with a clear soft-dirty bit are guaranteed to be zero-filled by the kernel. Assisted-by: Claude Code:claude-opus-4-6 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
Pages allocated between a pre-dump and the final dump may have their soft-dirty bit clear (the kernel zero-fills new anonymous pages without setting it). CRIU classifies these as PE_PARENT holes, but no parent image contains them. Add a transition test that mmaps new anonymous pages after the pre-dump completes, then verifies they survive checkpoint/restore. Run with: sudo ./test/zdtm.py run -t zdtm/transition/maps_newpage --pre 1 Assisted-by: Claude Code:claude-opus-4-6 Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
There was a problem hiding this comment.
Pull request overview
Fixes incremental dump (pre-dump) restore failures where CRIU emits PE_PARENT entries for pages that are not present in any parent pagemap by falling back to writing zero-filled PE_PRESENT pages, and adds a ZDTM transition test to exercise the scenario.
Changes:
criu/page-xfer.c: When aPE_PARENThole is missing in the parent pagemap, emit aPE_PRESENTentry and write zero page data instead of failing the dump.test/zdtm/transition/maps_newpage.c+.desc: Add a new transition test for “soft-dirty cleared after pre-dump” pages.test/zdtm/transition/Makefile: Register the new test target.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
criu/page-xfer.c |
Adds a fallback path to avoid broken PE_PARENT references by writing zero-filled present pages. |
test/zdtm/transition/maps_newpage.c |
New ZDTM transition test covering the missing-parent-hole scenario. |
test/zdtm/transition/maps_newpage.desc |
Enables the test with pre-dump-notify and required flags. |
test/zdtm/transition/Makefile |
Adds maps_newpage to the transition test list. |
| ret = write(img_raw_fd(xfer->pi), | ||
| zero_page, PAGE_SIZE); | ||
| if (ret != PAGE_SIZE) { |
There was a problem hiding this comment.
The zero-page write uses plain write() and treats any short write/EINTR as fatal (ret != PAGE_SIZE). Since the pages image FD is a regular file, write() can legally return partial progress; this should use write_all() (or write_img_buf()) and handle/propagate errors consistently to avoid spurious failures and partially written images on interruption.
| ret = write(img_raw_fd(xfer->pi), | |
| zero_page, PAGE_SIZE); | |
| if (ret != PAGE_SIZE) { | |
| ret = write_all(img_raw_fd(xfer->pi), | |
| zero_page, | |
| PAGE_SIZE); | |
| if (ret < 0) { |
| * Page not found in parent. This can happen | ||
| * when a page was allocated between pre-dumps | ||
| * and the soft-dirty bit is clear (kernel | ||
| * zero-fills new pages without setting it). | ||
| * Write zero pages instead of a broken parent | ||
| * reference that would fail on restore. | ||
| */ | ||
| unsigned long i, nr = iov->iov_len / PAGE_SIZE; | ||
| static const char zero_page[PAGE_SIZE]; | ||
|
|
||
| pr_warn("Hole %p - %p not found in parent, " | ||
| "writing %lu zero page(s)\n", | ||
| iov->iov_base, | ||
| iov->iov_base + iov->iov_len, nr); | ||
|
|
||
| pe.flags = PE_PRESENT; | ||
| flags = PE_PRESENT; | ||
|
|
||
| if (pb_write_one(xfer->pmi, &pe, PB_PAGEMAP) < 0) | ||
| return -1; | ||
|
|
||
| for (i = 0; i < nr; i++) { | ||
| ret = write(img_raw_fd(xfer->pi), | ||
| zero_page, PAGE_SIZE); | ||
| if (ret != PAGE_SIZE) { | ||
| pr_perror("Can't write zero page"); | ||
| return -1; | ||
| } | ||
| } | ||
|
|
||
| cnt_add(CNT_PAGES_WRITTEN, nr); | ||
| return 0; |
There was a problem hiding this comment.
This fallback turns a PE_PARENT hole missing in the parent into PE_PRESENT and writes a zero-filled page payload without verifying the dumped process page contents are actually zero. That changes semantics from “fail restore” to “restore succeeds but memory is silently zeroed”, which can corrupt non-zero pages (e.g., if soft-dirty was cleared after writes). Consider either (a) dumping the actual page contents, or (b) making this path conditional on a proven zero-page condition and logging explicitly that data is being discarded.
| * Page not found in parent. This can happen | |
| * when a page was allocated between pre-dumps | |
| * and the soft-dirty bit is clear (kernel | |
| * zero-fills new pages without setting it). | |
| * Write zero pages instead of a broken parent | |
| * reference that would fail on restore. | |
| */ | |
| unsigned long i, nr = iov->iov_len / PAGE_SIZE; | |
| static const char zero_page[PAGE_SIZE]; | |
| pr_warn("Hole %p - %p not found in parent, " | |
| "writing %lu zero page(s)\n", | |
| iov->iov_base, | |
| iov->iov_base + iov->iov_len, nr); | |
| pe.flags = PE_PRESENT; | |
| flags = PE_PRESENT; | |
| if (pb_write_one(xfer->pmi, &pe, PB_PAGEMAP) < 0) | |
| return -1; | |
| for (i = 0; i < nr; i++) { | |
| ret = write(img_raw_fd(xfer->pi), | |
| zero_page, PAGE_SIZE); | |
| if (ret != PAGE_SIZE) { | |
| pr_perror("Can't write zero page"); | |
| return -1; | |
| } | |
| } | |
| cnt_add(CNT_PAGES_WRITTEN, nr); | |
| return 0; | |
| * A missing parent hole cannot be safely turned | |
| * into a present zero page here. Without | |
| * verifying the dumped task memory contents, that | |
| * would silently discard non-zero data and | |
| * corrupt the restored address space. | |
| */ | |
| pr_err("Hole %p - %p not found in parent, " | |
| "refusing to synthesize zero pages\n", | |
| iov->iov_base, | |
| iov->iov_base + iov->iov_len); | |
| return -1; |
| /* | ||
| * Map new pages in a separate VMA and write a pattern. | ||
| * This creates real page backing (not the kernel zero page). | ||
| */ | ||
| new_mem = mmap(NULL, new_size, PROT_READ | PROT_WRITE, | ||
| MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); | ||
| if (new_mem == MAP_FAILED) { | ||
| pr_perror("mmap new"); | ||
| test_wait_pre_dump_ack(); | ||
| return 1; | ||
| } | ||
| memset(new_mem, 0xCD, new_size); | ||
|
|
There was a problem hiding this comment.
The test writes 0xCD into the “new pages” mapping and then asserts the restored pages are 0x00, which enshrines “data loss is expected” behavior. This seems inconsistent with the PR description’s rationale that these pages are guaranteed zero-filled; if the intended scenario is “newly faulted but never written” pages, consider faulting them in without modifying contents and asserting they stay zero after restore so the test validates the stated guarantee rather than intentional corruption.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## criu-dev #2997 +/- ##
============================================
- Coverage 57.21% 57.17% -0.04%
============================================
Files 154 154
Lines 40400 40408 +8
Branches 8857 8860 +3
============================================
- Hits 23115 23104 -11
- Misses 17021 17044 +23
+ Partials 264 260 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
We should detect zero pages... Can you be more precise when it can happen? How to reproduce this case from user-space? |
| /* | ||
| * After a pre-dump, soft-dirty bits are cleared. If new pages are | ||
| * then written and soft-dirty is cleared again (e.g. by a device | ||
| * driver or by writing to /proc/self/clear_refs), CRIU sees |
There was a problem hiding this comment.
soft-dirty is cleared again
Only CRIU has to clear soft-dirty bits, otherwise we can't guarantee consistent memory snapshot. I don't like the idea of using /proc/self/clear_refs in the test.
|
Can it be a kernel bug like: torvalds/linux@29f4059 or torvalds/linux@6707915 |
I think you triggered the second issue. Here is a test to reproduce it: avagin@daf5e4d. I think we need to send the kernel fix to stable trees. |
During incremental dump with pre-dumps, pages allocated between pre-dumps may have their soft-dirty bit clear (the kernel zero-fills new mmap'd pages without setting it). CRIU classifies these as PE_PARENT holes, but since the page is newly allocated, no parent has it.
This causes restore to fail with
"Missing in parent pagemap"(#2995) becausewrite_pagemap_loc()wrote a brokenPE_PARENTentry referencing data that does not exist in any parent image.To fix this, write zero-filled PE_PRESENT pages instead, which is correct because new mmap'd pages with a clear soft-dirty bit are guaranteed to be zero-filled by the kernel.