bfd: Support dynamic expansion for massively long text lines#2900
bfd: Support dynamic expansion for massively long text lines#2900DongSunchao wants to merge 2 commits intocheckpoint-restore:criu-devfrom
Conversation
ab06539 to
a81257f
Compare
a81257f to
3c3502b
Compare
Snorch
left a comment
There was a problem hiding this comment.
I added small improvement comments, please take a look. But generally this looks good to me.
9cf3a53 to
3a7c9b2
Compare
| @@ -0,0 +1 @@ | |||
| {'flavor': 'h', 'flags': 'suid'} No newline at end of file | |||
There was a problem hiding this comment.
why is it running only in the host namespace?
There was a problem hiding this comment.
The main reason is that this test will call setgroups() with 900, extremely large GIDs.
If we run it without 'flavor': 'h' (in a user namespace), it requires a massive and complex gid_map setup for all those custom groups. The ZDTM sandbox initialization actually fails under these conditions (I encountered a mount(/proc) failed error when I tried running it in ns).
There was a problem hiding this comment.
The ZDTM sandbox initialization actually fails under these conditions (I encountered a mount(/proc) failed error when I tried running it in ns).
Have you tried to investigate why it fails?
3a7c9b2 to
4570c0c
Compare
There was a problem hiding this comment.
Pull request overview
Adds support for dynamically expanding CRIU’s bfd read buffer when parsing unusually long text lines (e.g., /proc/<pid>/status Groups:) without permanently inflating the global fixed-size buffer pool, and introduces a ZDTM regression test that triggers this behavior.
Changes:
- Track per-buffer capacity in
criu/bfd.cand dynamicallymmap()a larger buffer (up to 2MB) whenbreadchr()hits the current capacity. - Ensure oversized buffers are
munmap()’d/xfree()’d instead of being returned to the fixed BUFSIZE pool. - Add a new ZDTM static test (
massive_groups) to generate a longGroups:line and validate multiple consecutive expansions.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
criu/bfd.c |
Implements dynamic buffer growth for long lines and ensures oversized buffers don’t pollute the fixed-size pool. |
test/zdtm/static/massive_groups.c |
New ZDTM test that sets many supplementary groups and validates they survive dump/restore. |
test/zdtm/static/massive_groups.desc |
Registers the test with required flags/flavor. |
test/zdtm/static/Makefile |
Adds the new test to the static test build lists/flags. |
Comments suppressed due to low confidence (6)
criu/bfd.c:252
- When returning the original standard buffer to the pool during expansion, this uses
list_add_tail(). Combined withbuf_get()consuming from the head, this changes reuse semantics compared to the previous LIFO behavior and may contradict the intent that a buffer is reused “by next bfdopen call”. Consider usinglist_add()here (and inbuf_put()) to keep reuse order consistent.
/*
* Don't unmap standard buffer back, it will get reused
* by next bfdopen call
*/
list_add_tail(&b_buf->l, &bufs);
}
test/zdtm/static/massive_groups.c:17
- The header comment is internally inconsistent/misleading: it says “set UID”, but this test only manipulates supplementary groups via
setgroups(). Please reword this comment to match what the code actually does (and fix the grammar while you’re there).
/*
* _SC_NGROUPS_MAX beyond parasite's maximum supported groups, so we use a hardcoded value.
* Use 900 groups: enough to test the "Groups:" line in /proc/pid/status
* and triggering dynamic expansion,
* but still within the PARASITE_MAX_GROUPS limit (~900).
* So we use the larger value to set UID and test the dynamic expansion of BFD buffer when dumping the process.
*/
test/zdtm/static/massive_groups.c:68
- The
fail()format string uses%dto printgid_tvalues.gid_tis not guaranteed to beint(often it’sunsigned int), so this can be undefined behavior or print incorrect values. Cast to a known type and use the matching format specifier (or avoid printing the rawgid_tdirectly).
if (restored_groups[i] != group[i]) {
fail("Restored group ID at index %d (%d) does not match expected (%d)", i, restored_groups[i], group[i]);
free(restored_groups);
criu/bfd.c:29
BFD_MAX_MREMAP_SIZEis a misleading name here: the implementation usesmmap()+memcpy()rather thanmremap(). Consider renaming this constant to something that reflects its purpose (e.g., max dynamic/oversized bfd buffer size) to avoid confusion for future maintainers.
#define BFD_MAX_MREMAP_SIZE (2 * 1024 * 1024)
criu/bfd.c:96
buf_put()now useslist_add_tail(), but the comment says the standard buffer will be reused by the nextbfdopencall. Sincebuf_get()takes from the list head, adding to the tail makes reuse FIFO and the just-returned buffer likely won’t be the next one reused. Either switch back tolist_add()(LIFO reuse) or adjust the comment and any expectations about reuse order.
/*
* Don't unmap standard buffer back, it will get reused
* by next bfdopen call
*/
list_add_tail(&b->l, &bufs);
}
criu/bfd.c:223
- The hard-coded error text "Line too long to fit in 2M buffer" will become inaccurate if
BFD_MAX_MREMAP_SIZEchanges and also doesn’t report the actual limit encountered. Consider formatting the message using the limit constant (and ideally the offending line length/cap) so logs stay actionable.
if (new_cap > BFD_MAX_MREMAP_SIZE) {
pr_err("Line too long to fit in 2M buffer\n");
return ERR_PTR(-EIO);
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)
test/zdtm/static/massive_groups.c:86
fail()printsgid_tvalues using%u, butgid_tis not guaranteed to beunsigned inton all architectures/libc configurations. This can trigger -Wformat warnings or incorrect output. Consider casting to a known type (e.g.,(unsigned int)) or using a format specifier that matchesgid_tconsistently.
if (restored_groups[i] != group[i]) {
fail("Restored group ID at index %d (%u) does not match expected (%u)", i, restored_groups[i], group[i]);
free(restored_groups);
criu/bfd.c:245
- The buffer-release logic here (unmap+free for oversized buffers vs returning to the pool for standard buffers) is duplicated with
buf_put(). To reduce the chance of future inconsistencies, consider extracting a small helper that releases astruct bfd_buf *and use it from bothbreadchr()andbuf_put().
if (b_buf) {
if (b_buf->size > BUFSIZE) {
/* This buffer was remapped to fit a long line, unmap it back */
munmap(b_buf->mem, b_buf->size);
xfree(b_buf);
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## criu-dev #2900 +/- ##
============================================
- Coverage 57.76% 57.19% -0.58%
============================================
Files 142 154 +12
Lines 37664 40377 +2713
Branches 0 8851 +8851
============================================
+ Hits 21758 23095 +1337
- Misses 15906 17018 +1112
- Partials 0 264 +264 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
fce7031 to
175da22
Compare
|
@DongSunchao what do you think about this draft? |
Currently, bfd has a fixed buffer size (BUFSIZE, which is 4096). This causes issues when reading lines longer than BUFSIZE, as breadline fails with "The bfd buffer is too small". This patch introduces dynamic buffer resizing in bfd. When a buffer is full and more space is needed (e.g., for a very long line), the buffer is resized using mremap (or a new mmap if it was using a pre-allocated buffer from the pool). A new bsize field is added to struct xbuf to keep track of the current buffer size. Unit tests for reading long lines and writing large buffers are added to criu/unittest/unit.c. Signed-off-by: dong sunchao <[email protected]> Co-developed-by: Andrei Vagin <[email protected]> Signed-off-by: Andrei Vagin <[email protected]>
Add a new ZDTM test case `massive_groups` to verify the dynamic buffer expansion logic in `bfd.c` Signed-off-by: dong sunchao <[email protected]>
Fixes #2898
Motivation
Currently,
breadchr()aborts with "The bfd buffer is too small" when encountering a process with a massive number of supplementary groups, because theGroups:line in/proc/<pid>/statusexceeds the hardcodedBUFSIZE(4096 bytes).While increasing
BUFSIZEglobally (e.g., toPAGE_SIZE * 16) works as a quick workaround, it is not optimal. Due to Linux's demand paging, while this only reserves virtual memory initially, recycling a dirtied, globally enlarged buffer back into the standard pool vialist_add_tail()causes severe internal fragmentation (RSS bloat). Those physical pages would remain resident in memory indefinitely even when reused for tiny files.Implementation Details
This PR implements a strict dynamic expansion mechanism in
bfd.c:mmappool (BUFSIZE) intact for 99% of normal routine operations.breadchr()hits the buffer boundary, it dynamically maps an independent, doubled VMA to handle the long text line (capped at 2MB to prevent OOM).munmap) these oversized custom buffers in bothbuf_put()and during consecutive re-expansions inbreadchr(), strictly preventing them from polluting the fixed-size 4KB memory pool.massive_groups) using 900 groups with 10-digit GIDs (e.g.,1000000000). This generates a ~10KBGroups:line, intentionally forcing multiple consecutive dynamic expansions (4K -> 8K -> 16K) to thoroughly validate the expansion and safe-discard logic, while safely staying below thePARASITE_MAX_GROUPSlimit.Tested locally with
make zdtmand successfully passed themassive_groupstest in the host namespace.