bfd: Support dynamic expansion for massively long text lines by DongSunchao · Pull Request #2900 · checkpoint-restore/criu

DongSunchao · 2026-02-20T09:34:36Z

Motivation
Currently, breadchr() aborts with "The bfd buffer is too small" when encountering a process with a massive number of supplementary groups, because the Groups: line in /proc/<pid>/status exceeds the hardcoded BUFSIZE (4096 bytes).

Increasing BUFSIZE globally (e.g., to PAGE_SIZE * 16) works as a quick workaround, but it is not optimal. Thanks to Linux's demand paging, a larger buffer initially reserves only virtual memory. However, once such a buffer has been dirtied and recycled into the standard pool via list_add_tail(), its physical pages remain resident indefinitely, even when the buffer is later reused for tiny files. The result is severe internal fragmentation (RSS bloat).

Implementation Details
This PR implements a strict dynamic expansion mechanism in bfd.c:

Keeps the default zero-copy batched mmap pool (BUFSIZE) intact for 99% of normal routine operations.
When breadchr() hits the buffer boundary, it dynamically maps an independent, doubled VMA to handle the long text line (capped at 2MB to prevent OOM).
Explicitly intercepts and safely unmaps (munmap) these oversized custom buffers in both buf_put() and during consecutive re-expansions in breadchr(), strictly preventing them from polluting the fixed-size 4KB memory pool.
Includes a new ZDTM test (massive_groups) that uses 900 groups with 10-digit GIDs (e.g., 1000000000). This generates a ~10KB Groups: line, intentionally forcing multiple consecutive dynamic expansions (4K -> 8K -> 16K) to validate the expansion and safe-discard logic while staying below the PARASITE_MAX_GROUPS limit.
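
The ~10KB figure and the 4K -> 8K -> 16K sequence can be checked with a little arithmetic. The sketch below is only an estimate: it assumes the GIDs are consecutive starting at 1000000000 and that the line is laid out as "Groups:\t", then each GID followed by one space, then a newline. The helper names are hypothetical and not part of the PR.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Estimated length of a "Groups:" line, assuming the layout
 * "Groups:\t" + "<gid> " per group + "\n". This layout is an
 * assumption used for estimation only.
 */
static size_t groups_line_len(unsigned int first_gid, size_t ngroups)
{
	size_t len = sizeof("Groups:\t") - 1;
	size_t i;

	for (i = 0; i < ngroups; i++)
		len += (size_t)snprintf(NULL, 0, "%u ", first_gid + (unsigned int)i);

	return len + 1; /* trailing newline */
}

/* Number of doublings needed before a buffer of `cap` bytes can hold `len` bytes. */
static unsigned int doublings_needed(size_t cap, size_t len)
{
	unsigned int n = 0;

	while (cap < len) {
		cap *= 2;
		n++;
	}
	return n;
}
```

Under these assumptions, 900 groups give 8 + 900 * 11 + 1 = 9909 bytes, which needs two doublings from a 4096-byte buffer (to 8192, then to 16384).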

Tested locally with make zdtm and successfully passed the massive_groups test in the host namespace.

Snorch

I added small improvement comments, please take a look. But generally this looks good to me.

avagin · 2026-02-22T00:13:30Z

@@ -0,0 +1 @@
+{'flavor': 'h', 'flags': 'suid'}


why is it running only in the host namespace?

The main reason is that this test will call setgroups() with 900, extremely large GIDs.
If we run it without 'flavor': 'h' (in a user namespace), it requires a massive and complex gid_map setup for all those custom groups. The ZDTM sandbox initialization actually fails under these conditions (I encountered a mount(/proc) failed error when I tried running it in ns).

The ZDTM sandbox initialization actually fails under these conditions (I encountered a mount(/proc) failed error when I tried running it in ns).

Have you tried to investigate why it fails?

The issue I encountered was caused by a previous aborted run that left a .constructed marker file in the test/ directory, though I'm not sure why the run/ directory was lost. The script only checks for the .constructed to assume the sandbox is ready. I cleaned up my environment and the issue was resolved. I'm not sure if this edge case warrants a separate PR to improve the check.

By the way, 'flavor': 'h' is still necessary. This is because the test explicitly relies on setgroups() to attach hundreds of arbitrary GIDs.

At a minimum, it should work with the ns flavor. For uns, you will need to extend the uid/gid mappings.

I have updated the PR with a new commit to fully support the uns flavor.

To achieve this cleanly without polluting the global environment or breaking other tests, I added a new flag (ext-uid-map) in the .desc file. The Python framework detects this flag and injects a massive ID mapping (up to 4 billion) into the native test environment (ZDTM_UID/GID_MAP), which the C wrapper safely applies during namespace initialization.

Since the test now runs perfectly across all namespaces, I have removed the flavor restriction from the .desc file.

Copilot

Pull request overview

Adds support for dynamically expanding CRIU’s bfd read buffer when parsing unusually long text lines (e.g., /proc/<pid>/status Groups:) without permanently inflating the global fixed-size buffer pool, and introduces a ZDTM regression test that triggers this behavior.

Changes:

Track per-buffer capacity in criu/bfd.c and dynamically mmap() a larger buffer (up to 2MB) when breadchr() hits the current capacity.
Ensure oversized buffers are munmap()’d/xfree()’d instead of being returned to the fixed BUFSIZE pool.
Add a new ZDTM static test (massive_groups) to generate a long Groups: line and validate multiple consecutive expansions.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
`criu/bfd.c`	Implements dynamic buffer growth for long lines and ensures oversized buffers don’t pollute the fixed-size pool.
`test/zdtm/static/massive_groups.c`	New ZDTM test that sets many supplementary groups and validates they survive dump/restore.
`test/zdtm/static/massive_groups.desc`	Registers the test with required flags/flavor.
`test/zdtm/static/Makefile`	Adds the new test to the static test build lists/flags.

Comments suppressed due to low confidence (6)

criu/bfd.c:252

When returning the original standard buffer to the pool during expansion, this uses list_add_tail(). Combined with buf_get() consuming from the head, this changes reuse semantics compared to the previous LIFO behavior and may contradict the intent that a buffer is reused “by next bfdopen call”. Consider using list_add() here (and in buf_put()) to keep reuse order consistent.

					/*
					 * Don't unmap standard buffer back, it will get reused
					 * by next bfdopen call
					 */
					list_add_tail(&b_buf->l, &bufs);
				}

test/zdtm/static/massive_groups.c:17

The header comment is internally inconsistent/misleading: it says “set UID”, but this test only manipulates supplementary groups via setgroups(). Please reword this comment to match what the code actually does (and fix the grammar while you’re there).

/*
 * _SC_NGROUPS_MAX beyond parasite's maximum supported groups, so we use a hardcoded value.
 * Use 900 groups: enough to test the "Groups:" line in /proc/pid/status
 * and triggering dynamic expansion,
 * but still within the PARASITE_MAX_GROUPS limit (~900).
 * So we use the larger value to set UID and test the dynamic expansion of BFD buffer when dumping the process.
 */

test/zdtm/static/massive_groups.c:68

The fail() format string uses %d to print gid_t values. gid_t is not guaranteed to be int (often it’s unsigned int), so this can be undefined behavior or print incorrect values. Cast to a known type and use the matching format specifier (or avoid printing the raw gid_t directly).

		if (restored_groups[i] != group[i]) {
			fail("Restored group ID at index %d (%d) does not match expected (%d)", i, restored_groups[i], group[i]);
			free(restored_groups);
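
One common way to address this kind of warning is to widen the value to a fixed, known type before printing. The sketch below uses plain snprintf() rather than the ZDTM fail() macro, and the function name is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Format a gid_t mismatch portably: gid_t is widened to unsigned long,
 * so the format specifier matches regardless of how gid_t is defined.
 */
static int format_gid_mismatch(char *out, size_t len, int idx, gid_t got, gid_t want)
{
	return snprintf(out, len, "group %d: restored %lu, expected %lu",
			idx, (unsigned long)got, (unsigned long)want);
}
```

The same cast works inside any printf-style call, so the format string stays valid even if the underlying type of gid_t differs between platforms.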

criu/bfd.c:29

BFD_MAX_MREMAP_SIZE is a misleading name here: the implementation uses mmap() + memcpy() rather than mremap(). Consider renaming this constant to something that reflects its purpose (e.g., max dynamic/oversized bfd buffer size) to avoid confusion for future maintainers.

#define BFD_MAX_MREMAP_SIZE (2 * 1024 * 1024)

criu/bfd.c:96

buf_put() now uses list_add_tail(), but the comment says the standard buffer will be reused by the next bfdopen call. Since buf_get() takes from the list head, adding to the tail makes reuse FIFO and the just-returned buffer likely won’t be the next one reused. Either switch back to list_add() (LIFO reuse) or adjust the comment and any expectations about reuse order.

		/*
		 * Don't unmap standard buffer back, it will get reused
		 * by next bfdopen call
		 */
		list_add_tail(&b->l, &bufs);
	}
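
The difference between the two insertion modes can be shown with a minimal circular doubly linked list. This is an illustrative re-implementation, not the kernel-style list.h that CRIU uses; the `sk_*` names are hypothetical. `sk_add()` mirrors list_add() and `sk_add_tail()` mirrors list_add_tail().

```c
#include <stddef.h>

struct sk_list {
	struct sk_list *next, *prev;
};

static void sk_init(struct sk_list *h)
{
	h->next = h->prev = h;
}

static void sk_insert(struct sk_list *n, struct sk_list *prev, struct sk_list *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Like list_add(): insert right after the head (stack, LIFO). */
static void sk_add(struct sk_list *n, struct sk_list *h)
{
	sk_insert(n, h, h->next);
}

/* Like list_add_tail(): insert right before the head (queue, FIFO). */
static void sk_add_tail(struct sk_list *n, struct sk_list *h)
{
	sk_insert(n, h->prev, h);
}

/* Take the first entry, as a pool consumer reading from the head would. */
static struct sk_list *sk_pop_head(struct sk_list *h)
{
	struct sk_list *n = h->next;

	if (n == h)
		return NULL;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	return n;
}
```

With `sk_add()`, the most recently returned buffer is the next one handed out. With `sk_add_tail()`, the oldest one is.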

criu/bfd.c:223

The hard-coded error text "Line too long to fit in 2M buffer" will become inaccurate if BFD_MAX_MREMAP_SIZE changes and also doesn’t report the actual limit encountered. Consider formatting the message using the limit constant (and ideally the offending line length/cap) so logs stay actionable.

			if (new_cap > BFD_MAX_MREMAP_SIZE) {
				pr_err("Line too long to fit in 2M buffer\n");
				return ERR_PTR(-EIO);
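
A message derived from the constant could be built along these lines. The sketch uses snprintf() instead of pr_err() so it stays self-contained; the constant and function names are hypothetical.

```c
#include <stdio.h>

#define SKETCH_MAX_SIZE (2UL * 1024 * 1024)

/* Report both the requested size and the limit, taken from the constant. */
static int format_too_long(char *out, size_t len, size_t wanted)
{
	return snprintf(out, len, "Line too long: need %zu bytes, limit is %lu bytes\n",
			wanted, SKETCH_MAX_SIZE);
}
```

If the limit changes later, the message follows automatically because nothing in the string is hard-coded.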

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

test/zdtm/static/massive_groups.c:86

fail() prints gid_t values using %u, but gid_t is not guaranteed to be unsigned int on all architectures/libc configurations. This can trigger -Wformat warnings or incorrect output. Consider casting to a known type (e.g., (unsigned int)) or using a format specifier that matches gid_t consistently.

		if (restored_groups[i] != group[i]) {
			fail("Restored group ID at index %d (%u) does not match expected (%u)", i, restored_groups[i], group[i]);
			free(restored_groups);

criu/bfd.c:245

The buffer-release logic here (unmap+free for oversized buffers vs returning to the pool for standard buffers) is duplicated with buf_put(). To reduce the chance of future inconsistencies, consider extracting a small helper that releases a struct bfd_buf * and use it from both breadchr() and buf_put().

			if (b_buf) {
				if (b_buf->size > BUFSIZE) {
					/* This buffer was remapped to fit a long line, unmap it back */
					munmap(b_buf->mem, b_buf->size);
					xfree(b_buf);

codecov-commenter · 2026-03-24T02:11:43Z

Codecov Report

❌ Patch coverage is 89.74359% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.20%. Comparing base (b7f6b72) to head (8d760bf).
⚠️ Report is 687 commits behind head on criu-dev.

Files with missing lines	Patch %	Lines
criu/bfd.c	77.77%	8 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           criu-dev    #2900      +/-   ##
============================================
- Coverage     57.76%   57.20%   -0.57%     
============================================
  Files           142      154      +12     
  Lines         37664    40398    +2734     
  Branches          0     8856    +8856     
============================================
+ Hits          21758    23111    +1353     
- Misses        15906    17023    +1117     
- Partials          0      264     +264

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

avagin · 2026-03-31T18:09:09Z

@DongSunchao what do you think about this draft?

diff --git a/criu/bfd.c b/criu/bfd.c
index 2c5f5b64a..2e44acd03 100644
--- a/criu/bfd.c
+++ b/criu/bfd.c
@@ -70,6 +70,7 @@ static int buf_get(struct xbuf *xb)
 
 	xb->mem = b->mem;
 	xb->data = xb->mem;
+	xb->bsize = BUFSIZE;
 	xb->sz = 0;
 	xb->buf = b;
 	return 0;
@@ -77,11 +78,15 @@ static int buf_get(struct xbuf *xb)
 
 static void buf_put(struct xbuf *xb)
 {
-	/*
-	 * Don't unmap buffer back, it will get reused
-	 * by next bfdopen call
-	 */
-	list_add(&xb->buf->l, &bufs);
+	if (xb->buf) {
+		/*
+		 * Don't unmap buffer back, it will get reused
+		 * by next bfdopen call
+		 */
+		list_add(&xb->buf->l, &bufs);
+	} else {
+		munmap(xb->mem, xb->bsize);
+	}
 	xb->buf = NULL;
 	xb->mem = NULL;
 	xb->data = NULL;
@@ -144,7 +149,7 @@ static int brefill(struct bfd *f)
 	memmove(b->mem, b->data, b->sz);
 	b->data = b->mem;
 
-	ret = read_all(f->fd, b->mem + b->sz, BUFSIZE - b->sz);
+	ret = read_all(f->fd, b->mem + b->sz, b->bsize - b->sz);
 	if (ret < 0) {
 		pr_perror("Error reading file");
 		return -1;
@@ -172,6 +177,30 @@ char *breadline(struct bfd *f)
 	return breadchr(f, '\n');
 }
 
+static int bextend(struct bfd *f)
+{
+	struct xbuf *b = &f->b;
+	void *newbuf;
+	long newsize = b->bsize * 2;
+
+	if (b->buf) {
+		newbuf = mmap(NULL, newsize, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+		if (newbuf == MAP_FAILED)
+			return -1;
+		memcpy(newbuf, b->mem, b->sz);
+		list_add(&b->buf->l, &bufs);
+		b->buf = NULL;
+	} else {
+		newbuf = mremap(b->mem, b->bsize, newsize, MREMAP_MAYMOVE);
+		if (newbuf == MAP_FAILED)
+			return -1;
+	}
+	b->mem = newbuf;
+	b->data = newbuf;
+	b->bsize = newsize;
+	return 0;
+}
+
 char *breadchr(struct bfd *f, char c)
 {
 	struct xbuf *b = &f->b;
@@ -195,9 +224,12 @@ again:
 		if (!b->sz)
 			return NULL;
 
-		if (b->sz == BUFSIZE) {
-			pr_err("The bfd buffer is too small\n");
-			return ERR_PTR(-EIO);
+		if (b->sz == b->bsize) {
+			if (bextend(f)) {
+				pr_err("The bfd buffer is too small\n");
+				return ERR_PTR(-EIO);
+			}
+			goto refill;
 		}
 		/*
 		 * Last bytes may lack the \n at the
@@ -216,6 +248,7 @@ again:
 		return b->data;
 	}
 
+refill:
 	/*
 	 * small optimization -- we've scanned b->sz
 	 * symbols already, no need to re-scan them after
@@ -252,14 +285,14 @@ static int __bwrite(struct bfd *bfd, const void *buf, int size)
 {
 	struct xbuf *b = &bfd->b;
 
-	if (b->sz + size > BUFSIZE) {
+	if (b->sz + size > b->bsize) {
 		int ret;
 		ret = bflush(bfd);
 		if (ret < 0)
 			return ret;
 	}
 
-	if (size > BUFSIZE)
+	if (size > b->bsize)
 		return write_all(bfd->fd, buf, size);
 
 	memcpy(b->data + b->sz, buf, size);
diff --git a/criu/include/bfd.h b/criu/include/bfd.h
index 2846ec628..050158327 100644
--- a/criu/include/bfd.h
+++ b/criu/include/bfd.h
@@ -8,6 +8,7 @@ struct xbuf {
 	char *mem;	 /* buffer */
 	char *data;	 /* position we see bytes at */
 	unsigned int sz; /* bytes sitting after b->pos */
+	unsigned int bsize;
 	struct bfd_buf *buf;
 };

Currently, bfd has a fixed buffer size (BUFSIZE, which is 4096). This causes issues when reading lines longer than BUFSIZE, as breadline fails with "The bfd buffer is too small".

This patch introduces dynamic buffer resizing in bfd. When a buffer is full and more space is needed (e.g., for a very long line), the buffer is resized using mremap (or a new mmap if it was using a pre-allocated buffer from the pool). A new bsize field is added to struct xbuf to keep track of the current buffer size.

Unit tests for reading long lines and writing large buffers are added to criu/unittest/unit.c.

Signed-off-by: dong sunchao <dongsunchao@gmail.com>
Co-developed-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
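
The mremap path mentioned in this commit message can be illustrated in isolation. This is a minimal sketch of doubling an anonymous private mapping with MREMAP_MAYMOVE; the function name is hypothetical and it is not the code from the patch.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Double an anonymous private mapping, letting the kernel move it if it
 * cannot grow in place. Returns the (possibly new) address, or NULL on
 * failure, in which case the old mapping is left untouched.
 */
static void *sketch_double_mapping(void *mem, size_t old_size)
{
	void *p = mremap(mem, old_size, old_size * 2, MREMAP_MAYMOVE);

	return p == MAP_FAILED ? NULL : p;
}
```

The existing contents are preserved across the call, so no explicit copy is needed. The caller must use the returned address afterwards, since the mapping may have moved.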

Add a new ZDTM test case `massive_groups` to verify the dynamic buffer expansion logic in `bfd.c`.

Signed-off-by: dong sunchao <dongsunchao@gmail.com>

Tests that use supplementary group IDs beyond the default mapping range (e.g. GIDs in the billion scale) fail with EINVAL in user namespace mode because the GIDs are not mapped.

Add support for the 'ext-uid-map' test flag: when set, ZDTM_UID_MAP and ZDTM_GID_MAP are populated with a full 32-bit identity mapping (0 0 4294967295), allowing the child user namespace to access the full UID/GID space.

Signed-off-by: DongSunchao <dongsunchao@gmail.com>
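
Why a billion-scale GID needs the wider mapping can be seen by modelling one map entry, which has the form "inside-ID outside-ID length". The sketch below is only a model of the lookup, not kernel or ZDTM code; the struct and function names are hypothetical, and the narrow 65536-ID range used for comparison is an assumed example, not the actual ZDTM default.

```c
#include <stdint.h>

/* One uid_map/gid_map entry: "inside outside count". */
struct sketch_id_map {
	uint32_t inside;
	uint32_t outside;
	uint32_t count;
};

/*
 * Translate an in-namespace ID to the parent namespace.
 * Returns 0 on success, -1 if the ID falls outside the mapped range.
 */
static int sketch_map_id(const struct sketch_id_map *m, uint32_t id, uint32_t *out)
{
	if (id < m->inside || id - m->inside >= m->count)
		return -1;
	*out = m->outside + (id - m->inside);
	return 0;
}
```

With the identity entry `0 0 4294967295`, GID 1000000000 maps to itself. With a narrow entry such as `0 0 65536`, the same GID is unmapped, which is the situation where setgroups() is rejected.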

github-actions · 2026-05-04T00:25:54Z

A friendly reminder that this PR had no activity for 30 days.

DongSunchao force-pushed the user-group branch 2 times, most recently from ab06539 to a81257f Compare February 20, 2026 09:48

Snorch reviewed Feb 20, 2026

Comment thread criu/bfd.c

DongSunchao force-pushed the user-group branch from a81257f to 3c3502b Compare February 20, 2026 11:08

DongSunchao requested a review from Snorch February 20, 2026 13:50

Snorch approved these changes Feb 20, 2026

Comment thread criu/bfd.c Outdated

Comment thread criu/bfd.c

Comment thread test/zdtm/static/massive_groups.c Outdated

avagin reviewed Feb 20, 2026

Comment thread criu/bfd.c Outdated

DongSunchao force-pushed the user-group branch 5 times, most recently from 9cf3a53 to 3a7c9b2 Compare February 21, 2026 04:17

DongSunchao requested a review from avagin February 21, 2026 14:50

avagin reviewed Feb 22, 2026

Comment thread criu/bfd.c Outdated

avagin reviewed Feb 22, 2026

Comment thread criu/bfd.c Outdated

avagin reviewed Feb 22, 2026

Comment thread test/zdtm/static/massive_groups.c Outdated

avagin reviewed Feb 22, 2026

Comment thread test/zdtm/static/massive_groups.c

DongSunchao force-pushed the user-group branch from 3a7c9b2 to 4570c0c Compare February 22, 2026 04:49

DongSunchao requested a review from avagin February 22, 2026 05:33

avagin requested a review from Copilot March 5, 2026 07:53

Copilot started reviewing on behalf of avagin March 5, 2026 07:54

Copilot AI reviewed Mar 5, 2026

Comment thread test/zdtm/static/massive_groups.c

DongSunchao force-pushed the user-group branch from 4570c0c to c068285 Compare March 5, 2026 11:58

DongSunchao requested a review from Copilot March 5, 2026 12:02

Copilot started reviewing on behalf of DongSunchao March 5, 2026 12:03

Copilot AI reviewed Mar 5, 2026

DongSunchao force-pushed the user-group branch from c068285 to 9f16a1c Compare March 5, 2026 12:15

DongSunchao force-pushed the user-group branch from c857e10 to 9f16a1c Compare March 24, 2026 01:06

avagin reviewed Mar 30, 2026

Comment thread criu/bfd.c Outdated

DongSunchao force-pushed the user-group branch from 9f16a1c to 5e58d90 Compare March 30, 2026 23:22

DongSunchao requested review from avagin and Copilot March 30, 2026 23:23

Copilot started reviewing on behalf of DongSunchao March 30, 2026 23:24

Copilot AI reviewed Mar 30, 2026

Comment thread criu/bfd.c

Comment thread criu/bfd.c Outdated

DongSunchao force-pushed the user-group branch 2 times, most recently from fce7031 to 175da22 Compare March 31, 2026 00:08

avagin reviewed Mar 31, 2026

Comment thread criu/bfd.c Outdated

DongSunchao force-pushed the user-group branch from 175da22 to 9486da9 Compare April 1, 2026 01:17

DongSunchao requested a review from avagin April 1, 2026 01:17

avagin force-pushed the user-group branch from 9486da9 to bfb8599 Compare April 1, 2026 16:42

DongSunchao force-pushed the user-group branch from 403ea61 to fe5c613 Compare April 3, 2026 00:55

DongSunchao added 2 commits April 3, 2026 12:11

test/zdtm: Add ZDTM test for dynamic bfd expansion on massive group

2674f24

Add a new ZDTM test case `massive_groups` to verify the dynamic buffer expansion logic in `bfd.c`.

Signed-off-by: dong sunchao <dongsunchao@gmail.com>

DongSunchao force-pushed the user-group branch from fe5c613 to 8d760bf Compare April 3, 2026 01:13

github-actions Bot added the stale-pr label May 4, 2026
