bfd: Support dynamic expansion for massively long text lines #2900

Open
DongSunchao wants to merge 2 commits into checkpoint-restore:criu-dev from DongSunchao:user-group

DongSunchao commented Feb 20, 2026

Fixes #2898

Motivation
Currently, breadchr() aborts with "The bfd buffer is too small" when encountering a process with a massive number of supplementary groups, because the Groups: line in /proc/<pid>/status exceeds the hardcoded BUFSIZE (4096 bytes).

While increasing BUFSIZE globally (e.g., to PAGE_SIZE * 16) works as a quick workaround, it is not optimal. Thanks to Linux's demand paging, a larger buffer initially reserves only virtual memory. Once such a buffer has been dirtied by a long line, however, recycling it into the standard pool via list_add_tail() keeps its physical pages resident indefinitely, even when the buffer is later reused for tiny files. The result is severe internal fragmentation (RSS bloat).

Implementation Details
This PR implements a strict dynamic expansion mechanism in bfd.c:

  • Keeps the default zero-copy batched mmap pool (BUFSIZE) intact, so ordinary reads that fit in a standard buffer are unaffected.
  • When breadchr() hits the buffer boundary, it dynamically maps an independent, doubled VMA to handle the long text line (capped at 2MB to prevent OOM).
  • Explicitly intercepts and safely unmaps (munmap) these oversized custom buffers in both buf_put() and during consecutive re-expansions in breadchr(), strictly preventing them from polluting the fixed-size 4KB memory pool.
  • Included a new ZDTM test (massive_groups) using 900 groups with 10-digit GIDs (e.g., 1000000000). This generates a ~10KB Groups: line, intentionally forcing multiple consecutive dynamic expansions (4K -> 8K -> 16K) to thoroughly validate the expansion and safe-discard logic, while safely staying below the PARASITE_MAX_GROUPS limit.
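
The growth policy in the list above can be sketched in isolation. This is a minimal illustration under assumptions, not the code from this PR: `sketch_buf`, `sketch_init()`, `sketch_grow()` and `sketch_grows_needed()` are hypothetical names, the buffer pool is left out entirely, and the outgrown mapping is always unmapped (the PR instead returns a standard-size buffer to the pool). Only the 4096-byte starting size, the doubling step and the 2MB cap are taken from the description.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define SKETCH_BUFSIZE  4096u                /* standard buffer size (BUFSIZE) */
#define SKETCH_MAX_SIZE (2u * 1024u * 1024u) /* 2MB growth cap */

/* Hypothetical stand-in for a single bfd read buffer. */
struct sketch_buf {
	char *mem;   /* start of the mapping */
	size_t cap;  /* mapped size in bytes */
	size_t used; /* bytes of valid data */
};

/* Map a standard-size anonymous buffer. Returns 0 on success, -1 on failure. */
static int sketch_init(struct sketch_buf *b)
{
	void *mem = mmap(NULL, SKETCH_BUFSIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (mem == MAP_FAILED)
		return -1;
	b->mem = mem;
	b->cap = SKETCH_BUFSIZE;
	b->used = 0;
	return 0;
}

/*
 * Double the capacity: map a new region, copy the valid bytes over and
 * unmap the old region. Returns -1 and leaves the buffer untouched if the
 * doubled size would exceed the cap or if mmap() fails.
 */
static int sketch_grow(struct sketch_buf *b)
{
	size_t new_cap = b->cap * 2;
	void *mem;

	assert(b->used <= b->cap);
	if (new_cap > SKETCH_MAX_SIZE)
		return -1;

	mem = mmap(NULL, new_cap, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	memcpy(mem, b->mem, b->used);
	munmap(b->mem, b->cap);
	b->mem = mem;
	b->cap = new_cap;
	return 0;
}

/*
 * Number of doublings needed before a line of line_len bytes fits,
 * or -1 if it cannot fit under the cap.
 */
static int sketch_grows_needed(size_t line_len)
{
	size_t cap = SKETCH_BUFSIZE;
	int n = 0;

	while (cap < line_len) {
		cap *= 2;
		if (cap > SKETCH_MAX_SIZE)
			return -1;
		n++;
	}
	return n;
}
```

For a line of roughly 9900 bytes (900 GIDs of 10 digits plus a separator each), `sketch_grows_needed(9900)` is 2, which matches the 4K -> 8K -> 16K sequence the test is meant to force.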

Tested locally with make zdtm and successfully passed the massive_groups test in the host namespace.

DongSunchao force-pushed the user-group branch 2 times, most recently from ab06539 to a81257f (February 20, 2026 09:48)
Member

Snorch left a comment

I added a few small improvement comments; please take a look. Overall, this looks good to me.

DongSunchao force-pushed the user-group branch 5 times, most recently from 9cf3a53 to 3a7c9b2 (February 21, 2026 04:17)
DongSunchao requested a review from avagin (February 21, 2026 14:50)
@@ -0,0 +1 @@
{'flavor': 'h', 'flags': 'suid'}
\ No newline at end of file
Member

Why is it running only in the host namespace?

Author

The main reason is that this test calls setgroups() with 900 extremely large GIDs.
If we run it without 'flavor': 'h' (in a user namespace), it requires a massive and complex gid_map setup for all those custom groups. The ZDTM sandbox initialization actually fails under these conditions (I encountered a mount(/proc) failed error when I tried running it in ns).

Member

> The ZDTM sandbox initialization actually fails under these conditions (I encountered a mount(/proc) failed error when I tried running it in ns).

Have you tried to investigate why it fails?

Copilot AI left a comment

Pull request overview

Adds support for dynamically expanding CRIU’s bfd read buffer when parsing unusually long text lines (e.g., /proc/<pid>/status Groups:) without permanently inflating the global fixed-size buffer pool, and introduces a ZDTM regression test that triggers this behavior.

Changes:

  • Track per-buffer capacity in criu/bfd.c and dynamically mmap() a larger buffer (up to 2MB) when breadchr() hits the current capacity.
  • Ensure oversized buffers are munmap()’d/xfree()’d instead of being returned to the fixed BUFSIZE pool.
  • Add a new ZDTM static test (massive_groups) to generate a long Groups: line and validate multiple consecutive expansions.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| criu/bfd.c | Implements dynamic buffer growth for long lines and ensures oversized buffers don’t pollute the fixed-size pool. |
| test/zdtm/static/massive_groups.c | New ZDTM test that sets many supplementary groups and validates they survive dump/restore. |
| test/zdtm/static/massive_groups.desc | Registers the test with required flags/flavor. |
| test/zdtm/static/Makefile | Adds the new test to the static test build lists/flags. |
Comments suppressed due to low confidence (6)

criu/bfd.c:252

  • When returning the original standard buffer to the pool during expansion, this uses list_add_tail(). Combined with buf_get() consuming from the head, this changes reuse semantics compared to the previous LIFO behavior and may contradict the intent that a buffer is reused “by next bfdopen call”. Consider using list_add() here (and in buf_put()) to keep reuse order consistent.
					/*
					 * Don't unmap standard buffer back, it will get reused
					 * by next bfdopen call
					 */
					list_add_tail(&b_buf->l, &bufs);
				}
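
The ordering difference described in this comment can be shown with a tiny list that mimics head and tail insertion. This is a self-contained sketch: `sk_node`, `sk_add_head()`, `sk_add_tail()` and `sk_take_head()` are hypothetical stand-ins for kernel-style list helpers, not CRIU's own list implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list node; the list head is a node whose id is unused. */
struct sk_node {
	struct sk_node *prev, *next;
	int id;
};

static void sk_init(struct sk_node *head)
{
	head->prev = head->next = head;
}

/* Link n between two nodes that are currently adjacent. */
static void sk_insert(struct sk_node *n, struct sk_node *prev, struct sk_node *next)
{
	assert(prev->next == next && next->prev == prev);
	n->prev = prev;
	n->next = next;
	prev->next = n;
	next->prev = n;
}

/* Like list_add(): the new node becomes the first element. */
static void sk_add_head(struct sk_node *head, struct sk_node *n)
{
	sk_insert(n, head, head->next);
}

/* Like list_add_tail(): the new node becomes the last element. */
static void sk_add_tail(struct sk_node *head, struct sk_node *n)
{
	sk_insert(n, head->prev, head);
}

/* Remove and return the first element, or NULL if the list is empty. */
static struct sk_node *sk_take_head(struct sk_node *head)
{
	struct sk_node *n = head->next;

	if (n == head)
		return NULL;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
	return n;
}
```

With consumption from the head, returning a buffer via `sk_add_head()` makes it the next one handed out (LIFO), while `sk_add_tail()` puts it behind every buffer already in the pool (FIFO).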

test/zdtm/static/massive_groups.c:17

  • The header comment is internally inconsistent/misleading: it says “set UID”, but this test only manipulates supplementary groups via setgroups(). Please reword this comment to match what the code actually does (and fix the grammar while you’re there).
/*
 * _SC_NGROUPS_MAX beyond parasite's maximum supported groups, so we use a hardcoded value.
 * Use 900 groups: enough to test the "Groups:" line in /proc/pid/status
 * and triggering dynamic expansion,
 * but still within the PARASITE_MAX_GROUPS limit (~900).
 * So we use the larger value to set UID and test the dynamic expansion of BFD buffer when dumping the process.
 */

test/zdtm/static/massive_groups.c:68

  • The fail() format string uses %d to print gid_t values. gid_t is not guaranteed to be int (often it’s unsigned int), so this can be undefined behavior or print incorrect values. Cast to a known type and use the matching format specifier (or avoid printing the raw gid_t directly).
		if (restored_groups[i] != group[i]) {
			fail("Restored group ID at index %d (%d) does not match expected (%d)", i, restored_groups[i], group[i]);
			free(restored_groups);
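
One way to follow this suggestion is to cast each `gid_t` to a type with a known format specifier. The sketch below is an assumption-laden illustration: `sketch_format_gid_mismatch()` is a hypothetical helper, and `snprintf()` stands in for the test's `fail()` macro so that the example compiles outside ZDTM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Format a mismatch message without assuming the width of gid_t:
 * both values are cast to uintmax_t and printed with %ju.
 * This assumes gid_t is an unsigned type, as it is on Linux; a negative
 * value of a signed gid_t would print as a very large number.
 * Returns the snprintf() result.
 */
static int sketch_format_gid_mismatch(char *out, size_t len, int index,
				      gid_t got, gid_t want)
{
	assert(out != NULL && len > 0);
	return snprintf(out, len,
			"Restored group ID at index %d (%ju) does not match expected (%ju)",
			index, (uintmax_t)got, (uintmax_t)want);
}
```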

criu/bfd.c:29

  • BFD_MAX_MREMAP_SIZE is a misleading name here: the implementation uses mmap() + memcpy() rather than mremap(). Consider renaming this constant to something that reflects its purpose (e.g., max dynamic/oversized bfd buffer size) to avoid confusion for future maintainers.
#define BFD_MAX_MREMAP_SIZE (2 * 1024 * 1024)

criu/bfd.c:96

  • buf_put() now uses list_add_tail(), but the comment says the standard buffer will be reused by the next bfdopen call. Since buf_get() takes from the list head, adding to the tail makes reuse FIFO and the just-returned buffer likely won’t be the next one reused. Either switch back to list_add() (LIFO reuse) or adjust the comment and any expectations about reuse order.
		/*
		 * Don't unmap standard buffer back, it will get reused
		 * by next bfdopen call
		 */
		list_add_tail(&b->l, &bufs);
	}

criu/bfd.c:223

  • The hard-coded error text "Line too long to fit in 2M buffer" will become inaccurate if BFD_MAX_MREMAP_SIZE changes and also doesn’t report the actual limit encountered. Consider formatting the message using the limit constant (and ideally the offending line length/cap) so logs stay actionable.
			if (new_cap > BFD_MAX_MREMAP_SIZE) {
				pr_err("Line too long to fit in 2M buffer\n");
				return ERR_PTR(-EIO);
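
A message built from the limit constant could look like the sketch below. The names `SKETCH_MAX_BUF_SIZE` and `sketch_format_too_long()` are hypothetical, the wording of the message is only an example, and `snprintf()` stands in for `pr_err()` so that the snippet compiles outside CRIU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_BUF_SIZE (2 * 1024 * 1024) /* hypothetical name for the 2MB cap */

/*
 * Report both the size that was requested and the configured limit, so the
 * text stays accurate if the constant changes. Returns the snprintf() result.
 */
static int sketch_format_too_long(char *out, size_t len, size_t wanted)
{
	assert(out != NULL && len > 0);
	return snprintf(out, len,
			"Line needs a %zu-byte buffer, limit is %d bytes\n",
			wanted, SKETCH_MAX_BUF_SIZE);
}
```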

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

test/zdtm/static/massive_groups.c:86

  • fail() prints gid_t values using %u, but gid_t is not guaranteed to be unsigned int on all architectures/libc configurations. This can trigger -Wformat warnings or incorrect output. Consider casting to a known type (e.g., (unsigned int)) or using a format specifier that matches gid_t consistently.
		if (restored_groups[i] != group[i]) {
			fail("Restored group ID at index %d (%u) does not match expected (%u)", i, restored_groups[i], group[i]);
			free(restored_groups);

criu/bfd.c:245

  • The buffer-release logic here (unmap+free for oversized buffers vs returning to the pool for standard buffers) is duplicated with buf_put(). To reduce the chance of future inconsistencies, consider extracting a small helper that releases a struct bfd_buf * and use it from both breadchr() and buf_put().
			if (b_buf) {
				if (b_buf->size > BUFSIZE) {
					/* This buffer was remapped to fit a long line, unmap it back */
					munmap(b_buf->mem, b_buf->size);
					xfree(b_buf);
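
The helper suggested in this comment might look like the following sketch. It is not taken from the PR: `sketch_bfd_buf`, `sketch_pool`, `sketch_buf_alloc()` and `sketch_buf_release()` are hypothetical, the pool is a plain singly linked stack rather than CRIU's list, and every buffer gets its own mapping instead of a slice of a batched one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

#define SKETCH_BUFSIZE 4096u /* standard buffer size (BUFSIZE) */

/* Hypothetical stand-in for struct bfd_buf. */
struct sketch_bfd_buf {
	char *mem;
	size_t size;
	struct sketch_bfd_buf *next;
};

static struct sketch_bfd_buf *sketch_pool; /* free standard-size buffers */

/* Allocate a descriptor plus an anonymous mapping of the given size. */
static struct sketch_bfd_buf *sketch_buf_alloc(size_t size)
{
	struct sketch_bfd_buf *b = malloc(sizeof(*b));
	void *mem;

	if (!b)
		return NULL;
	mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) {
		free(b);
		return NULL;
	}
	b->mem = mem;
	b->size = size;
	b->next = NULL;
	return b;
}

/*
 * Single release path shared by all callers: oversized buffers are unmapped
 * and freed, standard ones go back to the pool.
 * Returns 1 if the buffer was pooled, 0 if it was destroyed.
 */
static int sketch_buf_release(struct sketch_bfd_buf *b)
{
	assert(b != NULL);
	if (b->size > SKETCH_BUFSIZE) {
		munmap(b->mem, b->size);
		free(b);
		return 0;
	}
	b->next = sketch_pool;
	sketch_pool = b;
	return 1;
}
```

Routing both the expansion path and the close path through one function like this keeps the "oversized buffers never reach the pool" rule in a single place.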

codecov-commenter commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 89.74359% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.19%. Comparing base (b7f6b72) to head (bfb8599).
⚠️ Report is 686 commits behind head on criu-dev.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| criu/bfd.c | 77.77% | 8 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##           criu-dev    #2900      +/-   ##
============================================
- Coverage     57.76%   57.19%   -0.58%     
============================================
  Files           142      154      +12     
  Lines         37664    40377    +2713     
  Branches          0     8851    +8851     
============================================
+ Hits          21758    23095    +1337     
- Misses        15906    17018    +1112     
- Partials          0      264     +264     

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

DongSunchao force-pushed the user-group branch 2 times, most recently from fce7031 to 175da22 (March 31, 2026 00:08)
Member

avagin commented Mar 31, 2026

@DongSunchao what do you think about this draft?

diff --git a/criu/bfd.c b/criu/bfd.c
index 2c5f5b64a..2e44acd03 100644
--- a/criu/bfd.c
+++ b/criu/bfd.c
@@ -70,6 +70,7 @@ static int buf_get(struct xbuf *xb)
 
 	xb->mem = b->mem;
 	xb->data = xb->mem;
+	xb->bsize = BUFSIZE;
 	xb->sz = 0;
 	xb->buf = b;
 	return 0;
@@ -77,11 +78,15 @@ static int buf_get(struct xbuf *xb)
 
 static void buf_put(struct xbuf *xb)
 {
-	/*
-	 * Don't unmap buffer back, it will get reused
-	 * by next bfdopen call
-	 */
-	list_add(&xb->buf->l, &bufs);
+	if (xb->buf) {
+		/*
+		 * Don't unmap buffer back, it will get reused
+		 * by next bfdopen call
+		 */
+		list_add(&xb->buf->l, &bufs);
+	} else {
+		munmap(xb->mem, xb->bsize);
+	}
 	xb->buf = NULL;
 	xb->mem = NULL;
 	xb->data = NULL;
@@ -144,7 +149,7 @@ static int brefill(struct bfd *f)
 	memmove(b->mem, b->data, b->sz);
 	b->data = b->mem;
 
-	ret = read_all(f->fd, b->mem + b->sz, BUFSIZE - b->sz);
+	ret = read_all(f->fd, b->mem + b->sz, b->bsize - b->sz);
 	if (ret < 0) {
 		pr_perror("Error reading file");
 		return -1;
@@ -172,6 +177,30 @@ char *breadline(struct bfd *f)
 	return breadchr(f, '\n');
 }
 
+static int bextend(struct bfd *f)
+{
+	struct xbuf *b = &f->b;
+	void *newbuf;
+	long newsize = b->bsize * 2;
+
+	if (b->buf) {
+		newbuf = mmap(NULL, newsize, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+		if (newbuf == MAP_FAILED)
+			return -1;
+		memcpy(newbuf, b->mem, b->sz);
+		list_add(&b->buf->l, &bufs);
+		b->buf = NULL;
+	} else {
+		newbuf = mremap(b->mem, b->bsize, newsize, MREMAP_MAYMOVE);
+		if (newbuf == MAP_FAILED)
+			return -1;
+	}
+	b->mem = newbuf;
+	b->data = newbuf;
+	b->bsize = newsize;
+	return 0;
+}
+
 char *breadchr(struct bfd *f, char c)
 {
 	struct xbuf *b = &f->b;
@@ -195,9 +224,12 @@ again:
 		if (!b->sz)
 			return NULL;
 
-		if (b->sz == BUFSIZE) {
-			pr_err("The bfd buffer is too small\n");
-			return ERR_PTR(-EIO);
+		if (b->sz == b->bsize) {
+			if (bextend(f)) {
+				pr_err("The bfd buffer is too small\n");
+				return ERR_PTR(-EIO);
+			}
+			goto refill;
 		}
 		/*
 		 * Last bytes may lack the \n at the
@@ -216,6 +248,7 @@ again:
 		return b->data;
 	}
 
+refill:
 	/*
 	 * small optimization -- we've scanned b->sz
 	 * symbols already, no need to re-scan them after
@@ -252,14 +285,14 @@ static int __bwrite(struct bfd *bfd, const void *buf, int size)
 {
 	struct xbuf *b = &bfd->b;
 
-	if (b->sz + size > BUFSIZE) {
+	if (b->sz + size > b->bsize) {
 		int ret;
 		ret = bflush(bfd);
 		if (ret < 0)
 			return ret;
 	}
 
-	if (size > BUFSIZE)
+	if (size > b->bsize)
 		return write_all(bfd->fd, buf, size);
 
 	memcpy(b->data + b->sz, buf, size);
diff --git a/criu/include/bfd.h b/criu/include/bfd.h
index 2846ec628..050158327 100644
--- a/criu/include/bfd.h
+++ b/criu/include/bfd.h
@@ -8,6 +8,7 @@ struct xbuf {
 	char *mem;	 /* buffer */
 	char *data;	 /* position we see bytes at */
 	unsigned int sz; /* bytes sitting after b->pos */
+	unsigned int bsize;
 	struct bfd_buf *buf;
 };
 

Currently, bfd has a fixed buffer size (BUFSIZE, which is 4096). This
causes issues when reading lines longer than BUFSIZE, as breadline
fails with "The bfd buffer is too small".

This patch introduces dynamic buffer resizing in bfd. When a buffer
is full and more space is needed (e.g., for a very long line), the
buffer is resized using mremap (or a new mmap if it was using a
pre-allocated buffer from the pool).

A new bsize field is added to struct xbuf to keep track of the
current buffer size.

Unit tests for reading long lines and writing large buffers are
added to criu/unittest/unit.c.

Signed-off-by: dong sunchao <[email protected]>
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Add a new ZDTM test case `massive_groups` to verify the dynamic buffer
expansion logic in `bfd.c`

Signed-off-by: dong sunchao <[email protected]>
Development

Successfully merging this pull request may close these issues.

Cannot dump from user added in too many groups

5 participants