Add concurrent write-handle limit returning ENOMEM on open() by yerzhan7 · Pull Request #1831 · awslabs/mountpoint-s3

yerzhan7 · 2026-05-14T10:11:26Z

Introduce WriteHandleLimiter, an admission-control type that caps the number
of files open for write at the same time. The cap is derived from the
configured memory target and write part size:

max_concurrent_writes = (memory_target - additional_mem_reserved) / write_part_size
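
For readers who want the arithmetic spelled out, here is a minimal sketch of the formula as a function. The function name, the saturating subtraction, and the zero-part-size guard are assumptions made for illustration; the PR text only states the formula itself.

```rust
/// Illustrative helper (not from the PR): computes the concurrent-writer cap
/// from the formula above, using integer (floor) division.
///
/// Assumptions: the subtraction saturates at zero when the reserve exceeds
/// the target, and a zero part size yields a cap of zero instead of panicking.
pub fn max_concurrent_writes(
    memory_target: u64,
    additional_mem_reserved: u64,
    write_part_size: u64,
) -> u64 {
    if write_part_size == 0 {
        return 0;
    }
    memory_target.saturating_sub(additional_mem_reserved) / write_part_size
}
```

With purely hypothetical inputs of a 2 GiB target, 128 MiB reserved, and 8 MiB parts, this gives (2048 - 128) / 8 = 240 concurrent writers.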

When the cap is reached, open() for write returns ENOMEM before any inode
state is mutated, so retries are clean. The reserved slot is held by the
returned WriteHandleSlot; its Drop releases the slot when the file handle is
closed.
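
The acquire/release lifecycle described above can be sketched roughly as follows. This is a simplified stand-in, not the code in this PR: the field names, `try_acquire`, and `in_use` are assumptions, and the real types may differ in shape and memory ordering.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the limiter: counts write handles currently open.
#[derive(Clone, Debug)]
pub struct WriteHandleLimiter {
    open: Arc<AtomicU64>,
    max: u64,
}

/// Hypothetical stand-in for the slot: gives its reservation back on drop.
#[derive(Debug)]
pub struct WriteHandleSlot {
    open: Arc<AtomicU64>,
}

impl WriteHandleLimiter {
    pub fn new(max: u64) -> Self {
        Self { open: Arc::new(AtomicU64::new(0)), max }
    }

    /// Tries to reserve a slot without blocking. `None` means the cap is
    /// reached; a caller would map that to ENOMEM.
    pub fn try_acquire(&self) -> Option<WriteHandleSlot> {
        self.open
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                (n < self.max).then_some(n + 1)
            })
            .ok()
            .map(|_| WriteHandleSlot { open: Arc::clone(&self.open) })
    }

    /// Number of slots currently reserved.
    pub fn in_use(&self) -> u64 {
        self.open.load(Ordering::Acquire)
    }
}

impl Drop for WriteHandleSlot {
    fn drop(&mut self) {
        // Release the reservation when the owning file handle goes away.
        self.open.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Because a failed `try_acquire` never increments the counter, a rejected open leaves nothing to undo, which is what makes retries clean.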

The limiter is enforced inside Metablock::open_handle so the slot acquire
happens before start_writing mutates the inode.

Note on ordering: the write-handle limiter check runs before the inode-conflict
check inside open_handle. This is intentional — the limiter check is a lock-free
atomic and produces no state to roll back, so it's the natural fail-fast point.
The trade-off is that when both errors apply (cap exhausted AND same file
already open for write), the user sees ENOMEM rather than the more specific
EPERM. We accept this; it's a rare double-failure case and the user retries
either way.
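
To make the ordering trade-off concrete, here is a toy model of the two checks. Everything in it (`Model`, `OpenError`, `open_for_write`) is hypothetical and much simpler than `Metablock::open_handle`; in particular it hands the slot back by hand on conflict, where the real code would presumably rely on the slot's `Drop`.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum OpenError {
    /// Cap exhausted; would surface as ENOMEM.
    NoMem,
    /// Inode already open for write; would surface as EPERM.
    Perm,
}

/// Hypothetical, simplified model of the state consulted on open-for-write.
pub struct Model {
    open_writes: AtomicU64,
    max_writes: u64,
    writing_inodes: Mutex<HashSet<u64>>,
}

impl Model {
    pub fn new(max_writes: u64) -> Self {
        Self {
            open_writes: AtomicU64::new(0),
            max_writes,
            writing_inodes: Mutex::new(HashSet::new()),
        }
    }

    pub fn open_for_write(&self, ino: u64) -> Result<(), OpenError> {
        // Step 1: limiter check. Lock-free, and nothing to undo on failure.
        self.open_writes
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                (n < self.max_writes).then_some(n + 1)
            })
            .map_err(|_| OpenError::NoMem)?;

        // Step 2: inode-conflict check. On failure, give the slot back.
        let mut writing = self.writing_inodes.lock().unwrap();
        if !writing.insert(ino) {
            self.open_writes.fetch_sub(1, Ordering::AcqRel);
            return Err(OpenError::Perm);
        }
        Ok(())
    }
}
```

With a cap of 1 and inode 7 already open for write, a second open of inode 7 fails both checks and reports `NoMem`, matching the double-failure case described above.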

Does this change impact existing behavior?

Yes.

Does this change need a changelog entry? Does it require a version change?

Yes. The changelog entry and version bump are intentionally skipped for now, both to avoid merge conflicts and because this work lives on a feature branch.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

Userspace close() returns before the kernel has dispatched the FUSE RELEASE op to Mountpoint, so the WriteHandleSlot is still held briefly after the file is dropped. Poll the retried open() with a short timeout instead of expecting it to succeed on the first try; a single attempt races on slow CI hosts.

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
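
The polling approach in this commit message could look something like the sketch below. `poll_until_ok` is a hypothetical helper written for illustration, not the helper used in the PR's tests; the timeout and interval are whatever the caller chooses.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical test helper: retries `attempt` until it succeeds or `timeout`
/// elapses, sleeping `interval` between tries. Returns the last error if the
/// deadline passes without a success.
pub fn poll_until_ok<T, E>(
    timeout: Duration,
    interval: Duration,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    loop {
        match attempt() {
            Ok(value) => return Ok(value),
            // Out of time: report the most recent failure to the caller.
            Err(err) if Instant::now() >= deadline => return Err(err),
            // Still within the deadline: wait briefly, then try again.
            Err(_) => sleep(interval),
        }
    }
}
```

In a test, `attempt` would wrap the retried open() for write, so that a slot released slightly late by the FUSE RELEASE op does not fail the test.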

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

passaro · 2026-05-15T12:46:30Z

        fh: u64,
        write_mode: &WriteMode,
        flags: OpenFlags,
+        write_handle_limiter: Option<&Arc<WriteHandleLimiter>>,


nit: could we avoid the reference to Arc? e.g. by the limiter holding an Arc<AtomicU64> internally?

Yes, thanks!

passaro · 2026-05-15T12:50:09Z


 #[cfg(feature = "s3_tests")]
-#[test_case(200)]
+#[test_case(48)]


Consider explicitly setting the memory limit in this test, so we can make clear that we want to test the maximum number of writers.

Done, thanks. Also, added back the 200+ test case to ensure no regression.

passaro · 2026-05-15T12:51:46Z

 mod superblock;
 mod sync;
 pub mod upload;
+pub mod write_handle_limiter;


Consider moving under memory or fs.

passaro · 2026-05-15T12:55:22Z

+            .bucket(bucket.to_string())
+            .enable_backpressure(true)
+            .initial_read_window_size(1024 * 1024)
+            .part_size(32 * 1024 * 1024)


Could we define this and other values as constants at the top of this function? And move the calculation described in the rustdoc there.

passaro · 2026-05-15T13:07:11Z

+    /// for read handles. Released automatically when the `FileHandle` is dropped — held purely
+    /// for that `Drop` side effect, so the field is never read directly.
+    #[expect(dead_code, reason = "held for its Drop side effect")]
+    pub(super) write_slot: Option<WriteHandleSlot>,


Shouldn't this be in FileHandleState::Write?

Also, prefer _write_slot to the dead code expect.

Good suggestion - done.
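
For context, the suggested shape might look roughly like this. The enum below is a stripped-down stand-in (the real `FileHandleState` carries more data), and `open_write` plus the bare counter are assumptions used only to show the slot being released when the state is dropped.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical slot type: decrements the shared counter when dropped.
pub struct WriteHandleSlot(Arc<AtomicU64>);

impl Drop for WriteHandleSlot {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Simplified stand-in for the handle state.
pub enum FileHandleState {
    Read,
    Write {
        // The leading underscore marks the field as intentionally unread:
        // it is kept only so that Drop runs together with the handle.
        _write_slot: Option<WriteHandleSlot>,
    },
}

/// Builds a write state, reserving one slot on the shared counter.
pub fn open_write(counter: &Arc<AtomicU64>) -> FileHandleState {
    counter.fetch_add(1, Ordering::AcqRel);
    FileHandleState::Write {
        _write_slot: Some(WriteHandleSlot(Arc::clone(counter))),
    }
}
```

Keeping the slot inside the `Write` variant also means a read handle cannot carry one by construction.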

passaro · 2026-05-15T13:22:51Z

+            .metablock
+            .open_handle(ino, fh, &write_mode, flags, Some(&self.write_handle_limiter))
+            .await?;
+        let write_slot = new_handle.write_slot.take();


Why take() and mut new_handle? We don't actually want to mutate anything.

If this was just to "fix" lifetimes, consider instead unpacking or copying the data you will need later.

Makes sense, done.
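
A small sketch of the "unpack instead of take()" idea, for anyone following along. `NewHandle`, its fields, and `Slot` are hypothetical placeholders and do not reflect the actual return type of `open_handle`.

```rust
/// Placeholder for the slot type; the real one releases a reservation on drop.
#[derive(Debug, PartialEq, Eq)]
pub struct Slot(pub u32);

/// Hypothetical shape of the value returned when opening a handle.
pub struct NewHandle {
    pub ino: u64,
    pub full_key: String,
    pub write_slot: Option<Slot>,
}

/// Consumes the handle by destructuring it, so neither a `mut` binding nor
/// `Option::take` is needed to move the slot out.
pub fn split(new_handle: NewHandle) -> (String, Option<Slot>) {
    let NewHandle { full_key, write_slot, .. } = new_handle;
    (full_key, write_slot)
}
```

Destructuring moves each field out exactly once, which sidesteps the borrow issues that `take()` was papering over.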

passaro · 2026-05-15T13:43:12Z


+### Maximum number of files open for write
+
+Mountpoint enforces a cap on the number of files that may be open for write at the same time, to prevent out-of-memory crashes. The cap is computed at startup from the configured memory target and write part size:


Can we say "to control memory usage" instead of "to prevent out-of-memory crashes"?

We should also state the default memory target (and thus the default maximum number of concurrent writes).

Done. Added a mention of the default value.

passaro · 2026-05-15T13:54:43Z


 | Metric | Type | Dimensions | Description |
 |--------|------|------------|-------------|
+| `fs.write_handle_limit_exceeded` | Counter | | Number of `open()` calls for write rejected because the [concurrent-writers cap](CONFIGURATION.md#maximum-number-of-files-open-for-write) was reached |


Before documenting the new metric, we should add it to metrics::defs. Or we can leave it for a later review of all new metrics.

Didn't know about the defs file.

Removed from docs for now - we can review all new metrics later as you've suggested.

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

Pulls in awslabs#1832 (CursorState consolidation). The conflict in mountpoint-s3-fs/src/memory/pool.rs was in the delegation block: upstream removed reserve/try_reserve/release_cursor/next_cursor_id/inner_stats/inner_limiter in favor of a unified `create_cursor`, while this branch had added `mem_limit`/`data_buffer_budget` accessors used by `WriteHandleLimiter`. Resolved by keeping both (upstream's `create_cursor` plus our two accessors) and dropping the now-unused test-only `inner_stats`/`inner_limiter` helpers along with their TODO.

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

Add concurrent write-handle limit returning ENOMEM on open()

d9a2542

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

yerzhan7 temporarily deployed to PR integration tests May 14, 2026 10:11 — with GitHub Actions Inactive

Add fuse test case and fix concurrent test case

2b51753

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

yerzhan7 requested a deployment to PR integration tests May 14, 2026 12:27 — with GitHub Actions Waiting

tests: apply rustfmt to ENOMEM e2e test

fb1f1f5

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

yerzhan7 temporarily deployed to PR integration tests May 14, 2026 12:41 — with GitHub Actions Inactive

yerzhan7 requested a deployment to PR integration tests May 14, 2026 13:29 — with GitHub Actions Waiting

tests: simplify retried-open binding in ENOMEM e2e test

2a28c84

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

yerzhan7 temporarily deployed to PR integration tests May 14, 2026 13:32 — with GitHub Actions Inactive

passaro reviewed May 15, 2026


Address comments

9b64dc2

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

yerzhan7 requested a deployment to PR integration tests May 15, 2026 15:14 — with GitHub Actions Waiting

Log write-handle cap at startup; skip limiter on read-only mounts

cae2f1c

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

yerzhan7 requested a deployment to PR integration tests May 15, 2026 17:00 — with GitHub Actions Waiting

yerzhan7 temporarily deployed to PR integration tests May 15, 2026 17:11 — with GitHub Actions Inactive

Fix typo

7595310

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

yerzhan7 requested a deployment to PR integration tests May 15, 2026 17:21 — with GitHub Actions Waiting

Fix typo

6a1ca1b

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>

yerzhan7 temporarily deployed to PR integration tests May 15, 2026 17:23 — with GitHub Actions Inactive


