Add concurrent write-handle limit returning ENOMEM on open()#1831

Open
yerzhan7 wants to merge 10 commits into awslabs:feature/memory-limit from yerzhan7:feature/enomem

Conversation

@yerzhan7
Contributor

Introduce WriteHandleLimiter, an admission-control type that caps the number
of files open for write at the same time. The cap is derived from the
configured memory target and write part size:

max_concurrent_writes = (memory_target - additional_mem_reserved) / write_part_size

When the cap is reached, open() for write returns ENOMEM before any inode
state is mutated, so retries are clean. The reserved slot is held by the
returned WriteHandleSlot; its Drop releases the slot when the file handle is
closed.
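
A minimal sketch of the mechanism described above, not the PR's actual code. It assumes the limiter keeps a shared atomic counter, uses a hypothetical `LimitExceeded` error in place of ENOMEM, a hypothetical `try_acquire` method name, and saturating subtraction when the reserved amount exceeds the memory target:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical error type; the PR surfaces this condition as ENOMEM.
#[derive(Debug, PartialEq, Eq)]
pub struct LimitExceeded;

/// Caps the number of write slots held at the same time.
#[derive(Debug, Clone)]
pub struct WriteHandleLimiter {
    active: Arc<AtomicU64>,
    max_concurrent_writes: u64,
}

/// RAII guard: dropping it releases the reserved slot.
#[derive(Debug)]
pub struct WriteHandleSlot {
    active: Arc<AtomicU64>,
}

impl WriteHandleLimiter {
    /// cap = (memory_target - additional_mem_reserved) / write_part_size
    pub fn new(memory_target: u64, additional_mem_reserved: u64, write_part_size: u64) -> Self {
        assert!(write_part_size > 0, "write part size must be non-zero");
        // Assumption: clamp to zero rather than underflow.
        let budget = memory_target.saturating_sub(additional_mem_reserved);
        Self {
            active: Arc::new(AtomicU64::new(0)),
            max_concurrent_writes: budget / write_part_size,
        }
    }

    pub fn max_concurrent_writes(&self) -> u64 {
        self.max_concurrent_writes
    }

    /// Lock-free acquire: increments the counter only while it is below the cap.
    pub fn try_acquire(&self) -> Result<WriteHandleSlot, LimitExceeded> {
        let max = self.max_concurrent_writes;
        self.active
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| (n < max).then_some(n + 1))
            .map(|_| WriteHandleSlot { active: Arc::clone(&self.active) })
            .map_err(|_| LimitExceeded)
    }
}

impl Drop for WriteHandleSlot {
    fn drop(&mut self) {
        self.active.fetch_sub(1, Ordering::AcqRel);
    }
}
```

With purely illustrative values of a 512 MiB target, 128 MiB reserved, and 128 MiB parts, the cap works out to (512 - 128) / 128 = 3 concurrent writers. A failed `try_acquire` leaves the counter unchanged, which is what makes a retry clean.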

The limiter is enforced inside Metablock::open_handle so the slot acquire
happens before start_writing mutates the inode.

Note on ordering: the write-handle limiter check runs before the inode-conflict
check inside open_handle. This is intentional — the limiter check is a lock-free
atomic and produces no state to roll back, so it's the natural fail-fast point.
The trade-off is that when both errors apply (cap exhausted AND same file
already open for write), the user sees ENOMEM rather than the more specific
EPERM. We accept this; it's a rare double-failure case and the user retries
either way.
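
The ordering trade-off can be illustrated with a simplified model. This is a sketch under assumptions, not `Metablock::open_handle` itself: `Opener`, `OpenError`, and `open_for_write` are hypothetical names, the set of inodes open for write is modelled as a `HashSet` behind a mutex, and the slot is released by hand where the PR relies on `WriteHandleSlot`'s `Drop`:

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum OpenError {
    /// Cap on concurrent write handles reached (ENOMEM in the PR).
    NoMem,
    /// Inode already open for write (EPERM in the PR).
    Perm,
}

pub struct Opener {
    active: AtomicU64,
    max: u64,
    writing: Mutex<HashSet<u64>>,
}

impl Opener {
    pub fn new(max: u64) -> Self {
        Self {
            active: AtomicU64::new(0),
            max,
            writing: Mutex::new(HashSet::new()),
        }
    }

    pub fn open_for_write(&self, ino: u64) -> Result<(), OpenError> {
        // 1. Limiter check first: one atomic update, nothing to roll back on failure.
        let max = self.max;
        self.active
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| (n < max).then_some(n + 1))
            .map_err(|_| OpenError::NoMem)?;

        // 2. Inode-conflict check second; hand the slot back if it fails.
        let mut writing = self.writing.lock().unwrap();
        if !writing.insert(ino) {
            self.active.fetch_sub(1, Ordering::AcqRel);
            return Err(OpenError::Perm);
        }
        Ok(())
    }
}
```

In this model, reopening an inode that is already open for write while the cap is exhausted yields `NoMem`; with spare capacity the same call yields `Perm`.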

Does this change impact existing behavior?

Yes.

Does this change need a changelog entry? Does it require a version change?

Yes. Intentionally skipped changelog entry and version bumps for now to avoid merge conflicts and because we are working on a feature branch.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
@yerzhan7 yerzhan7 temporarily deployed to PR integration tests May 14, 2026 10:11 — with GitHub Actions Inactive
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
@yerzhan7 yerzhan7 requested a deployment to PR integration tests May 14, 2026 12:27 — with GitHub Actions Waiting
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
@yerzhan7 yerzhan7 temporarily deployed to PR integration tests May 14, 2026 12:41 — with GitHub Actions Inactive
Userspace close() returns before the kernel has dispatched the FUSE RELEASE op to Mountpoint, so the WriteHandleSlot is briefly held after the file is dropped. Poll the retry open() with a short timeout instead of expecting it to succeed on the first try, which races on slow CI hosts.
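
A minimal sketch of the polling approach this commit message describes, assuming a hypothetical helper named `poll_until_ok`; the actual test code in the PR may be structured differently:

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Retries `attempt` until it succeeds or `timeout` elapses, sleeping
/// `interval` between tries. Returns the last error once the deadline passes.
pub fn poll_until_ok<T, E>(
    timeout: Duration,
    interval: Duration,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    loop {
        match attempt() {
            Ok(value) => return Ok(value),
            Err(err) if Instant::now() >= deadline => return Err(err),
            Err(_) => thread::sleep(interval),
        }
    }
}
```

A test could pass a closure such as `|| std::fs::File::create(&path)` as `attempt`, so the retry `open()` tolerates the short window in which the kernel has not yet delivered RELEASE.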

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
@yerzhan7 yerzhan7 requested a deployment to PR integration tests May 14, 2026 13:29 — with GitHub Actions Waiting
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
@yerzhan7 yerzhan7 temporarily deployed to PR integration tests May 14, 2026 13:32 — with GitHub Actions Inactive
Comment thread mountpoint-s3-fs/src/superblock.rs Outdated
fh: u64,
write_mode: &WriteMode,
flags: OpenFlags,
write_handle_limiter: Option<&Arc<WriteHandleLimiter>>,
Contributor

nit: could we avoid the reference to Arc? e.g. by the limiter holding an Arc<AtomicU64> internally?

Contributor Author

Yes, thanks!

Comment thread mountpoint-s3-fs/src/write_handle_limiter.rs Outdated

#[cfg(feature = "s3_tests")]
#[test_case(200)]
#[test_case(48)]
Contributor

Consider explicitly setting the memory limit in this test, so we can make clear that we want to test the maximum number of writers.

Contributor Author

Done, thanks. Also, added back the 200+ test case to ensure no regression.

Comment thread mountpoint-s3-fs/src/lib.rs Outdated
mod superblock;
mod sync;
pub mod upload;
pub mod write_handle_limiter;
Contributor

Consider moving under memory or fs.

Contributor Author

Done

Comment thread mountpoint-s3-fs/src/fs.rs Outdated
.bucket(bucket.to_string())
.enable_backpressure(true)
.initial_read_window_size(1024 * 1024)
.part_size(32 * 1024 * 1024)
Contributor

Could we define this and other values as constants at the top of this function? And move the calculation described in the rustdoc there.

Contributor Author

Done

Comment thread mountpoint-s3-fs/src/fs/handles.rs Outdated
/// for read handles. Released automatically when the `FileHandle` is dropped — held purely
/// for that `Drop` side effect, so the field is never read directly.
#[expect(dead_code, reason = "held for its Drop side effect")]
pub(super) write_slot: Option<WriteHandleSlot>,
Contributor

Shouldn't this be in FileHandleState::Write?

Contributor

Also, prefer _write_slot to the dead code expect.

Contributor Author

Good suggestion - done.
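
A sketch of the shape the two review suggestions point to, under assumptions: the `FileHandleState` enum here is heavily simplified, `WriteHandleSlot` is a stand-in that decrements a shared counter on drop, and `open_write` is a hypothetical helper used only to build the state:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Stand-in for the PR's `WriteHandleSlot`: decrements a shared counter on drop.
pub struct WriteHandleSlot(Arc<AtomicU64>);

impl Drop for WriteHandleSlot {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Simplified handle state; the real enum carries more data per variant.
pub enum FileHandleState {
    Read,
    Write {
        /// Never read; kept only so the slot is released when the state is dropped.
        /// The leading underscore avoids the dead-code lint without an `#[expect]`.
        _write_slot: WriteHandleSlot,
    },
}

/// Hypothetical helper: reserves a slot and wraps it in the write state.
pub fn open_write(active: &Arc<AtomicU64>) -> FileHandleState {
    active.fetch_add(1, Ordering::AcqRel);
    FileHandleState::Write {
        _write_slot: WriteHandleSlot(Arc::clone(active)),
    }
}
```

Placing the slot inside the write variant means a read handle cannot carry one, so the `Option` wrapper is no longer needed.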

Comment thread mountpoint-s3-fs/src/fs.rs Outdated
.metablock
.open_handle(ino, fh, &write_mode, flags, Some(&self.write_handle_limiter))
.await?;
let write_slot = new_handle.write_slot.take();
Contributor

Why take() and mut new_handle? We don't actually want to mutate anything.

If this was just to "fix" lifetimes, consider instead unpacking or copying the data you will need later.

Contributor Author

Makes sense, done.
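
One way to read the "unpacking" suggestion, as a sketch only: `NewHandle` and `split` are hypothetical, and a `String` stands in for the slot type. Destructuring by value moves each field out without a `mut` binding or `Option::take()`:

```rust
/// Hypothetical shape of the value returned by `open_handle`.
pub struct NewHandle {
    pub ino: u64,
    /// Stand-in for `Option<WriteHandleSlot>`.
    pub write_slot: Option<String>,
}

pub fn split(new_handle: NewHandle) -> (u64, Option<String>) {
    // Move both fields out in one pattern; `new_handle` is consumed here.
    let NewHandle { ino, write_slot } = new_handle;
    (ino, write_slot)
}
```

This pattern only compiles when the struct itself does not implement `Drop`; a field type that implements `Drop`, such as a slot guard, is fine.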

Comment thread doc/CONFIGURATION.md Outdated

### Maximum number of files open for write

Mountpoint enforces a cap on the number of files that may be open for write at the same time, to prevent out-of-memory crashes. The cap is computed at startup from the configured memory target and write part size:
Contributor

Can we say "to control memory usage" instead of oom crashes?

Contributor

We should start mentioning what the default memory target is (and thus the default maximum number of concurrent writes).

Contributor Author

Done. Added a mention of the default value.

Comment thread doc/METRICS.md Outdated

| Metric | Type | Dimensions | Description |
|--------|------|------------|-------------|
| `fs.write_handle_limit_exceeded` | Counter | | Number of `open()` calls for write rejected because the [concurrent-writers cap](CONFIGURATION.md#maximum-number-of-files-open-for-write) was reached |
Contributor

Before documenting the new metric, we should add it to metrics::defs. Or we can leave it for a later review of all new metrics.

Contributor Author

Didn't know about the defs file.

Removed from docs for now - we can review all new metrics later as you've suggested.

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
@yerzhan7 yerzhan7 requested a deployment to PR integration tests May 15, 2026 15:14 — with GitHub Actions Waiting
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
@yerzhan7 yerzhan7 requested a deployment to PR integration tests May 15, 2026 17:00 — with GitHub Actions Waiting
Pulls in awslabs#1832 (CursorState consolidation). The conflict in
mountpoint-s3-fs/src/memory/pool.rs was in the delegation block: upstream
removed reserve/try_reserve/release_cursor/next_cursor_id/inner_stats/
inner_limiter in favor of a unified `create_cursor`, while this branch had
added `mem_limit`/`data_buffer_budget` accessors used by `WriteHandleLimiter`.
Resolved by keeping both — upstream's `create_cursor` plus our two
accessors — and dropped the now-unused test-only `inner_stats`/`inner_limiter`
helpers along with their TODO.

Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
@yerzhan7 yerzhan7 temporarily deployed to PR integration tests May 15, 2026 17:11 — with GitHub Actions Inactive
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
@yerzhan7 yerzhan7 requested a deployment to PR integration tests May 15, 2026 17:21 — with GitHub Actions Waiting
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
@yerzhan7 yerzhan7 temporarily deployed to PR integration tests May 15, 2026 17:23 — with GitHub Actions Inactive
2 participants