Add concurrent write-handle limit returning ENOMEM on open()#1831
Add concurrent write-handle limit returning ENOMEM on open()#1831yerzhan7 wants to merge 10 commits into
Conversation
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
Userspace close() returns before the kernel has dispatched the FUSE RELEASE op to Mountpoint, so the WriteHandleSlot is briefly held after the file is dropped. Poll the retry open() with a short timeout instead of expecting it to succeed on the first try, which races on slow CI hosts. Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
| fh: u64, | ||
| write_mode: &WriteMode, | ||
| flags: OpenFlags, | ||
| write_handle_limiter: Option<&Arc<WriteHandleLimiter>>, |
There was a problem hiding this comment.
nit: could we avoid the reference to Arc? e.g. by the limiter holding an Arc<AtomicU64> internally?
|
|
||
| #[cfg(feature = "s3_tests")] | ||
| #[test_case(200)] | ||
| #[test_case(48)] |
There was a problem hiding this comment.
Consider explicitly setting the memory limit in this test, so we can make clear that we want to test the maximum number of writers.
There was a problem hiding this comment.
Done, thanks. Also, added back the 200+ test case to ensure no regression.
| mod superblock; | ||
| mod sync; | ||
| pub mod upload; | ||
| pub mod write_handle_limiter; |
There was a problem hiding this comment.
Consider moving under memory or fs.
| .bucket(bucket.to_string()) | ||
| .enable_backpressure(true) | ||
| .initial_read_window_size(1024 * 1024) | ||
| .part_size(32 * 1024 * 1024) |
There was a problem hiding this comment.
Could we define this and other values as constants at the top of this function? And move the calculation described in the rustdoc there.
| /// for read handles. Released automatically when the `FileHandle` is dropped — held purely | ||
| /// for that `Drop` side effect, so the field is never read directly. | ||
| #[expect(dead_code, reason = "held for its Drop side effect")] | ||
| pub(super) write_slot: Option<WriteHandleSlot>, |
There was a problem hiding this comment.
Shouldn't this be in FileHandleState::Write?
There was a problem hiding this comment.
Also, prefer _write_slot to the dead code expect.
There was a problem hiding this comment.
Good suggestion - done.
| .metablock | ||
| .open_handle(ino, fh, &write_mode, flags, Some(&self.write_handle_limiter)) | ||
| .await?; | ||
| let write_slot = new_handle.write_slot.take(); |
There was a problem hiding this comment.
Why take() and mut new_handle? We don't actually want to mutate anything.
If this was just to "fix" lifetimes, consider instead unpacking or copying the data you will need later.
|
|
||
| ### Maximum number of files open for write | ||
|
|
||
| Mountpoint enforces a cap on the number of files that may be open for write at the same time, to prevent out-of-memory crashes. The cap is computed at startup from the configured memory target and write part size: |
There was a problem hiding this comment.
Can we say "to control memory usage" instead of oom crashes?
There was a problem hiding this comment.
We should start mentioning what's the default for memory target (and thus max writes).
There was a problem hiding this comment.
Done. Added mentioning of the default value.
|
|
||
| | Metric | Type | Dimensions | Description | | ||
| |--------|------|------------|-------------| | ||
| | `fs.write_handle_limit_exceeded` | Counter | | Number of `open()` calls for write rejected because the [concurrent-writers cap](CONFIGURATION.md#maximum-number-of-files-open-for-write) was reached | |
There was a problem hiding this comment.
Before documenting the new metric, we should add it to metrics::defs. Or we can leave it for a later review of all new metrics.
There was a problem hiding this comment.
Didn't know about the defs file.
Removed from docs for now - we can review all new metrics later as you've suggested.
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
Pulls in awslabs#1832 (CursorState consolidation). The conflict in mountpoint-s3-fs/src/memory/pool.rs was in the delegation block: upstream removed reserve/try_reserve/release_cursor/next_cursor_id/inner_stats/ inner_limiter in favor of a unified `create_cursor`, while this branch had added `mem_limit`/`data_buffer_budget` accessors used by `WriteHandleLimiter`. Resolved by keeping both — upstream's `create_cursor` plus our two accessors — and dropped the now-unused test-only `inner_stats`/`inner_limiter` helpers along with their TODO. Signed-off-by: Yerzhan Mazhkenov <20302932+yerzhan7@users.noreply.github.com>
Introduce
WriteHandleLimiter, an admission-control type that caps the numberof files open for write at the same time. The cap is derived from the
configured memory target and write part size:
max_concurrent_writes = (memory_target - additional_mem_reserved) / write_part_sizeWhen the cap is reached,
open()for write returns ENOMEM before any inodestate is mutated, so retries are clean. The reserved slot is held by the
returned
WriteHandleSlot; itsDropreleases the slot when the file handle isclosed.
The limiter is enforced inside
Metablock::open_handleso the slot acquirehappens before
start_writingmutates the inode.Note on ordering: the write-handle limiter check runs before the inode-conflict
check inside
open_handle. This is intentional — the limiter check is a lock-freeatomic and produces no state to roll back, so it's the natural fail-fast point.
The trade-off is that when both errors apply (cap exhausted AND same file
already open for write), the user sees
ENOMEMrather than the more specificEPERM. We accept this; it's a rare double-failure case and the user retrieseither way.
Does this change impact existing behavior?
Yes.
Does this change need a changelog entry? Does it require a version change?
Yes. Intentionally skipped changelog entry and version bumps for now to avoid merge conflicts and because we are working on a feature branch.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).