awslabs · yerzhan7 · May 12, 2026 · May 14, 2026 · May 14, 2026 · May 14, 2026
diff --git a/doc/CONFIGURATION.md b/doc/CONFIGURATION.md
@@ -295,6 +295,18 @@ At mount time, Mountpoint automatically selects appropriate defaults to provide
 * By default, Mountpoint can serve up to 16 concurrent file or directory operations, and automatically scales up to reach this limit. If your application makes more than this many concurrent reads and writes (including to the same or different files), you can improve performance by increasing this limit with the `--max-threads` command-line argument. Higher values of this flag might cause Mountpoint to use more of your instance's resources.
 * When reading or writing files to S3, Mountpoint divides them into parts and uses parallel requests to improve throughput. You can change the part size Mountpoint uses for these parallel requests using the `--read-part-size` and `--write-part-size` command-line arguments, providing a maximum number of bytes per part for reading or writing respectively. For Mountpoint v1.7.2 or earlier, use `--part-size` instead. The default value for these arguments is 8 MiB (8,388,608 bytes), which in our testing is the largest value that achieves maximum throughput. Larger values can reduce the number of billed requests Mountpoint makes, but also reduce the throughput of object reads and writes to S3.
 
+### Maximum number of files open for write
+
+Mountpoint enforces a cap on the number of files that may be open for write at the same time, to prevent out-of-memory crashes. The cap is computed at startup from the configured memory target and write part size:
+
+```
+max_concurrent_writes = (memory_target - additional_mem_reserved) / write_part_size
+```
+
+`memory_target` is set with `--memory-target` and `write_part_size` is set with `--write-part-size` (or with `--part-size`). `additional_mem_reserved` is `max(128 MiB, memory_target / 8)` and is held back from data buffers for Mountpoint's own overhead. The minimum supported `memory_target` is 512 MiB, which allows 48 concurrent writers at the default 8 MiB write part size.
+
+Once the cap is reached, `open()` calls for write return `ENOMEM` ("Cannot allocate memory") until an existing write handle is closed. To raise the cap, increase `--memory-target` or decrease `--write-part-size`.
+
 ### Maximum object size
 
 In its default configuration, there is no maximum on the size of objects Mountpoint can read. However, Mountpoint uses [multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) when writing new objects, and multipart upload allows a maximum of 10,000 parts for an object. This means Mountpoint can only upload objects up to 80,000 MiB (78.1 GiB) in size. If your application tries to write objects larger than this limit, writes will fail with an out of space error.

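The cap formula in the CONFIGURATION.md hunk above can be checked with a few lines of Rust. This is a minimal sketch, not the `WriteHandleLimiter` code (which this diff does not show). It assumes the division rounds down, as Rust `u64` division does, and the function name `max_concurrent_writes` is illustrative.

```rust
/// Bytes in one mebibyte.
const MIB: u64 = 1024 * 1024;

/// Sketch of the documented formula:
/// (memory_target - additional_mem_reserved) / write_part_size.
/// Assumes floor division; `write_part_size` must be non-zero.
fn max_concurrent_writes(memory_target: u64, write_part_size: u64) -> u64 {
    // Overhead held back from data buffers: the larger of 128 MiB and 1/8 of the target.
    let additional_mem_reserved = (128 * MIB).max(memory_target / 8);
    // Budget left for data buffers; `saturating_sub` mirrors `data_buffer_budget()`.
    let data_buffer_budget = memory_target.saturating_sub(additional_mem_reserved);
    data_buffer_budget / write_part_size
}

fn main() {
    // Minimum supported target with the default 8 MiB write part size.
    println!("{}", max_concurrent_writes(512 * MIB, 8 * MIB)); // 48
    // Halving the write part size doubles the cap.
    println!("{}", max_concurrent_writes(512 * MIB, 4 * MIB)); // 96
    // A 1 GiB target reserves max(128 MiB, 128 MiB) = 128 MiB: (1024 - 128) / 8 = 112.
    println!("{}", max_concurrent_writes(1024 * MIB, 8 * MIB)); // 112
    // The tuning used by the unit test in fs.rs.
    println!("{}", max_concurrent_writes(256 * MIB, 32 * MIB)); // 4
}
```

Below a 1 GiB target the reserve is fixed at 128 MiB, so every extra MiB of `--memory-target` goes to data buffers; from 1 GiB upward the reserve grows as `memory_target / 8`, so only 7/8 of each additional MiB becomes write capacity.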
diff --git a/doc/METRICS.md b/doc/METRICS.md
@@ -86,6 +86,7 @@ Mountpoint emits the following metrics:
 
 | Metric | Type | Dimensions | Description |
 |--------|------|------------|-------------|
+| `fs.write_handle_limit_exceeded` | Counter | | Number of `open()` calls for write rejected because the [concurrent-writers cap](CONFIGURATION.md#maximum-number-of-files-open-for-write) was reached |
 | `fuse.io_size` | Histogram | `fuse_request` (read, write) | Bytes transferred per FUSE request |
 | `fuse.request_errors` | Counter | `fuse_request` (read, write, etc.) | Number of FUSE request errors |
 | `fuse.request_latency` | Histogram | `fuse_request` (read, write, etc.) | Time to process a FUSE request |

diff --git a/doc/SEMANTICS.md b/doc/SEMANTICS.md
@@ -89,6 +89,8 @@ Your application should not write to the same object from multiple instances at
 
 By default, Mountpoint ensures that new file uploads to a single key are atomic. As soon as an upload completes, other clients are able to see the new key and the entire content of the object. If the `--incremental-upload` flag is set, however, Mountpoint may issue multiple separate uploads during file writes to append data to the object. After each upload, the appended object in your S3 bucket will be visible to other clients.
 
+Mountpoint enforces a cap on the number of files that may be open for write at the same time, derived from `--memory-target` and `--write-part-size`. When the cap is reached, `open()` for write returns `ENOMEM` until an existing write handle is closed. See [CONFIGURATION.md](CONFIGURATION.md#maximum-number-of-files-open-for-write) for more details.
+
 ### Optional metadata and object content caching
 
 Mountpoint also offers optional metadata and object content caching.

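An `ENOMEM` from `open()` caused by this cap clears as soon as another write handle is closed, so an application that opens many files for write may choose to retry. The following is a minimal sketch that assumes a bounded retry with a fixed delay suits the workload. `retry_on_enomem` and `open_for_write` are hypothetical helpers, and the sketch relies on Rust's standard library reporting `ENOMEM` as `io::ErrorKind::OutOfMemory` on Unix.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, ErrorKind};
use std::path::Path;
use std::thread;
use std::time::Duration;

/// Calls `op`, retrying while it fails with `ErrorKind::OutOfMemory` (how the standard
/// library reports `ENOMEM` on Unix). Any other error, or the last `ENOMEM` once
/// `max_attempts` calls have been made, is returned to the caller.
fn retry_on_enomem<T>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut attempt = 1;
    loop {
        match op() {
            Err(e) if e.kind() == ErrorKind::OutOfMemory && attempt < max_attempts => {
                // Write cap is full: wait for another handle to be closed, then try again.
                attempt += 1;
                thread::sleep(delay);
            }
            result => return result,
        }
    }
}

/// Hypothetical helper: opens `path` on the mount for write, retrying while the cap is full.
/// The attempt count and delay are illustrative, not tuned recommendations.
#[allow(dead_code)] // not called by the demo below
fn open_for_write(path: &Path) -> io::Result<File> {
    retry_on_enomem(10, Duration::from_millis(100), || {
        OpenOptions::new().write(true).create(true).open(path)
    })
}

fn main() {
    // Simulate an open that hits the cap twice before a slot frees up.
    let mut calls = 0;
    let result = retry_on_enomem(5, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 {
            Err(io::Error::from(ErrorKind::OutOfMemory))
        } else {
            Ok("opened")
        }
    });
    println!("{:?} after {} calls", result.map_err(|e| e.kind()), calls); // Ok("opened") after 3 calls
}
```

Retrying only helps when other threads or processes close write handles in the meantime. A single-threaded application that keeps all of its write handles open must close one before retrying, or every attempt will fail.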
diff --git a/mountpoint-s3-fs/src/fs.rs b/mountpoint-s3-fs/src/fs.rs
@@ -22,6 +22,7 @@ use crate::prefetch::{Prefetcher, PrefetcherBuilder};
 use crate::sync::atomic::{AtomicU64, Ordering};
 use crate::sync::{Arc, AsyncMutex, AsyncRwLock};
 use crate::upload::{Uploader, UploaderConfig};
+use crate::write_handle_limiter::WriteHandleLimiter;
 
 mod config;
 pub use config::{CacheConfig, S3FilesystemConfig};
@@ -55,6 +56,7 @@ where
     metablock: Arc<dyn Metablock>,
     prefetcher: Prefetcher<Client>,
     uploader: Uploader<Client>,
+    write_handle_limiter: Arc<WriteHandleLimiter>,
     next_handle: AtomicU64,
     file_handles: AsyncRwLock<HashMap<u64, Arc<FileHandle<Client>>>>,
 }
@@ -150,6 +152,11 @@ where
         trace!(?config, "new filesystem");
 
         let pool = pool.clone();
+        let write_handle_limiter = Arc::new(WriteHandleLimiter::new(
+            pool.mem_limit(),
+            pool.data_buffer_budget(),
+            client.write_part_size(),
+        ));
         let prefetcher = prefetch_builder.build(runtime.clone(), pool.clone(), config.prefetcher_config);
         let uploader = Uploader::new(
             client.clone(),
@@ -167,6 +174,7 @@ where
             metablock: Arc::new(metablock),
             prefetcher,
             uploader,
+            write_handle_limiter,
             next_handle: AtomicU64::new(1),
             file_handles: AsyncRwLock::new(HashMap::new()),
         }
@@ -349,13 +357,18 @@ where
 
         let fh = self.next_handle(); // TODO: can we delay obtaining the next handle until we know we are creating a new file handle?
         let write_mode = self.config.write_mode();
-        let new_handle = self.metablock.open_handle(ino, fh, &write_mode, flags).await?;
+        let mut new_handle = self
+            .metablock
+            .open_handle(ino, fh, &write_mode, flags, Some(&self.write_handle_limiter))
+            .await?;
+        let write_slot = new_handle.write_slot.take();
         let state = FileHandleState::new(&new_handle, flags, self).await?;
         let handle = FileHandle {
             ino,
             location: new_handle.lookup.try_into_s3_location()?,
             open_pid: pid,
             state: AsyncMutex::new(state),
+            write_slot,
         };
         debug!(fh, ino, "new {:?} file handle created", new_handle.mode);
         self.file_handles.write().await.insert(fh, Arc::new(handle));
@@ -803,6 +816,7 @@ mod tests {
                 .bucket(bucket.to_string())
                 .enable_backpressure(true)
                 .initial_read_window_size(1024 * 1024)
+                .part_size(1024 * 1024)
                 .build(),
         );
         // Create "dir1" in the client to avoid creating it locally
@@ -1090,4 +1104,134 @@ mod tests {
         );
         S3Filesystem::new(client, prefetcher_builder, pool, runtime, superblock, fs_config)
     }
+
+    /// Verifies that the limiter rejects opens for write past the configured cap with `ENOMEM`,
+    /// and that releasing a handle frees its slot. Uses a deliberately tight `mem_limit` so the
+    /// derived cap is small enough to exhaust quickly.
+    ///
+    /// The MockClient `part_size` is also the value `client.write_part_size()` returns. With
+    /// `mem_limit = 256 MiB`, `part_size = 32 MiB`, `additional_mem_reserved = max(128, 32) = 128 MiB`,
+    /// the formula gives `(256 - 128) / 32 = 4` concurrent writers.
+    #[tokio::test]
+    async fn test_open_for_write_returns_enomem_when_cap_exhausted() {
+        let test_name = "test_open_for_write_returns_enomem_when_cap_exhausted";
+        let bucket = Bucket::new("bucket").unwrap();
+        let client = MockClient::config()
+            .bucket(bucket.to_string())
+            .enable_backpressure(true)
+            .initial_read_window_size(1024 * 1024)
+            .part_size(32 * 1024 * 1024)
+            .build();
+        client.add_object(
+            &format!("dir1/{}1.txt", test_name),
+            MockObject::constant(0xa1, 15, ETag::for_tests()),
+        );
+
+        let runtime = Runtime::new(ThreadPool::builder().pool_size(2).create().unwrap());
+        let pool = PagedPool::new_with_candidate_sizes([32 * 1024 * 1024], 256 * 1024 * 1024);
+        let prefetcher_builder = Prefetcher::default_builder(client.clone());
+        let fs_config = S3FilesystemConfig {
+            allow_overwrite: true,
+            ..Default::default()
+        };
+        let superblock = Superblock::new(
+            client.clone(),
+            S3Path::new(bucket, Default::default()),
+            SuperblockConfig {
+                cache_config: fs_config.cache_config.clone(),
+                s3_personality: fs_config.s3_personality,
+            },
+        );
+        let fs = S3Filesystem::new(client, prefetcher_builder, pool, runtime, superblock, fs_config);
+
+        // Sanity-check that we computed exactly 4 writer slots given the test's tuning.
+        let cap = fs.write_handle_limiter.max_concurrent_writes();
+        assert_eq!(cap, 4);
+
+        // Resolve the directory inode for mknod calls below.
+        let dir_entry = fs.lookup(FUSE_ROOT_INODE, "dir1".as_ref()).await.unwrap();
+        let read_dir_ino = dir_entry.attr.ino;
+
+        // Create one more file than the write cap, then prove the cap holds.
+        let mut files = Vec::new();
+        for i in 0..(cap + 1) {
+            let dentry = fs
+                .mknod(
+                    read_dir_ino,
+                    format!("file{i}.bin").as_ref(),
+                    libc::S_IFREG | libc::S_IRWXU,
+                    0,
+                    0,
+                )
+                .await
+                .unwrap();
+            files.push(dentry);
+        }
+
+        // Open up to the cap: all should succeed.
+        let mut open_handles = Vec::new();
+        for dentry in files.iter().take(cap) {
+            let opened = fs
+                .open(dentry.attr.ino, OpenFlags::O_WRONLY, 0)
+                .await
+                .expect("open within cap should succeed");
+            open_handles.push(opened);
+        }
+
+        // The next open exceeds the cap → ENOMEM with the expected message.
+        let err = fs
+            .open(files[cap].attr.ino, OpenFlags::O_WRONLY, 0)
+            .await
+            .expect_err("opening past the cap should return ENOMEM");
+        assert_eq!(err.errno, libc::ENOMEM);
+        let msg = format!("{err}");
+        assert!(
+            msg.contains("cannot open file for write"),
+            "unexpected error message: {msg}"
+        );
+        assert!(
+            msg.contains(&cap.to_string()),
+            "error message should reference cap of {cap}: {msg}"
+        );
+
+        // Re-opening the rejected file *before* freeing a slot still returns ENOMEM (no inode
+        // state was mutated by the rejected open, and the cap is still full).
+        let err = fs
+            .open(files[cap].attr.ino, OpenFlags::O_WRONLY, 0)
+            .await
+            .expect_err("re-opening the rejected file while cap is full should still return ENOMEM");
+        assert_eq!(err.errno, libc::ENOMEM);
+
+        // Locks in the fail-fast check order: when the cap is exhausted AND the target file
+        // already has an active writer (open_handles[0] is still live for files[0]), the user
+        // sees ENOMEM rather than EPERM. The cheap lock-free limiter check runs before the
+        // inode-locked conflict check, so cap exhaustion wins. See the commit message for the
+        // ordering rationale; flipping this order is a deliberate design change.
+        let err = fs
+            .open(files[0].attr.ino, OpenFlags::O_WRONLY, 0)
+            .await
+            .expect_err("opening an already-writing file at cap should return an error");
+        assert_eq!(
+            err.errno,
+            libc::ENOMEM,
+            "limiter check should fire before inode-conflict check (got errno {})",
+            err.errno
+        );
+
+        // Closing one of the open handles releases a slot.
+        fs.flush(files[0].attr.ino, open_handles[0].fh, 0, 0)
+            .await
+            .expect("flush should succeed");
+        fs.release(files[0].attr.ino, open_handles[0].fh, 0, None, true)
+            .await
+            .expect("release should succeed");
+
+        // The rejected file can now be opened cleanly. This validates that the ENOMEM rejection
+        // didn't leave the inode in `LocalOpenForWriting` — the metablock acquires the slot
+        // before mutating any state, so a rejection is fully reversible.
+        let _opened_retry = fs
+            .open(files[cap].attr.ino, OpenFlags::O_WRONLY, 0)
+            .await
+            .expect("retrying the previously-rejected file should succeed after a slot is freed");
+    }
 }
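The test above pins down two properties of the open path: the cheap limiter check runs before the inode conflict check, and a rejected open leaves no inode state behind. The following is a minimal single-threaded sketch of that ordering, assuming a plain counter in place of the real limiter. `open_for_write`, `Inode`, `OpenError` and `Slot` are hypothetical names, not Mountpoint's API.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum OpenError {
    /// Stands in for `InodeError::WriteHandleLimitExceeded` (reported as ENOMEM).
    CapExhausted,
    /// Stands in for the "file already open for write" conflict (reported as EPERM).
    AlreadyWriting,
}

/// Hypothetical single-threaded slot: gives its unit back to the shared counter on drop.
struct Slot(Rc<Cell<usize>>);

impl Drop for Slot {
    fn drop(&mut self) {
        self.0.set(self.0.get() - 1);
    }
}

/// Hypothetical inode state; `writing` plays the role of `LocalOpenForWriting`.
struct Inode {
    writing: bool,
}

/// Sketch of the open-for-write ordering: cap check first, inode conflict check second,
/// and inode state mutated only once both checks have passed.
fn open_for_write(in_use: &Rc<Cell<usize>>, max: usize, inode: &mut Inode) -> Result<Slot, OpenError> {
    // 1. Cap check (lock-free in Mountpoint): fail fast before touching the inode.
    if in_use.get() >= max {
        return Err(OpenError::CapExhausted);
    }
    in_use.set(in_use.get() + 1);
    let slot = Slot(Rc::clone(in_use));

    // 2. Conflict check (under the inode lock in Mountpoint). Returning here drops `slot`,
    //    which undoes the reservation; the inode was never modified.
    if inode.writing {
        return Err(OpenError::AlreadyWriting);
    }

    // 3. All checks passed: mutate state and hand the slot to the caller, which would
    //    move it into its file handle (cf. `new_handle.write_slot.take()` in fs.rs).
    inode.writing = true;
    Ok(slot)
}

fn main() {
    let in_use = Rc::new(Cell::new(0));
    let mut busy = Inode { writing: false };
    let mut other = Inode { writing: false };

    let held = open_for_write(&in_use, 1, &mut busy).unwrap();
    // Cap full and `busy` already has a writer: the cap error wins.
    println!("{:?}", open_for_write(&in_use, 1, &mut busy).err()); // Some(CapExhausted)
    // With a higher cap, the conflict is reported and the reservation is rolled back.
    println!("{:?}", open_for_write(&in_use, 2, &mut busy).err()); // Some(AlreadyWriting)
    println!("in_use = {}", in_use.get()); // in_use = 1

    drop(held); // closing the handle frees its slot
    let fresh = open_for_write(&in_use, 1, &mut other);
    println!("{} {}", fresh.is_ok(), other.writing); // true true
}
```

Reserving the slot before mutating the inode is what makes a rejection fully reversible: every early return drops the reservation, and there is no partially written inode state to roll back.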
diff --git a/mountpoint-s3-fs/src/fs/error.rs b/mountpoint-s3-fs/src/fs/error.rs
@@ -189,6 +189,7 @@ impl ToErrno for InodeError {
             InodeError::OutOfOrderReadDir { .. } => libc::EBADF,
             InodeError::NoSuchDirHandle { .. } => libc::EINVAL,
             InodeError::FlexibleRetrievalObjectNotAccessible(_) => libc::EACCES,
+            InodeError::WriteHandleLimitExceeded(_) => libc::ENOMEM,
         }
     }
 }

diff --git a/mountpoint-s3-fs/src/fs/handles.rs b/mountpoint-s3-fs/src/fs/handles.rs
@@ -10,6 +10,7 @@ use crate::object::ObjectId;
 use crate::prefetch::PrefetchGetObject;
 use crate::sync::{Arc, AsyncMutex};
 use crate::upload::{AppendUploadRequest, UploadRequest};
+use crate::write_handle_limiter::WriteHandleSlot;
 
 use super::{Error, InodeNo, OpenFlags, S3Filesystem, ToErrno};
 
@@ -23,6 +24,11 @@ where
     pub state: AsyncMutex<FileHandleState<Client>>,
     /// Process that created the handle
     pub open_pid: u32,
+    /// Slot reserved on the [`WriteHandleLimiter`](crate::write_handle_limiter::WriteHandleLimiter)
+    /// for this handle. `Some` for write handles, `None` for read handles. Released automatically
+    /// when the `FileHandle` is dropped; held purely for that `Drop` side effect, so the field is
+    /// never read directly.
+    #[expect(dead_code, reason = "held for its Drop side effect")]
+    pub(super) write_slot: Option<WriteHandleSlot>,
 }
 
 impl<Client> FileHandle<Client>

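The `write_slot` field above relies on RAII: the slot is returned when the `FileHandle` is dropped, so no close path needs an explicit release call. The `WriteHandleLimiter` and `WriteHandleSlot` implementations are not part of this diff. The following is a minimal sketch of one way such a limiter could work, assuming a lock-free atomic counter. `Limiter`, `Slot`, `try_acquire` and the `rejected` counter (a stand-in for the `fs.write_handle_limit_exceeded` metric) are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for `WriteHandleLimiter`.
struct Limiter {
    max: usize,
    in_use: AtomicUsize,
    /// Stand-in for the `fs.write_handle_limit_exceeded` counter.
    rejected: AtomicU64,
}

/// Hypothetical stand-in for `WriteHandleSlot`: owning one means holding one unit of the cap.
struct Slot {
    limiter: Arc<Limiter>,
}

impl Limiter {
    fn new(max: usize) -> Arc<Self> {
        Arc::new(Self {
            max,
            in_use: AtomicUsize::new(0),
            rejected: AtomicU64::new(0),
        })
    }

    /// Lock-free reservation: increments `in_use` only while it is below `max`,
    /// so concurrent callers can never push it past the cap.
    fn try_acquire(self: &Arc<Self>) -> Option<Slot> {
        let reserved = self
            .in_use
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                (n < self.max).then_some(n + 1)
            })
            .is_ok();
        if reserved {
            Some(Slot { limiter: Arc::clone(self) })
        } else {
            self.rejected.fetch_add(1, Ordering::Relaxed);
            None
        }
    }
}

impl Drop for Slot {
    fn drop(&mut self) {
        // Runs exactly once per slot, when its owner (e.g. a file handle) is dropped.
        self.limiter.in_use.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let limiter = Limiter::new(2);
    let a = limiter.try_acquire();
    let b = limiter.try_acquire();
    let c = limiter.try_acquire(); // cap of 2 reached: rejected
    println!("{} {} {}", a.is_some(), b.is_some(), c.is_some()); // true true false
    drop(a); // closing a write handle frees its slot
    let d = limiter.try_acquire();
    println!("{} rejected={}", d.is_some(), limiter.rejected.load(Ordering::Relaxed)); // true rejected=1
}
```

Using `fetch_update` with a conditional closure, rather than an unconditional `fetch_add` followed by a check, means the counter never transiently exceeds the cap, so a rejected caller has nothing to undo.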
diff --git a/mountpoint-s3-fs/src/lib.rs b/mountpoint-s3-fs/src/lib.rs
@@ -21,6 +21,7 @@ pub mod s3;
 mod superblock;
 mod sync;
 pub mod upload;
+pub mod write_handle_limiter;
 
 pub use async_util::Runtime;
 pub use config::MountpointConfig;

diff --git a/mountpoint-s3-fs/src/manifest/metablock.rs b/mountpoint-s3-fs/src/manifest/metablock.rs
@@ -14,6 +14,7 @@ use crate::metablock::{
 use crate::s3::S3Path;
 use crate::sync::atomic::{AtomicU64, Ordering};
 use crate::sync::{Arc, Mutex, RwLock};
+use crate::write_handle_limiter::WriteHandleLimiter;
 
 use super::core::{Manifest, ManifestDirIter, ManifestError};
 
@@ -187,6 +188,7 @@ impl Metablock for ManifestMetablock {
         _fh: u64,
         _write_mode: &WriteMode,
         flags: OpenFlags,
+        _write_handle_limiter: Option<&Arc<WriteHandleLimiter>>,
     ) -> Result<NewHandle, InodeError> {
         let lookup = self.getattr(ino, false).await?;
         if flags.contains(OpenFlags::O_WRONLY) {

diff --git a/mountpoint-s3-fs/src/memory/limiter.rs b/mountpoint-s3-fs/src/memory/limiter.rs
@@ -123,6 +123,19 @@ impl MemoryLimiter {
         }
     }
 
+    /// The configured memory limit in bytes. Note this is the total memory target including
+    /// non-buffer overhead, not the budget available for data buffers — see [`Self::data_buffer_budget`].
+    pub fn mem_limit(&self) -> u64 {
+        self.mem_limit
+    }
+
+    /// The static memory budget available for data buffers, i.e. `mem_limit - additional_mem_reserved`.
+    /// This is the upper bound on buffer-backed allocations and is used by
+    /// [`crate::write_handle_limiter::WriteHandleLimiter`] to derive its cap.
+    pub fn data_buffer_budget(&self) -> u64 {
+        self.mem_limit.saturating_sub(self.additional_mem_reserved)
+    }
+
     /// Reserve the memory for future uses. Always succeeds, even if it means going beyond
     /// the configured memory limit.
     pub fn reserve(&self, cursor_id: CursorId, area: BufferArea, size: u64) {

diff --git a/mountpoint-s3-fs/src/memory/pool.rs b/mountpoint-s3-fs/src/memory/pool.rs
@@ -199,6 +199,16 @@ impl PagedPool {
 
     // ─── Delegation methods for MemoryLimiter ───────────────────────────────────
 
+    /// The configured memory limit in bytes.
+    pub fn mem_limit(&self) -> u64 {
+        self.inner.limiter.mem_limit()
+    }
+
+    /// The static memory budget available for data buffers, i.e. `mem_limit - additional_mem_reserved`.
+    pub fn data_buffer_budget(&self) -> u64 {
+        self.inner.limiter.data_buffer_budget()
+    }
+
     /// Reserve memory for future uses. Always succeeds (unconditional).
     pub fn reserve(&self, cursor_id: CursorId, area: BufferArea, size: u64) {
         self.inner.limiter.reserve(cursor_id, area, size);

diff --git a/mountpoint-s3-fs/src/metablock.rs b/mountpoint-s3-fs/src/metablock.rs
@@ -22,6 +22,8 @@ pub use pending_upload::PendingUploadHook;
 pub use stat::{InodeKind, InodeNo, InodeStat};
 
 use crate::fs::OpenFlags;
+use crate::sync::Arc;
+use crate::write_handle_limiter::WriteHandleLimiter;
 
 pub const ROOT_INODE_NO: InodeNo = crate::fs::FUSE_ROOT_INODE;
 
@@ -63,6 +65,7 @@ pub trait Metablock: Send + Sync {
         fh: u64,
         write_mode: &WriteMode,
         flags: OpenFlags,
+        write_handle_limiter: Option<&Arc<WriteHandleLimiter>>,
     ) -> Result<NewHandle, InodeError>;
 
     /// Increase the size of a file open for writing.
@@ -226,20 +229,27 @@ pub enum ReadWriteMode {
 pub struct NewHandle {
     pub lookup: Lookup,
     pub mode: ReadWriteMode,
+    /// Write-handle slot reserved for this handle when the open resolves to write mode.
+    /// `Some` if the metablock layer reserved a slot during `open_handle`, `None` otherwise
+    /// (read mode, or no limiter configured). The caller must transfer ownership into its own
+    /// `FileHandle` so the slot is released when the file handle is dropped.
+    pub write_slot: Option<crate::write_handle_limiter::WriteHandleSlot>,
 }
 
 impl NewHandle {
     pub fn read(lookup: Lookup) -> Self {
         Self {
             lookup,
             mode: ReadWriteMode::Read,
+            write_slot: None,
         }
     }
 
     pub fn write(lookup: Lookup) -> Self {
         Self {
             lookup,
             mode: ReadWriteMode::Write,
+            write_slot: None,
         }
     }
 }
diff --git a/mountpoint-s3-fs/src/metablock/error.rs b/mountpoint-s3-fs/src/metablock/error.rs
@@ -9,6 +9,7 @@ use crate::manifest::ManifestError;
 use crate::metablock::S3Location;
 use crate::sync::Arc;
 use crate::upload::UploadError;
+use crate::write_handle_limiter::WriteHandleLimitError;
 
 use super::InodeNo;
 
@@ -82,6 +83,16 @@ pub enum InodeError {
     NoSuchDirHandle { fh: u64 },
     #[error("objects in flexible retrieval storage classes are not accessible")]
     FlexibleRetrievalObjectNotAccessible(InodeErrorInfo),
+    #[error(
+        "cannot open file for write: reached the maximum of {max} concurrent write file handles \
+         allowed by memory target {mem_limit_mib} MiB (write part size {write_part_size_mib} MiB). \
+         Increase --memory-target or decrease --write-part-size to allow more concurrent writes, \
+         or close existing write file handles and retry the open() call.",
+        max = .0.max,
+        mem_limit_mib = .0.mem_limit_mib,
+        write_part_size_mib = .0.write_part_size_mib,
+    )]
+    WriteHandleLimitExceeded(WriteHandleLimitError),
 }
 
 impl InodeError {