Skip to content

[Feat] Stricter fd limit checks#4308

Open
ljedrz wants to merge 5 commits into
stagingfrom
feat/stricter_fd_limit_checks
Open

[Feat] Stricter fd limit checks#4308
ljedrz wants to merge 5 commits into
stagingfrom
feat/stricter_fd_limit_checks

Conversation

@ljedrz

@ljedrz ljedrz commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Startup enforcement is trivial, just a switch from an existing log to a panic.

As for the fs check, this PR proposes a more advanced solution than the one outlined in the issue, because attempting to open a file:

  • would only flip at 100% capacity (no room for recovery)
  • consumes an fd to run the check, potentially being the one that breaks the node
  • is racy: the probe might succeed, while a microsecond later, the node might fail

I tested it briefly locally, checking both the per-process (by adding connections) and per-system (by starting external sockets) value.

The nix dependency (needed for non-Linux UNIXes) already exists in our tree; rlimit is new, but it's small, and we're already using its only dependency (libc). Using libc directly would break the deny(unsafe) rule.

Closes #4306.

ljedrz added 3 commits June 9, 2026 13:26
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens file-descriptor (fd) limit handling in the snarkOS CLI by enforcing a minimum per-process fd limit at startup and adding a periodic runtime monitor to surface high fd usage before the node hard-fails.

Changes:

  • Switch startup open-files limit check from warning to panic! when under the configured minimum.
  • Add a Tokio-spawned fd monitor that periodically logs elevated/critical fd usage at both process and system scope (Linux and non-Linux Unix paths).
  • Introduce new target-specific dependencies to support fd probing across Unix variants.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
cli/src/helpers/mod.rs Exposes the new fd check module and makes the startup fd-limit check fatal.
cli/src/helpers/fd_check.rs Implements per-process and system-wide fd usage probing plus a periodic monitor task.
cli/src/commands/start.rs Starts the periodic fd monitor on Unix during node startup.
cli/Cargo.toml Adds new target-specific dependencies (nix + rlimit) for the fd probe implementation.
Cargo.lock Records newly introduced crates/versions for the added dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cli/src/helpers/mod.rs
Comment thread cli/src/helpers/mod.rs
Comment thread cli/Cargo.toml Outdated
Comment thread cli/Cargo.toml Outdated
Comment thread cli/src/helpers/fd_check.rs
ljedrz added 2 commits June 9, 2026 16:51
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: ljedrz <ljedrz@users.noreply.github.com>

@vicsn vicsn left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiling on macOS:

error[E0277]: the trait bound `i32: AsFd` is not satisfied
   --> cli/src/helpers/fd_check.rs:75:41
    |
75  |     let n = (0..max).filter(|&fd| fcntl(fd, FcntlArg::F_GETFD).is_ok()).count();
    |                                   ----- ^^ the trait `AsFd` is not implemented for `i32`
    |                                   |
    |                                   required by a bound introduced by this call
    |
    = help: the following other types implement trait `AsFd`:
              &T
              &mut T
              Arc<T>
              AsyncFd<T>
              BorrowedFd<'_>
              Box<T>
              ChildStderr
              ChildStdin
            and 47 others
note: required by a bound in `nix::fcntl::fcntl`
   --> /Users/victorsintnicolaas/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/nix-0.30.1/src/fcntl.rs:862:18
    |
862 | pub fn fcntl<Fd: std::os::fd::AsFd>(fd: Fd, arg: FcntlArg) -> Result<c_int> {
    |                  ^^^^^^^^^^^^^^^^^ required by this bound in `fcntl`

Err(e) => error!(error = %e, "process fd probe failed"),
}

// (2) whole-machine fds

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// (2) whole-machine fds
// (2) whole-machine fds are allowed 5 percentage points more leeway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Proposal] Improve reporting of open file limit issues

3 participants