fix(test): make test_schedstat_fd_closed_on_thread_exit not flaky #398
Merged
The previous assertion checked fcntl(fd, F_GETFD) != -1 after the spawned thread exited. Linux readily recycles fd numbers, and other tests in the same suite open many files concurrently (including their own per-thread schedstat fds via tokio worker poll callbacks). When a sibling test opened a file in the small window after our spawned thread closed its schedstat fd, the fd number got reused and the check incorrectly reported a leak.

Capture the path the fd points to via /proc/self/fd/<fd> inside the spawned thread, and after join verify that the fd is either closed or now points at a different file. Each thread opens schedstat for its own tid, so /proc/self/task/<spawned_tid>/schedstat is unique to the spawned thread's open and no concurrent test can collide on it.

Verified by running the full cargo test -p dial9-tokio-telemetry --lib suite 13 times in a row with zero failures (previously failed roughly 3/10 runs), and confirmed the new test still catches a real leak by temporarily mem::forget'ing the OwnedFd in the production path.
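The fd-number recycling described above can be reproduced with the standard library alone. This is a minimal sketch, not code from the PR: `reopen_after_close` is a hypothetical helper, and `/dev/null` and `/dev/zero` are arbitrary stand-ins for the schedstat file and a sibling test's file.

```rust
use std::fs::File;
use std::os::fd::{AsRawFd, RawFd};

/// Hypothetical helper: opens a file, remembers its raw fd number, closes it,
/// then opens a different file. Returns (old number, new number).
fn reopen_after_close() -> std::io::Result<(RawFd, RawFd)> {
    let first = File::open("/dev/null")?;
    let old = first.as_raw_fd();
    drop(first); // closes the fd; its number is free for the next open

    // Stand-in for "a sibling test opened any file in the small window".
    let second = File::open("/dev/zero")?;
    let new = second.as_raw_fd();

    // If `old == new`, any check keyed only on the number `old` now inspects
    // the unrelated /dev/zero fd and would conclude it is "still open".
    Ok((old, new))
}
```

When no other thread opens files in between, `open` hands out the lowest free number, so the two values normally match. Under a parallel test suite the reuse comes from other threads instead, which is what made the number-only check unreliable.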
7c0b3e5 to c1de8ce
Fluzko approved these changes on May 13, 2026
Summary
`test_schedstat_fd_closed_on_thread_exit` (added in 5e905e8) was flaky under the parallel test suite: it failed roughly 3/10 runs of `cargo test -p dial9-tokio-telemetry --lib`, but passed every time when run in isolation.

Root cause. The test spawned a thread, called `SchedStat::read_current()` (which opens `/proc/self/task/<tid>/schedstat` and stores the `OwnedFd` in a thread-local), captured the raw fd number, joined, and then asserted `fcntl(fd, F_GETFD) != -1`. The kernel readily recycles fd numbers, and other tests in the same binary open many files concurrently, including their own per-thread schedstat fds opened from tokio worker poll callbacks (`runtime_context.rs`). If a sibling test opened any file in the small window after our spawned thread's `OwnedFd` was dropped, the fd number got reused and `fcntl(F_GETFD)` returned success on a totally unrelated fd, falsely reporting a leak.

Fix. Inside the spawned thread, also record what the fd points at via `readlink("/proc/self/fd/<fd>")`. After join, check whether the fd is either closed (readlink fails) or now points at a different file (the slot was reused for an unrelated open, which itself proves our `OwnedFd` was dropped). The captured path `/proc/self/task/<spawned_tid>/schedstat` is unique to the spawned thread's open: each thread opens schedstat for its own tid, so no concurrent test can possibly hold an open fd for that exact path. If the fd is still open AND still points at it, that's a real leak, and the assertion still fires.

The schedstat fd-cleanup invariant is unchanged; only the way the test observes it is hardened.
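The shape of the hardened check can be sketched as follows. This is an illustration under stated assumptions, not the test from the PR: `fd_target`, `still_leaked`, and `thread_fd_leaks` are hypothetical names, and a uniquely named temp file stands in for the per-thread schedstat file that `SchedStat::read_current()` opens.

```rust
use std::fs::{self, File};
use std::os::fd::{AsRawFd, RawFd};
use std::path::{Path, PathBuf};
use std::thread;

/// What /proc/self/fd/<fd> currently points at, or None if the fd is closed.
fn fd_target(fd: RawFd) -> Option<PathBuf> {
    fs::read_link(format!("/proc/self/fd/{fd}")).ok()
}

/// True only if `fd` is still open AND still points at `original`.
/// A closed fd, or a reused number pointing elsewhere, both mean the
/// original open was dropped.
fn still_leaked(fd: RawFd, original: &Path) -> bool {
    fd_target(fd).as_deref() == Some(original)
}

/// Hypothetical stand-in for the real test body: a thread opens a file whose
/// path is unique to that thread, records (fd, target), and exits.
fn thread_fd_leaks() -> bool {
    let (fd, target) = thread::spawn(|| {
        // Assumption: a temp file named after pid + thread id plays the role
        // of the per-tid schedstat path.
        let name = format!("fd-check-{}-{:?}", std::process::id(), thread::current().id());
        let file = File::create(std::env::temp_dir().join(name)).expect("create temp file");
        let fd = file.as_raw_fd();
        let target = fd_target(fd).expect("fd is open inside the thread");
        (fd, target)
        // `file` is dropped here, which closes the fd.
    })
    .join()
    .expect("thread panicked");

    let leaked = still_leaked(fd, &target);
    // Remove the stand-in file only after the check, so a leaked fd would
    // still resolve to the original path.
    let _ = fs::remove_file(&target);
    leaked
}
```

Comparing the link target rather than the bare number is what makes the result independent of concurrent opens: a sibling thread may reuse the number, but it cannot make it point at this thread's unique path.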
Test plan
- main: 3/10 full-suite runs failed.
- Replacing `*cell.borrow_mut() = Some(owned)` with `std::mem::forget(owned)` in the production path: test fails with `schedstat fd 3 leaked after thread exit (still points at "/proc/.../task/.../schedstat")`. Reverted before commit.
- `cargo clippy -p dial9-tokio-telemetry --lib --tests` clean.
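The `std::mem::forget` sanity check relies on the fact that forgetting an `OwnedFd` skips its destructor, so the descriptor is never closed. A rough sketch of that effect, assuming a hypothetical helper `open_then_release` and using `/dev/null` in place of the schedstat file:

```rust
use std::fs::{self, File};
use std::mem;
use std::os::fd::{AsRawFd, OwnedFd, RawFd};

/// Hypothetical helper: opens /dev/null as an OwnedFd, then either drops it
/// or leaks it with mem::forget. Returns the raw number and whether
/// /proc/self/fd/<fd> still resolves afterwards.
fn open_then_release(leak: bool) -> (RawFd, bool) {
    let owned: OwnedFd = File::open("/dev/null").expect("open /dev/null").into();
    let fd = owned.as_raw_fd();
    if leak {
        mem::forget(owned); // destructor never runs, so close() is never called
    } else {
        drop(owned); // destructor runs and closes the fd
    }
    // Note: in the non-leak case this bare "is it open?" probe is subject to
    // the same number-reuse race the PR describes if other threads open files.
    let open = fs::read_link(format!("/proc/self/fd/{fd}")).is_ok();
    (fd, open)
}
```

With `leak = true` the descriptor stays open for the rest of the process, which is the condition the hardened test is expected to flag.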