WIP / Experiment: reference counting directory FDs to keep them alive longer #169

Open — wants to merge 8 commits into main

Conversation

@bertschinger (Contributor) commented Mar 20, 2025

This is a work-in-progress, experimental branch that tests refcounting dir FDs with the goal of keeping them alive while there are still children waiting to be processed.

The idea is to see if this helps the cache behavior on Lustre with the Lustre inode cache disabled.

This also enables using openat() in the processdir() functions in order to reduce pathname resolution overhead.

@bertschinger (Contributor, Author) commented Mar 20, 2025

This also needs a fallback added to handle the case where the number of available FDs is exhausted.
It should probably query the number of available FDs when the program begins, and then make sure that the number of pinned dir FDs is always below (total FDs - number of threads), so that every thread is guaranteed the ability to open the file/directory that it is currently processing.

Done, it now reserves 3 * nthreads file descriptors and won't allow pinned dir FDs to use up that reservation.


codecov bot commented Mar 20, 2025

Codecov Report

Attention: Patch coverage is 75.82418% with 22 lines in your changes missing coverage. Please review.

Project coverage is 90.96%. Comparing base (2348ed8) to head (cd2700f).

Files with missing lines    Patch %   Lines
src/bf.c                    72.09%    8 missing, 4 partials
src/QueuePerThreadPool.c    81.25%    2 missing, 1 partial
src/gufi_dir2index.c        66.66%    2 missing, 1 partial
src/gufi_dir2trace.c        75.00%    0 missing, 1 partial
src/gufi_index2dir.c        75.00%    0 missing, 1 partial
src/gufi_treesummary.c      75.00%    0 missing, 1 partial
src/parallel_cpr.c          83.33%    0 missing, 1 partial
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #169      +/-   ##
==========================================
- Coverage   91.08%   90.96%   -0.13%     
==========================================
  Files          57       57              
  Lines        8394     8454      +60     
  Branches     1114     1124      +10     
==========================================
+ Hits         7646     7690      +44     
- Misses        467      477      +10     
- Partials      281      287       +6     


@bertschinger bertschinger force-pushed the dir_refcounting branch 3 times, most recently from 3378a3d to 8111a08 Compare March 20, 2025 21:19
Comment on lines +101 to +103
// fprintf(stderr, "Warning: system may not allow enough open files for the number of requested threads.\n");
// fprintf(stderr, "Max number of open files: %llu; Number of threads requested: %llu\n", rl.rlim_cur, nthreads);

Check notice — Code scanning / CodeQL: Commented-out code. This comment appears to contain commented-out code.
@bertschinger bertschinger force-pushed the dir_refcounting branch 3 times, most recently from b85370b to 1985be7 Compare March 21, 2025 22:33
struct rlimit rl;
int res = getrlimit(RLIMIT_NOFILE, &rl);
if (res) {
fprintf(stderr, "Warning: could not get open file limit: %s\n", strerror(errno));
Collaborator:

Store errno first

@calccrypto calccrypto force-pushed the main branch 2 times, most recently from fd865c3 to 5ab578d Compare April 8, 2025 21:00
This is needed when the queues hold items that need additional work to free beyond just calling free(),

so that they can clean up everything on exit properly.
Since keeping directory FDs alive in order to open child directories
with relative paths risks exhausting the available file
descriptors, this tracks the maximum number of FDs that may be
allocated to long-lived directory handles.

When that maximum is reached, directory handles will no longer have
their lifetime extended, until the limit goes back down again.

2 participants