swift run: Speed up fd closes on macOS/Linux for high rlimits #8327
Would you be able to provide the source code for such a program? I'd prefer it to be embedded directly in the PR description or somewhere in the PR diff (even in code comments) for reproducibility and posterity.
@MaxDesiatov Sure, it's just the default that |
Thanks for the first passes! Will amend later today.
@dcantah Why not switch all of this over to use |
@jakepetroules I can trial! I wasn't sure if there was any reason for the more manual approach that was taken here today so didn't want to depart too far. It seems Process (if I'm reading the right code) does a lot of the same tricks. I guess the main departure for that would be |
Asked around and nobody could think of a specific reason.
@jakepetroules Sweet! I'll move this over to |
There are |
@MaxDesiatov Good point. I'm somewhat concerned on changing it to be honest the more I look at this. The other thing is if we went with |
Good point, thanks for that clarification, @MaxDesiatov. We might want to add a paraphrased version of your reply as a code comment so it's clear from reading the sources why it may be using the lower-level APIs here.
Updated. Makes me wonder though, if someone had recently introduced that pthread_suspend_all_np to avoid a race with libdispatch, and we'd like to stay with just raw execve to keep things chugging along, should we just |
@swift-ci please test
Sorry I missed this PR while working on getting |
@dschaefer2 Makes sense. I'll swap to just /dev/fd or /proc/self/fd + cloexec for all platforms then.
@dschaefer2 I've amended this change to just read /dev/fd on macOS and /proc/self/fd on Linux and set CLOEXEC on everything that's open. I've left the BSDs as is, though, as I truly don't know what the right thing to do on them is; it seems some folks here recently changed bits there already to make things better. FreeBSD is probably fine with closefrom(2) since we suspend every thread, but OpenBSD's in a weird spot. I'm open to suggestions.
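For readers following along, the approach described in this comment could be sketched roughly as below in C. This is not the PR's actual code: the helper name `mark_open_fds_cloexec` and its `lowest` parameter are hypothetical, and the choice of directory per platform is an assumption based on the paths named in the thread.

```c
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not from the PR): set FD_CLOEXEC on every open
 * descriptor >= lowest by listing the per-process fd directory instead of
 * walking the whole rlimit range. Returns the number of descriptors that
 * were marked, or -1 if the directory could not be opened. */
int mark_open_fds_cloexec(int lowest) {
#if defined(__linux__)
    const char *fd_dir = "/proc/self/fd"; /* assumption: procfs is mounted */
#else
    const char *fd_dir = "/dev/fd";       /* assumption: macOS-style /dev/fd */
#endif
    DIR *dir = opendir(fd_dir);
    if (dir == NULL) {
        return -1;
    }
    int dir_fd = dirfd(dir); /* the listing itself occupies a descriptor */
    int marked = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char *end = NULL;
        errno = 0;
        long value = strtol(entry->d_name, &end, 10);
        /* Skip ".", "..", and anything that is not a plain fd number. */
        if (end == entry->d_name || *end != '\0' || errno != 0 ||
            value < 0 || value > INT_MAX) {
            continue;
        }
        int fd = (int)value;
        if (fd < lowest || fd == dir_fd) {
            continue;
        }
        int flags = fcntl(fd, F_GETFD);
        if (flags == -1) {
            continue; /* descriptor went away between readdir and fcntl */
        }
        if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0) {
            marked++;
        }
    }
    closedir(dir);
    return marked;
}
```

A caller would invoke something like `mark_open_fds_cloexec(3)` right before `execve`, leaving stdin, stdout, and stderr inherited. Nothing is closed in the current process, so a concurrent user of those descriptors is not disturbed before the exec.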
Today, for everything that isn't a BSD, we grab the maximum number of open fds the process supports and then loop from 3 to that maximum, calling close(2) on every value. Even with a low nofile limit of 65k, typically 99+% of these close calls fail with EBADF. At a 65k nofile RLIMIT the sluggishness is not really felt, but on systems where the limit is in the millions it is extremely stark: `swift run` on a hello world program can take minutes before the program is actually run.

There are a couple of ways to work around this, but there's another issue: actually closing the fds poses a problem in some cases with debug builds of libdispatch. There can be a race between libdispatch going to use its kqueue fd(s) and us closing them before the execve. Because of this, the sanest thing to do on some platforms is to mark all of the open fds as CLOEXEC instead of closing them. To do this efficiently on Linux and macOS we can read /proc/self/fd and /dev/fd respectively and only touch what's actually open.

Below is the delta between two runs of `swift run` on a simple hello world program. The shell I'm running these in has a nofile rlimit of 1 billion. At 100 million the delta falls to about 20 seconds on my machine, and it gets progressively smaller until the two approaches aren't really any different at all.

With the patch:

```
Build of product 'fdwoo' complete! (0.23s)
Hello, world!
real 0m0.925s
user 0m0.698s
sys 0m0.129s
```

Without:

```
Build of product 'closerange' complete! (0.15s)
Hello, world!
real 2m43.203s
user 0m47.357s
sys 1m55.344s
```

Signed-off-by: Danny Canter <[email protected]>
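To make the cost of the old loop concrete, here is a small C sketch that counts how many descriptor numbers in a range are not open at all. It is illustrative only and not taken from the PR: the helper names are hypothetical, it assumes the upper bound comes from the soft `RLIMIT_NOFILE` (the source only says "the maximum number of open fds the process supports"), and it probes with `fcntl(fd, F_GETFD)` rather than `close(2)` so that running it does not close anything.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>

/* Hypothetical helper: count descriptor numbers in [first, last) that are
 * not open. Each miss corresponds to one close(2) call in the old loop that
 * would have failed with EBADF. */
long count_unused_fds(int first, int last) {
    long unused = 0;
    for (int fd = first; fd < last; fd++) {
        /* F_GETFD is a read-only probe; it fails with EBADF on unused fds. */
        if (fcntl(fd, F_GETFD) == -1 && errno == EBADF) {
            unused++;
        }
    }
    return unused;
}

/* Assumed upper bound of the old loop: the soft nofile limit.
 * Returns 0 if the limit cannot be queried. */
rlim_t nofile_soft_limit(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return 0;
    }
    return lim.rlim_cur;
}
```

Calling `count_unused_fds(3, 65536)` in a typical process should report that nearly every number in the range is unused, which is the "99+% EBADF" observation above. With a limit of 1 billion the same loop makes roughly a billion system calls, almost all of them wasted.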
@swift-ci please test |