Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swift run: Speed up fd closes on macOS/Linux for high rlimits #8327

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dcantah
Copy link

@dcantah dcantah commented Mar 5, 2025

Today, for everything that isn't the BSDs, we grab the
maximum number of open fds the process supports and then
loop through from 3 -> max, calling close(2) on everything.
Even for a low open fd count of 65k this will typically result
in 99+% of these close's being EBADF. At a 65k nofile RLIMIT
the sluggishness is not really felt, but on systems that may
have this in the millions it is extremely stark. swift run
on a hello world program can take minutes before the program
is actually ran.

There's a couple ways to work around this, but there's also another issue
in that close(2)'ing the fds poses a problem itself in some cases with debug
builds of libdispatch. There can be a race between libdispatch going to
use the kqueue fd(s) and us closing them before the execve. Because of this,
the most sane thing to do on some platforms is instead of closing we can set
all of the open fds as CLOEXEC. To do this efficiently on linux and macOS we
can read /dev/fd and /proc/self/fd respectively and only close what's actually open.

Below is the delta between two runs of swift run on a simple hello world
program. The shell I'm running these in has a nofile rlimit of 1 billion.
At 100 million it falls to about 20 seconds on my machine, and gets progressively
smaller until the two approaches aren't really any different at all.

With the patch:

Build of product 'fdwoo' complete! (0.23s)
Hello, world!

real    0m0.925s
user    0m0.698s
sys     0m0.129s

Without:

Build of product 'closerange' complete! (0.15s)
Hello, world!

real	2m43.203s
user	0m47.357s
sys	1m55.344s

@MaxDesiatov
Copy link
Contributor

Below is the delta between two runs of swift run on a simple hello world program

Would you be able to provide the source code for such program? I'd prefer it to be directly embedded in PR description or somewhere in the PR diff (even in code comments) for reproducibility and posterity.

@dcantah
Copy link
Author

dcantah commented Mar 5, 2025

Would you be able to provide the source code for such program?

@MaxDesiatov Sure, it's just the default that swift package init --type executable produces today. I'll link in the description.

@dcantah
Copy link
Author

dcantah commented Mar 5, 2025

Thanks for first passes! Will amend later today

@jakepetroules
Copy link
Contributor

@dcantah Why not switch all of this over to use Process (which wraps posix_spawn)?

@dcantah
Copy link
Author

dcantah commented Mar 5, 2025

@jakepetroules I can trial! I wasn't sure if there was any reason for the more manual approach that was taken here today so didn't want to depart too far. It seems Process (if I'm reading the right code) does a lot of the same tricks. I guess the main departure for that would be swift run would not be replaced with the users binary, but rather be a child of it if we used Foundation.Process. Does anyone know if that would cause issues or we relied on the current behavior today?

@jakepetroules
Copy link
Contributor

I guess the main departure for that would be swift run would not be replaced with the users binary, but rather be a child of it if we used Foundation.Process. Does anyone know if that would cause issues or we relied on the current behavior today?

Asked around and nobody could think of a specific reason.

@dcantah
Copy link
Author

dcantah commented Mar 5, 2025

@jakepetroules Sweet! I'll move this over to Process and see if that's a good replacement

@MaxDesiatov
Copy link
Contributor

There are swift run --debugger and swift run --repl modes, those could in theory break if debugger attachment logic relies on how it is currently set up.

@dcantah
Copy link
Author

dcantah commented Mar 5, 2025

@MaxDesiatov Good point. I'm somewhat concerned on changing it to be honest the more I look at this. The other thing is if we went with Process we'd need to forward signals from swift run to the child now which adds some complexity.

@jakepetroules
Copy link
Contributor

There are swift run --debugger and swift run --repl modes, those could in theory break if debugger attachment logic relies on how it is currently set up.

Good point, thanks for that clarification, @MaxDesiatov.

We might want to add a paraphrased version of your reply as a code comment so it's clear from reading the sources why it may be using the lower-level APIs here.

@dcantah dcantah force-pushed the swift-run-speedup-exec branch from fceea43 to 2a9d6ad Compare March 6, 2025 02:53
@dcantah
Copy link
Author

dcantah commented Mar 6, 2025

Updated. Makes me wonder though, if someone had recently introduced that pthread_suspend_all_np to avoid a race with libdispatch, and we'd like to stay with just raw execve to keep things chugging along, should we just fcntl(...CLOEXEC) everything instead of closing? Presumably we'd avoid the race conditions detailed with closing the kqueue fds from dispatch as we'd just let the system do it during exec for us.

@dcantah
Copy link
Author

dcantah commented Mar 6, 2025

@swift-ci please test

@dschaefer2
Copy link
Member

Updated. Makes me wonder though, if someone had recently introduced that pthread_suspend_all_np to avoid a race with libdispatch, and we'd like to stay with just raw execve to keep things chugging along, should we just fcntl(...CLOEXEC) everything instead of closing? Presumably we'd avoid the race conditions detailed with closing the kqueue fds from dispatch as we'd just let the system do it during exec for us.

Sorry I missed this PR while working on getting swift run to work with Swift Build. In my PR, #8331, I actually did switch it to CLOEXEC. Closing the fd's was causing havoc with the Swift Build shutdown and crashing SwiftPM.

@dcantah
Copy link
Author

dcantah commented Mar 7, 2025

@dschaefer2 Makes sense. I'll swap to just /dev/fd or /proc/self/fd + cloexec for all platforms then

@dcantah dcantah force-pushed the swift-run-speedup-exec branch 2 times, most recently from be53c02 to f9b7ff5 Compare March 8, 2025 03:39
@dcantah
Copy link
Author

dcantah commented Mar 8, 2025

@dschaefer2 I've amended this change to just read /proc/self/fd and /dev/fd on macOS/Linux and cloexec everything. I've left the BSDs as is though as I truly don't know what the right thing to do on them is, but it seems some folks here recently changed bits there already to make things better. It seems FreeBSD is probably fine with closefrom(2) as we suspend every thread, but OpenBSD's in a weird spot. I'm open to suggestions.

Today, for everything that isn't the BSDs, we grab the
maximum number of open fds the process supports and then
loop through from 3 -> max, calling close(2) on everything.
Even for a low open fd count of 65k this will typically result
in 99+% of these close's being EBADF. At a 65k nofile RLIMIT
the sluggishness is not really felt, but on systems that may
have this in the millions it is extremely stark. `swift run`
on a hello world program can take minutes before the program
is actually ran.

There's a couple ways to work around this, but there's also another issue
in that actually closing the fds poses a problem in some cases with debug
builds of libdispatch. There can be a race between libdispatch going to
use the kqueue fd(s) and us closing them before the execve. Because of this,
the most sane thing to do is instead of closing we can set all of the open
fds as CLOEXEC. To do this efficiently on linux and macOS we can read /dev/fd
and /proc/self/fd respectively and only close what's actually open.

Below is the delta between two runs of `swift run` on a simple hello world
program. The shell I'm running these in has a nofile rlimit of 1 billion.
At 100 million it falls to about 20 seconds on my machine, and gets progressively
smaller until the two approaches aren't really any different at all.

With the patch:
```
Build of product 'fdwoo' complete! (0.23s)
Hello, world!

real    0m0.925s
user    0m0.698s
sys     0m0.129s
```

Without:
```
Build of product 'closerange' complete! (0.15s)
Hello, world!

real	2m43.203s
user	0m47.357s
sys	1m55.344s
```

Signed-off-by: Danny Canter <[email protected]>
@dcantah dcantah force-pushed the swift-run-speedup-exec branch from f9b7ff5 to e92e7c2 Compare March 8, 2025 03:48
@dcantah
Copy link
Author

dcantah commented Mar 11, 2025

@swift-ci please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants