-
Notifications
You must be signed in to change notification settings - Fork 750
lib: substantially reduce syscalls when attempting to create dirs #7695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0f82ded
to
bb26e3a
Compare
Thanks! Could you move/copy the PR description to the commit message? See https://jj-vcs.github.io/jj/latest/contributing/ |
bb26e3a
to
60e30f9
Compare
Done. Sorry about that. |
I wasn't planning on it, but I addressed the last two optimization opportunities. The changes in this PR right now make my original scenario go from 10 minutes to 4.5, and with the changes applied in this commit I've shaved off another ~30s: landaire-contrib/jj@checkout-perf-improvements...landaire-contrib:jj:push-uvunuzyowqrw Please let me know if you would like me to merge it into this commit or in a separate PR. I think I need to sleep on it and clean up the code a little bit, I'm unsure if I'm happy with how that commit is right now. New debug trace: https://share.firefox.dev/4hfSltF. You'll notice in particular that I'll also note that this is still slower than a |
46413d8
to
6a526f5
Compare
lib/src/local_working_copy.rs
Outdated
// In practice, none of these should panic. | ||
|
||
// Strip working_copy_path from path_prefix to get the common repo prefix | ||
let common_repo_prefix = path_prefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please advise on these .expect()
calls. Happy to convert to a proper error if you can recommend an error type, but I don't think these will ever panic in practice.
b1ab59f
to
c62e9df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this PR includes two separate changes and touches security code which we should review carefully, I suggest splitting it into "reuse open file" and "reuse previously-created directory" patches. Thanks.
lib/src/local_working_copy.rs
Outdated
err: err.into(), | ||
}), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can instead add same_file_handle_from_file()
/same_file_handle_from_path()
helpers? Since "from_file" shouldn't fail with NotFound
error, we can make it a bit stricter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's continue this discussion on #7742
c62e9df
to
a37b821
Compare
See jj-vcs#7695 for motivation/background One of the hottest paths in `jj` during checkout is `can_create_file` and `create_parent_dirs`. Both of these functions check to ensure that there aren't attempts to write to special directories (`reject_reserved_existing`) which is a fairly expensive check. This change reduces file open/close syscalls by re-busing file handles when possible and caching paths where we know a reserved directory does not exist.
See jj-vcs#7695 for motivation/background One of the hottest paths in `jj` during checkout is `can_create_file` and `create_parent_dirs`. Both of these functions check to ensure that there aren't attempts to write to special directories (`reject_reserved_existing`) which is a fairly expensive check. This change reduces file open/close syscalls by reusing file handles when possible and caching paths where we know a reserved directory does not exist.
After filing jj-vcs#7685 I ran some perf traces to try to understand just what was taking so long during these slow operations. The changes in this PR reduces clone time for my large repo from about 10 minutes to 4m30s. You can see my thought process in the comments of the above task but to summarize: During checkout we check files/directories being created to ensure that we are not attempting to write to a reserved directory (`.jj/`, `.git/`). `same_file::is_same_file()` is an expensive check that invokes _at least 4_ syscalls when called in a naive manner (`open()` and `close()` for each path -- plus possibly more for getting file info? I haven't counted). There are a few optimization gaps here that are causing significant slowdowns. The following checklist reflects what I've optimized in this PR, and what still remains: - [x] `create_parent_dirs` will be called for each file/directory and for each parent dir in a path **try to create it and check if the dir is an illegal name via `reject_reserved_existing_path()`**. There is no caching of directories which have already been created. - [ ] `reject_reserved_existing` calls `same_file::is_same_file()` in a loop for all reserved names, but the path which _has maybe been created_ isn't going to change, so its handle could be cached. - [ ] `can_create_new_file` attempts to create the file then just uses the result as an indicator of whether or not the file is created. However, since we _have a `File`_ that `File` can be directly converted to a `same_file::Handle` and avoid a syscall that currently occurs when converting the `Path` to a `same_file::Handle`. - [ ] `can_create_new_file` deletes the file immediately after. There's probably an opportunity here to **not** delete the file and re-use it for file write operations. - [ ] Say we have 1000 files in `foo/`. For each file that's written, `reject_reserved_existing` is going to make at least `RESERVED_DIR_NAMES.len() * 1000` syscalls constructing `foo/{reserved_dir_name}` paths, testing their existence, etc. Maybe `jj` might create this dir? But I don't think that should ever happen -- so why not cache the handle **if** it's created and use a lookup table in `reject_reserved_existing` to only conduct these types of checks if the handle is resolved? Or alternatively cache that the file _does not_ exist after the first check. Here are some perf traces of running a `jj git clone` of my large repo before: Release: https://share.firefox.dev/4oiSTBw Debug: https://share.firefox.dev/4qmJBX1 And after: Release: https://share.firefox.dev/4nK66mH Debug: https://share.firefox.dev/470W1ed
a37b821
to
7b4b7b5
Compare
checkout See jj-vcs#7695 for motivation/background One of the hottest paths in `jj` during checkout is `can_create_file` and `create_parent_dirs`. Both of these functions check to ensure that there aren't attempts to write to special directories (`reject_reserved_existing`) which is a fairly expensive check. This change reduces file open/close syscalls by reusing file handles when possible and caching paths where we know a reserved directory does not exist.
checkout See jj-vcs#7695 for motivation/background One of the hottest paths in `jj` during checkout is `can_create_file` and `create_parent_dirs`. Both of these functions check to ensure that there aren't attempts to write to special directories (`reject_reserved_existing`) which is a fairly expensive check. This change reduces file open/close syscalls by reusing file handles when possible and caching paths where we know a reserved directory does not exist.
checkout See jj-vcs#7695 for motivation/background One of the hottest paths in `jj` during checkout is `can_create_file` and `create_parent_dirs`. Both of these functions check to ensure that there aren't attempts to write to special directories (`reject_reserved_existing`) which is a fairly expensive check. This change reduces file open/close syscalls by reusing file handles when possible and caching paths where we know a reserved directory does not exist.
checkout See jj-vcs#7695 for motivation/background One of the hottest paths in `jj` during checkout is `can_create_file` and `create_parent_dirs`. Both of these functions check to ensure that there aren't attempts to write to special directories (`reject_reserved_existing`) which is a fairly expensive check. This change reduces file open/close syscalls by reusing file handles when possible and caching paths where we know a reserved directory does not exist.
checkout See jj-vcs#7695 for motivation/background One of the hottest paths in `jj` during checkout is `can_create_file` and `create_parent_dirs`. Both of these functions check to ensure that there aren't attempts to write to special directories (`reject_reserved_existing`) which is a fairly expensive check. This change reduces file open/close syscalls by reusing file handles when possible.
checkout See jj-vcs#7695 for motivation/background One of the hottest paths in `jj` during checkout is `can_create_file` and `create_parent_dirs`. Both of these functions check to ensure that there aren't attempts to write to special directories (`reject_reserved_existing`) which is a fairly expensive check. This change reduces file open/close syscalls by reusing file handles when possible.
After filing #7685 I ran some perf traces to try to understand just what was taking so long during these slow operations. The changes in this PR reduces clone time for my large repo from about 10 minutes to 4m30s.
You can see my thought process in the comments of the above task but to summarize:
During checkout we check files/directories being created to ensure that we are not attempting to write to a reserved directory (
.jj/
,.git/
).same_file::is_same_file()
is an expensive check that invokes at least 4 syscalls when called in a naive manner (open()
andclose()
for each path -- plus possibly more for getting file info? I haven't counted).There are a few optimization gaps here that are causing significant slowdowns. The following checklist reflects what I've optimized in this PR, and what still remains:
create_parent_dirs
will be called for each file/directory and for each parent dir in a path try to create it and check if the dir is an illegal name viareject_reserved_existing_path()
. There is no caching of directories which have already been created.reject_reserved_existing
callssame_file::is_same_file()
in a loop for all reserved names, but the path which has maybe been created isn't going to change, so its handle could be cached.can_create_new_file
attempts to create the file then just uses the result as an indicator of whether or not the file is created. However, since we have aFile
thatFile
can be directly converted to asame_file::Handle
and avoid a syscall that currently occurs when converting thePath
to asame_file::Handle
.can_create_new_file
deletes the file immediately after. There's probably an opportunity here to not delete the file and re-use it for file write operations.foo/
. For each file that's written,reject_reserved_existing
is going to make at leastRESERVED_DIR_NAMES.len() * 1000
syscalls constructingfoo/{reserved_dir_name}
paths, testing their existence, etc. Maybejj
might create this dir? But I don't think that should ever happen -- so why not cache the handle if it's created and use a lookup table inreject_reserved_existing
to only conduct these types of checks if the handle is resolved? Or alternatively cache that the file does not exist after the first check.Here are some perf traces of running a
jj git clone
of my large repo before:Release: https://share.firefox.dev/4oiSTBw
Debug: https://share.firefox.dev/4qmJBX1
And after:
Release: https://share.firefox.dev/4nK66mH
Debug: https://share.firefox.dev/470W1ed
Checklist
If applicable:
CHANGELOG.md
README.md
,docs/
,demos/
)cli/src/config-schema.json
)