Make go to relevant file constant-time using an index #3526
Closed
domingo2000 wants to merge 1 commit into
Contributor
This pull request is being marked as stale because there was no activity in the last 2 months.
Motivation
Closes #3327
Go to relevant file is currently very slow in large repositories, as pointed out in the issue.
Profiling with stackprof showed that the main bottleneck is the call to Dir.glob that matches all files against the desired patterns. This is disk-I/O intensive and has O(N) time complexity, where N is the number of files in the repository.
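For context, the glob-based lookup can be approximated as below. This is a simplified sketch, not the actual ruby-lsp code: glob_relevant_files is a hypothetical helper, and the naming conventions (foo_test.rb, test_foo.rb, foo_spec.rb) are borrowed from the example in the Implementation section. Even when only a couple of files match, Dir.glob still has to walk every directory under the root.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical approximation of the glob-based lookup (not the ruby-lsp code).
# For a source file such as lib/foo.rb, it builds one brace pattern covering
# the assumed conventions foo_test.rb, test_foo.rb and foo_spec.rb.
def glob_relevant_files(root, source_path)
  stem = File.basename(source_path, ".rb")
  pattern = File.join(root, "**", "{#{stem}_test,test_#{stem},#{stem}_spec}.rb")
  # "**" makes Dir.glob traverse every directory under root, so each query
  # costs O(N) disk work no matter how few files actually match.
  Dir.glob(pattern).map { |path| path.delete_prefix("#{root}/") }.sort
end

Dir.mktmpdir do |root|
  %w[lib/foo.rb test/foo_test.rb spec/foo_spec.rb test/bar_test.rb].each do |rel|
    path = File.join(root, rel)
    FileUtils.mkdir_p(File.dirname(path))
    FileUtils.touch(path)
  end

  p glob_relevant_files(root, "lib/foo.rb") # => ["spec/foo_spec.rb", "test/foo_test.rb"]
end
```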
This PR provides a proof of concept of how that time could be reduced.
Implementation
Create an index of the files' basenames. The index has the following structure:
"foo" => ["foo_test.rb", "test_foo.rb", "foo_spec.rb"]
This index finds the candidates for a given query in nearly constant time (assuming the number of matches K for a query is small).
Integrate this new index into RubyIndexer::Index.
Space Complexity: O(N), where N is the number of files in the repository
Time Complexity on boot: O(N)
Time Complexity on query: O(1)
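A minimal sketch of the basename index and its complexity profile, assuming the three naming conventions from the example above. RelevantFileIndex, add, and search are hypothetical names, not the RubyIndexer::Index API: add is called once per file at boot (O(N) in total), and search is a single hash lookup.

```ruby
# Hypothetical basename index (names are illustrative, not the RubyIndexer API).
# It maps a source-file stem such as "foo" to the test files named after it.
class RelevantFileIndex
  # Assumed naming conventions; each regex captures the stem of the tested file.
  PATTERNS = [
    /\Atest_(.+)\.rb\z/, # test_foo.rb
    /\A(.+)_test\.rb\z/, # foo_test.rb
    /\A(.+)_spec\.rb\z/, # foo_spec.rb
  ].freeze

  def initialize
    @entries = Hash.new { |hash, key| hash[key] = [] }
  end

  # Called once per file at boot: O(N) time and O(N) space for N files.
  def add(path)
    basename = File.basename(path)
    PATTERNS.each do |pattern|
      match = pattern.match(basename)
      next unless match

      @entries[match[1]] << path
      break
    end
  end

  # One hash lookup (O(1) on average) plus O(K) to copy the K candidates.
  def search(source_path)
    @entries.fetch(File.basename(source_path, ".rb"), []).dup
  end
end

index = RelevantFileIndex.new
%w[test/foo_test.rb test/unit/test_foo.rb spec/foo_spec.rb test/bar_test.rb lib/foo.rb].each do |path|
  index.add(path)
end

p index.search("lib/foo.rb")
# => ["test/foo_test.rb", "test/unit/test_foo.rb", "spec/foo_spec.rb"]
```

Using fetch rather than [] keeps queries for unknown stems from inserting empty entries through the hash's default block.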
Assuming the 400K files mentioned in the issue, this should consume approximately
Note: alternatively, this index could be computed lazily, so that these resources are only consumed the first time the user invokes the feature.
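The lazy alternative from the note could look like the following memoized sketch. LazyRelevantFileIndex and its path_source callable are hypothetical; in ruby-lsp the list of workspace paths would come from wherever the indexer already enumerates files, which is not shown here.

```ruby
# Hypothetical lazily built index: the O(N) scan runs on the first query only,
# so users who never use "go to relevant file" never pay for it.
class LazyRelevantFileIndex
  # Assumed conventions: test_foo.rb, foo_test.rb, foo_spec.rb (one group each).
  TEST_FILE = /\A(?:test_(.+)|(.+)_test|(.+)_spec)\.rb\z/

  attr_reader :builds

  # path_source: any callable returning every workspace path (an assumption).
  def initialize(path_source)
    @path_source = path_source
    @entries = nil
    @builds = 0
  end

  def search(source_path)
    entries.fetch(File.basename(source_path, ".rb"), []).dup
  end

  private

  # Memoized: the first call builds the hash, later calls reuse it.
  def entries
    @entries ||= build
  end

  def build
    @builds += 1
    index = Hash.new { |hash, key| hash[key] = [] }
    @path_source.call.each do |path|
      match = TEST_FILE.match(File.basename(path))
      # Exactly one of the three groups is set; compact.first picks it.
      index[match.captures.compact.first] << path if match
    end
    index
  end
end

paths = %w[test/foo_test.rb spec/foo_spec.rb lib/foo.rb]
index = LazyRelevantFileIndex.new(-> { paths })
p index.builds               # => 0 (nothing scanned yet)
p index.search("lib/foo.rb") # => ["test/foo_test.rb", "spec/foo_spec.rb"]
p index.search("lib/foo.rb")
p index.builds               # => 1 (the scan ran once)
```

The trade-off is that the first query pays the full O(N) build, so the latency is moved rather than removed; an eager build at boot hides that cost behind indexing.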
Automated Tests
None for now; I want to receive feedback first.
Manual Tests
Tested in the core Buk repository, which has more than 25K Ruby files. A benchmark is provided in this draft.
Benchmarked in the main Buk repository (~25K Ruby files); times in seconds:
             user       system     total      real
old-glob     1.019865   0.292442   1.312307   ( 1.312549)
new-index    0.000049   0.000007   0.000056   ( 0.000056)

Notes
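The results above have the shape of Ruby's Benchmark.bm output. A comparable measurement could be sketched as follows; the synthetic directory tree, the file counts, and the file42 query are assumptions standing in for the Buk repository, so absolute numbers will differ from those reported.

```ruby
require "benchmark"
require "fileutils"
require "tmpdir"

# Hypothetical reproduction of the old-glob vs new-index comparison on a
# synthetic tree (an assumption standing in for the real repository).
results = Dir.mktmpdir do |root|
  # 1,000 source files plus 1,000 matching *_test.rb files spread over 50 dirs.
  1_000.times do |i|
    dir = File.join(root, "dir#{i % 50}")
    FileUtils.mkdir_p(dir)
    FileUtils.touch(File.join(dir, "file#{i}.rb"))
    FileUtils.touch(File.join(dir, "file#{i}_test.rb"))
  end

  # Boot step (not timed): one O(N) pass mapping "fileN" => [path to fileN_test.rb].
  index = Hash.new { |hash, key| hash[key] = [] }
  Dir.glob(File.join(root, "**", "*_test.rb")).each do |path|
    index[File.basename(path, "_test.rb")] << path
  end

  glob_result = nil
  index_result = nil
  Benchmark.bm(10) do |x|
    # Old approach: walk the whole tree on every query.
    x.report("old-glob") { glob_result = Dir.glob(File.join(root, "**", "file42_test.rb")) }
    # New approach: a single hash lookup.
    x.report("new-index") { index_result = index.fetch("file42", []) }
  end

  # Dir.mktmpdir returns the block's value after removing the directory.
  [glob_result, index_result]
end

# Both approaches should find the same single candidate.
puts results.first == results.last
```

Only the queries are timed; the index build, which is the O(N) boot cost, runs before the benchmark.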