Rewrite aggregation processing to be more efficient#223
Closed
Rewrite aggregation processing to be more efficient#223
Conversation
- Slightly refactor pre-order - Add level order, which should be faster when fetching from the database, by making one query per-level of the tree, instead of possibly one per-node
The previous implementation was incredibly sub-optimal. It can be easily replaced with a join on the parent_id.
Previously aggregation worked by traversing the modules tree in pre-order. But to ensure that children are aggregated before their parents, we can relax that order a bit to just processes all the results on the same level before all of those on a level above it (the topmost level consisting of the root). This allows fetching much more results at once and significantly reduce t he number of trips to the database - from a number proportional to the number of nodes, to exactly and no more than the maximum depth of the tree. It also makes it much easier to accumulate the created tree metric results to be created all at once. That also saves a huge number of trips to the database. Using the aggregation performance tests, in my development machine the average time - combined with the indexing changes that were previously made - went from around 200s to <20s. Regarding tests: a complete refactor was necessary, and made possible by the module results tree factory. The tests ended up much cleaner and arguably better, as they can verify the actual values being aggregated while mocking only the necessary data accesses.
It is no longer used since the new aggregation processing collects the tree metric results itself, making the auxiliary logic to find out whether a node already has a result for a metric unnecessary.
Member
Contributor
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Previously aggregation worked by traversing the modules tree in
pre-order. But to ensure that children are aggregated before their
parents, we can relax that order a bit to just processes all the results
on the same level before all of those on a level above it (the topmost
level consisting of the root).
This allows fetching much more results at once and significantly reduce
the number of trips to the database - from a number proportional to the
number of nodes, to exactly and no more than the maximum depth of the
tree.
It also makes it much easier to accumulate the created tree metric
results to be created all at once. That also saves a huge number of
trips to the database.
Regarding tests: a complete refactor was necessary, and made possible by
the module results tree factory. The tests ended up much cleaner and
arguably better, as they can verify the actual values being aggregated
while mocking only the necessary data accesses.