-
Notifications
You must be signed in to change notification settings - Fork 96
fix data-store memory problem #7206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
aca3cfc
fix data-store memory problem
dwsutherland 01061cd
changelog entry
dwsutherland 3adaa03
annotate some data-store manager attributes
dwsutherland 72065ad
preserve detection method plugin mods
dwsutherland bab860f
test fix
dwsutherland c10d354
main loop: log data store ++
oliver-sanders c4d1909
plotting adjustments
dwsutherland b5779ab
test log data-store fix
dwsutherland bf27f06
further elaborate on window management attrs
dwsutherland cb627cc
add pruning justification to module docs
dwsutherland a720542
comment on unused trigger removal
dwsutherland 22a737c
Tim's dislike fix
dwsutherland File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Fixes a memory leak in the data management that impacted some workflows. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -42,8 +42,11 @@ | |
| generated at the boundary of an active node's graph walk and registering active | ||
| node's parents against them. Once active, these boundary nodes act as the prune | ||
| triggers for the associated parent nodes. Set operations are used to do a diff | ||
| between the nodes of active paths (paths whose node is in n=0) | ||
| and the nodes of flagged paths (whose boundary node(s) have become active). | ||
| between the nodes of active paths (paths whose node is in n=0) and the nodes of | ||
| flagged paths (whose boundary node(s) have become active). | ||
| This method is used to avoid "blinking", where a task becomes non-active then | ||
| is removed (along with it's window/walk) before a descendant is added, causing | ||
| it to disapear then reappear in the store (and, hence, UIs). | ||
|
|
||
| Updates are created by the event/task/job managers. | ||
|
|
||
|
|
@@ -658,20 +661,35 @@ def __init__(self, schd, n_edge_distance=1): | |
| # internal delta | ||
| self.delta_queues = {self.workflow_id: {}} | ||
| self.publish_deltas = [] | ||
|
|
||
| # internal n-window | ||
| self.all_task_pool = set() | ||
| self.all_n_window_nodes = set() | ||
| self.n_window_nodes = {} | ||
| self.n_window_edges = set() | ||
| # The walk information for window nodes, which is used for | ||
| # pre-populating new walks (if possible) and node depth calculations. | ||
| self.n_window_node_walks = {} | ||
| self.n_window_completed_walks = set() | ||
| self.n_window_depths = {} | ||
| self.update_window_depths = False | ||
| self.db_load_task_proxies: Dict[str, Tuple[TaskProxy, bool]] = {} | ||
| # Family node IDs that have been pruned. Sent with deltas and used for | ||
| # exclusion from state total and other calculations. | ||
| self.family_pruned_ids = set() | ||
| # Boundary nodes are those nodes at the boundary of the window, and | ||
| # these are used to trigger pruning for associated nodes. | ||
| # self.prune_trigger_nodes collects walk node IDs associated with a | ||
| # boundary node so that nodes of isolates, adjacent paths, and it's | ||
| # own path can be flagged for pruning. | ||
| self.prune_trigger_nodes = {} | ||
| # Node ids flagged for pruning. | ||
| # Those not in active paths (the walk paths of active tasks) will be | ||
| # pruned. | ||
| self.prune_flagged_nodes = set() | ||
| # Set of removed task proxies to avoid applying/sending new deltas. | ||
| self.pruned_task_proxies = set() | ||
|
|
||
| self.updates_pending = False | ||
| self.updates_pending_follow_on = False | ||
| self.publish_pending = False | ||
|
|
@@ -1214,15 +1232,10 @@ def increment_graph_window( | |
| boundary_nodes = {active_id} | ||
| # associate | ||
| for tp_id in boundary_nodes: | ||
| try: | ||
| self.prune_trigger_nodes.setdefault(tp_id, set()).update( | ||
| active_walk['walk_ids'] | ||
| ) | ||
| self.prune_trigger_nodes[tp_id].discard(tp_id) | ||
| except KeyError: | ||
| self.prune_trigger_nodes.setdefault(tp_id, set()).add( | ||
| active_id | ||
| ) | ||
| self.prune_trigger_nodes.setdefault(tp_id, set()).update( | ||
| active_walk['walk_ids'] | ||
| ) | ||
| self.prune_trigger_nodes[tp_id].discard(tp_id) | ||
| # flag manual triggers for pruning on deletion. | ||
| if is_manual_submit: | ||
| self.prune_trigger_nodes.setdefault(active_id, set()).add( | ||
|
|
@@ -1279,20 +1292,23 @@ def remove_pool_node(self, name, point): | |
| self.updates_pending = True | ||
| # flagged isolates/end-of-branch nodes for pruning on removal | ||
| if ( | ||
| tp_id in self.prune_trigger_nodes and | ||
| tp_id in self.prune_trigger_nodes[tp_id] | ||
| tp_id in self.prune_trigger_nodes and | ||
| tp_id in self.prune_trigger_nodes[tp_id] | ||
| ): | ||
| self.prune_flagged_nodes.update(self.prune_trigger_nodes[tp_id]) | ||
| del self.prune_trigger_nodes[tp_id] | ||
| # If, at the time of removal, no desendents are active then only | ||
| # flag the node not the entire walk. | ||
| elif ( | ||
| tp_id in self.n_window_nodes and | ||
| self.n_window_nodes[tp_id].isdisjoint(self.all_task_pool) | ||
| tp_id in self.n_window_nodes and | ||
| self.n_window_nodes[tp_id].isdisjoint(self.all_task_pool) | ||
| ): | ||
| self.prune_flagged_nodes.add(tp_id) | ||
| elif tp_id in self.n_window_node_walks: | ||
| self.prune_flagged_nodes.update( | ||
| self.n_window_node_walks[tp_id]['walk_ids'] | ||
| ) | ||
| if tp_id in self.prune_trigger_nodes: | ||
| del self.prune_trigger_nodes[tp_id] | ||
| self.update_window_depths = True | ||
| self.updates_pending = True | ||
|
|
||
|
|
@@ -1930,6 +1946,7 @@ def window_resize_rewalk(self) -> None: | |
|
|
||
| # Clear window walks, and walk from scratch. | ||
| self.prune_flagged_nodes.clear() | ||
| self.prune_trigger_nodes.clear() | ||
| self.n_window_node_walks.clear() | ||
dwsutherland marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| for tp_id in self.all_task_pool: | ||
| tokens = Tokens(tp_id) | ||
|
|
@@ -2021,6 +2038,17 @@ def prune_data_store(self): | |
| # Absolute triggers may be present in task pool, so recheck. | ||
| # Clear the rest. | ||
| self.prune_flagged_nodes.intersection_update(self.all_task_pool) | ||
| # Clear any boundary prune triggers not in the window. | ||
| # This can happen where the graph has paths not taken, i.e.: | ||
| # ``` | ||
| # foo => a | ||
| # foo:failed => b | ||
| # ``` | ||
| # So if `foo` then `a`, which when active/removed is the prune trigger | ||
| # for `foo`.. However, `b` is not used so delete the trigger here. | ||
| for trigger_id in set( | ||
| self.prune_trigger_nodes).difference(self.all_n_window_nodes): | ||
| del self.prune_trigger_nodes[trigger_id] | ||
|
Comment on lines
+2041
to
+2051
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here's the explanation for why the below fix was needed.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense! |
||
|
|
||
| tp_data = self.data[self.workflow_id][TASK_PROXIES] | ||
| tp_added = self.added[TASK_PROXIES] | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.