oliver-sanders
left a comment
Cheers, will have a go at running with the original example.
Did you have to apply some diff to the profiler plugin to reveal the issue?
Sorry, yes, I had to change up the profiler plugin:
sutherlander@cortex-hyper:flow$ git diff main_loop/log_data_store.py
diff --git a/cylc/flow/main_loop/log_data_store.py b/cylc/flow/main_loop/log_data_store.py
index ec9fc8624..3d731d2fc 100644
--- a/cylc/flow/main_loop/log_data_store.py
+++ b/cylc/flow/main_loop/log_data_store.py
@@ -38,8 +38,21 @@ try:
 except ModuleNotFoundError:
     PLT = False
-from pympler.asizeof import asized
-
+from pympler.asizeof import asized, asizeof, Asizer
+
+# 'publish_deltas',
+# 'n_window_nodes',
+# 'n_window_edges',
+# 'n_window_node_walks',
+# 'n_window_completed_walks',
+# 'n_window_depths',
+# 'all_n_window_nodes',
+STORE_OTHER = {
+    'family_pruned_ids',
+    'prune_trigger_nodes',
+    'prune_flagged_nodes',
+    'pruned_task_proxies',
+}
 @startup
 async def init(scheduler, state):
@@ -51,19 +64,54 @@ async def init(scheduler, state):
         state['objects'][key] = []
         state['size'][key] = []
+    for attr_name in STORE_OTHER:
+        state['objects'][attr_name] = []
+        state['size'][attr_name] = []
+
+    #state['objects']['schd.config'] = []
+    #state['size']['schd.config'] = []
+
+    state['objects']['schd'] = []
+    state['size']['schd'] = []
+
+    state['objects']['dsmgr'] = []
+    state['size']['dsmgr'] = []
+
 @periodic
 async def log_data_store(scheduler, state):
     """Count the number of objects and the data store size."""
     state['times'].append(time())
-    for key, value in _iter_data_store(scheduler.data_store_mgr.data):
+    ds = scheduler.data_store_mgr
+    for key, value in _iter_data_store(ds.data):
         state['objects'][key].append(
             len(value)
         )
         state['size'][key].append(
-            asized(value).size
+            asizeof(value)
         )
+    for attr_name in STORE_OTHER:
+        attr_value = getattr(ds, attr_name)
+        state['objects'][attr_name].append(
+            len(attr_value)
+        )
+        state['size'][attr_name].append(
+            asizeof(attr_value)
+        )
+
+    #state['objects']['schd.config'].append(1)
+    #state['size']['schd.config'].append(asizeof(scheduler.config))
+    asizer = Asizer()
+    asizer.exclude_refs(scheduler.data_store_mgr)
+    state['objects']['schd'].append(1)
+    state['size']['schd'].append(asizer.asizeof(scheduler))
+
+    asizer = Asizer()
+    asizer.exclude_refs(scheduler)
+    state['objects']['dsmgr'].append(1)
+    state['size']['dsmgr'].append(asizer.asizeof(scheduler.data_store_mgr))
+
 @shutdown
 async def report(scheduler, state):
@@ -75,7 +123,9 @@ async def report(scheduler, state):
 def _iter_data_store(data_store):
     for item in data_store.values():
         for key, value in item.items():
-            if key != 'workflow':
+            if key == 'workflow':
+                yield (key, [value])
+            else:
                 yield (key, value)
         # there should only be one workflow in the data store
         break
But I think it's probably too specific to make permanent.
I've tested this against my example workflow and against a less cut down version of the same workflow and can confirm this fixes the leak - thanks.
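For anyone wanting to reproduce the idea without patching the plugin, the sampling pattern in the diff above can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeDataStoreMgr`, `init_state` and `sample` are hypothetical names, the attribute types are guesses, and `sys.getsizeof` (shallow) stands in for pympler's deep `asizeof` so that the sketch needs only the standard library.

```python
import sys
from time import time

# Attribute names taken from the diff above.
STORE_OTHER = {
    'family_pruned_ids',
    'prune_trigger_nodes',
    'prune_flagged_nodes',
    'pruned_task_proxies',
}


class FakeDataStoreMgr:
    """Hypothetical stand-in for the data store manager (types assumed)."""

    def __init__(self):
        self.family_pruned_ids = set()
        self.prune_trigger_nodes = {}
        self.prune_flagged_nodes = set()
        self.pruned_task_proxies = set()


def init_state():
    """Create one empty time series per tracked attribute."""
    state = {'times': [], 'objects': {}, 'size': {}}
    for attr_name in STORE_OTHER:
        state['objects'][attr_name] = []
        state['size'][attr_name] = []
    return state


def sample(ds, state):
    """Record the item count and size of each tracked attribute."""
    state['times'].append(time())
    for attr_name in STORE_OTHER:
        attr_value = getattr(ds, attr_name)
        state['objects'][attr_name].append(len(attr_value))
        # Assumption: shallow size only; the diff uses pympler's asizeof,
        # which also follows references to contained objects.
        state['size'][attr_name].append(sys.getsizeof(attr_value))


ds = FakeDataStoreMgr()
state = init_state()
sample(ds, state)
ds.prune_trigger_nodes['1/a'] = {'1/foo'}  # simulate growth between samples
sample(ds, state)
```

A series in `state['objects']` that keeps climbing across samples is the kind of signal the plots in this thread are built from.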
oliver-sanders
left a comment
Profiling results look much better!
before:
after:
There's still quite an upward slope to prune_trigger_nodes and n_window_node_walks, but I'm not sure if that's a smaller leak or just a natural increase due to the character of the workflow's graph.
Thanks for working on the plugin, it's a big help to get these diffs back into the project so we can reproduce the results. I've opened a PR to work on this a bit further (above graphs generated with this diff): dwsutherland#27
The code seems reasonable, but I'm having a bit of difficulty working out the purpose of the different attributes to understand when they should be housekept:
* Track all data_store_mgr attributes, not just "data".
* Filter by configurable min size for plotting.
* Increase plot size.
* Prevent legends overlapping.
* Remove dead space on RHS of plots.
Also, I believe, the reason for |
# Clear any boundary prune triggers not in the window.
# This can happen where the graph has paths not taken, i.e.:
# ```
# foo => a
# foo:failed => b
# ```
# So if `foo` then `a`, which when active/removed is the prune trigger
# for `foo`.. However, `b` is not used so delete the trigger here.
for trigger_id in set(
        self.prune_trigger_nodes).difference(self.all_n_window_nodes):
    del self.prune_trigger_nodes[trigger_id]
Here's the explanation for why the below fix was needed.
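As a rough standalone illustration of that cleanup, here is a sketch using a plain dict and set. The IDs and the shape of the mapping (trigger ID to the set of nodes it prunes) are assumptions made for the example, not taken from the data store code.

```python
# Hypothetical IDs for the graph `foo => a` / `foo:failed => b`.
# Assumed shape: trigger node ID -> set of node IDs it is the prune trigger for.
prune_trigger_nodes = {
    '1/a': {'1/foo'},
    '1/b': {'1/foo'},  # `foo:failed => b` was never taken
}

# Nodes currently in the n-window; `b` never entered it.
all_n_window_nodes = {'1/foo', '1/a'}

# set(...) snapshots the keys first, so deleting inside the loop is safe.
for trigger_id in set(prune_trigger_nodes).difference(all_n_window_nodes):
    del prune_trigger_nodes[trigger_id]
```

Without this step the `1/b` entry would never be removed, since `b` never becomes active, and entries like it would accumulate each cycle.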
Co-authored-by: Tim Pillinger <26465611+wxtim@users.noreply.github.com>



closes #7199
before

after

Check List
CONTRIBUTING.md and added my name as a Code Contributor.
setup.cfg (and conda-environment.yml if present).
?.?.x branch.