perf: pickle summarize networks call #253
base: main
Addresses 1/5 in Reed-CompBio#249
        network_ml_summary_algo = SEP.join([out_dir, '{dataset}-ml', 'ml-summary-{algorithm}.pickle'])
    run:
        summary_df = ml.summarize_networks(input.pathways)
        with open(output.network_ml_summary_algo, 'wb') as pickle_writer:
Instead of putting this code in the Snakefile, could we move it to the ML code and treat it as a function that writes a pickle file?
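A minimal sketch of what that could look like, assuming the ML module gains a helper with the hypothetical name write_summary (not an existing SPRAS function). It takes the DataFrame returned by ml.summarize_networks plus an output path, and wraps the same with open(..., 'wb') pattern used in the Snakefile snippet:

```python
import pickle

import pandas as pd


def write_summary(summary_df: pd.DataFrame, file_name: str) -> None:
    """
    Hypothetical helper: save a network summary DataFrame to a pickle file.
    summarize_networks itself stays unchanged; only the file I/O moves here.
    """
    with open(file_name, "wb") as pickle_writer:
        pickle.dump(summary_df, pickle_writer)
```

With a helper like this, the run block would shrink to one call such as ml.write_summary(ml.summarize_networks(input.pathways), output.network_ml_summary_algo), keeping the Snakefile free of file-handling logic.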
There is code in evaluation.py and dataset.py to follow as a reference.
That sounds good. [I'll remember from now on that the Snakefile isn't the best place for logic 👍]
I actually think most of the time people do put the logic in the Snakefile, so this solution isn't "wrong"; SPRAS just organizes things differently.
Hm, then I don't think I like making ml.summarize_networks responsible for pickling the dataset: the file logic exists only because Snakemake requires file outputs, so it is a Snakemake restriction by definition. Moving that logic into the function would shift the caller's concerns onto the callee, which violates Postel's law.
It shouldn't go in summarize_networks. It should be two separate functions, similar to the ones in dataset.py and evaluation.py.
Lines 30 to 35 in 4bcd711
def to_file(self, file_name):
    """
    Saves dataset object to pickle file
    """
    with open(file_name, "wb") as f:
        pkl.dump(self, f)
I assume you mean these; if so, that sounds good 👍
    output:
        network_ml_summary_algo = SEP.join([out_dir, '{dataset}-ml', 'ml-summary.pickle'])
    run:
        summary_df = ml.summarize_networks(input.pathways)
Same with this one
@@ -312,30 +343,25 @@ rule ml_analysis:
         hac_image_horizontal = SEP.join([out_dir, '{dataset}-ml', 'hac-horizontal.png']),
         hac_clusters_horizontal = SEP.join([out_dir, '{dataset}-ml', 'hac-clusters-horizontal.txt']),
     run:
-        summary_df = ml.summarize_networks(input.pathways)
+        summary_df = pandas.read_pickle(input.network_ml_summary)
Similar to the function that writes the pickle file, can we make a function in the ML code that loads it? Call it for the network-specific ones as well.
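A minimal sketch of a matching loader, again using a hypothetical name (read_summary) rather than an existing SPRAS function. It wraps pandas.read_pickle and adds a type check (an assumption, not something the review asks for) so that a wrong input file fails loudly:

```python
import pandas as pd


def read_summary(file_name: str) -> pd.DataFrame:
    """
    Hypothetical helper: load a network summary DataFrame saved as a pickle.
    """
    summary_df = pd.read_pickle(file_name)
    # Fail early if the pickle holds something other than a DataFrame
    if not isinstance(summary_df, pd.DataFrame):
        raise TypeError(
            f"Expected a pandas DataFrame in {file_name}, got {type(summary_df).__name__}"
        )
    return summary_df
```

In the ml_analysis run block, summary_df = pandas.read_pickle(input.network_ml_summary) would become summary_df = ml.read_summary(input.network_ml_summary), and the network-specific rules would call the same function on their own pickle inputs.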