Skip to content

perf: pickle summarize networks call #253

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

tristan-f-r
Copy link
Collaborator

Addresses 1/5 in #249.

@tristan-f-r tristan-f-r requested a review from ntalluri June 3, 2025 22:01
@tristan-f-r tristan-f-r added the analysis Analysis of PRA outputs label Jun 3, 2025
@tristan-f-r tristan-f-r changed the title chore: pickle summarize networks call perf: pickle summarize networks call Jun 3, 2025
@tristan-f-r tristan-f-r marked this pull request as draft June 4, 2025 15:48
@tristan-f-r tristan-f-r marked this pull request as ready for review June 11, 2025 16:41
network_ml_summary_algo = SEP.join([out_dir, '{dataset}-ml', 'ml-summary-{algorithm}.pickle'])
run:
summary_df = ml.summarize_networks(input.pathways)
with open(output.network_ml_summary_algo, 'wb') as pickle_writer:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of putting this code in the SnakeMake file, could we move it to the ML code and treat it as a write pickle file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is code in the evaluation.py and dataset.py files to follow as reference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. [I'll remember from now on that the Snakefile isn't the best place for logic 👍]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think most of the time, people do put the logic in SnakeMake, which doesn't make this solution "wrong". SPRAS just organizes differently.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't think I like moving ml.summarize_networks to be the one to handle pickling the dataset, then, as the file-logic that Snakemake is requiring is a Snakemake restriction by definition. If we move the logic over, we would be moving the concerns of the caller to the callee, which is in violation of Postel's law.

Copy link
Collaborator

@ntalluri ntalluri Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn’t be in summarize_networks. It should be two functions similar to what is in dataset.py and evaluation.py.

Copy link
Collaborator Author

@tristan-f-r tristan-f-r Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spras/spras/dataset.py

Lines 30 to 35 in 4bcd711

def to_file(self, file_name):
"""
Saves dataset object to pickle file
"""
with open(file_name, "wb") as f:
pkl.dump(self, f)

I assume you mean these - if so, that sounds good 👍

output:
network_ml_summary_algo = SEP.join([out_dir, '{dataset}-ml', 'ml-summary.pickle'])
run:
summary_df = ml.summarize_networks(input.pathways)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with this one

@@ -312,30 +343,25 @@ rule ml_analysis:
hac_image_horizontal = SEP.join([out_dir, '{dataset}-ml', 'hac-horizontal.png']),
hac_clusters_horizontal = SEP.join([out_dir, '{dataset}-ml', 'hac-clusters-horizontal.txt']),
run:
summary_df = ml.summarize_networks(input.pathways)
summary_df = pandas.read_pickle(input.network_ml_summary)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the function to write the pickle file, can we make a load pickle file function in the ml code. Call this for the network specific ones as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analysis Analysis of PRA outputs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants