Evaluate model #17


Open · wants to merge 3 commits into base: main

Conversation

MikeLippincott (Member)

This PR evaluates the model and predicts the annexin V IBP at every timepoint - warning: cool plots will be found beyond this point.

@MikeLippincott MikeLippincott requested a review from Copilot May 1, 2025 19:00

Copilot AI (Contributor) left a comment


Pull Request Overview

This PR adds a new prediction script to evaluate and predict the annexin V IBP at every timepoint, generating outputs to be used for further analysis (including visualization).

  • Added a prediction script that loads both a regular and a shuffled model.
  • The script aggregates and processes feature data, applies model predictions, and saves the combined predictions to disk.
Files not reviewed (1)
  • 5.bulk_timelapse_model/scripts/4.plot_results.r: Language not supported

```python
df = pd.read_parquet(profile_data_path)
metadata_cols = [cols for cols in df.columns if "Metadata" in cols]
features_cols = [cols for cols in df.columns if "Metadata" not in cols]
features_cols = features_cols
```
Copilot AI · May 1, 2025


[nitpick] The assignment to features_cols is redundant. Consider removing it to simplify the code.

Suggested change:

```diff
- features_cols = features_cols
```

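A minimal sketch of the cleaned-up column split the suggestion implies; the DataFrame here is a hypothetical stand-in for the script's parquet profile:

```python
import pandas as pd

# Hypothetical profile with two metadata columns and two feature columns
df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02"],
        "Metadata_Time": [0, 12],
        "AreaShape_Area": [1.0, 2.0],
        "Intensity_Mean": [0.5, 0.7],
    }
)

# Split once; no redundant reassignment needed afterward
metadata_cols = [col for col in df.columns if "Metadata" in col]
features_cols = [col for col in df.columns if "Metadata" not in col]
```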


```python
metadata_columns = [x for x in aggregate_df.columns if "Metadata_" in x]
shuffled_profile_df = aggregate_df.copy()
for col in shuffled_profile_df.columns:
```
Copilot AI · May 1, 2025


[nitpick] Consider shuffling only the feature columns instead of all columns to avoid potential inconsistencies in the metadata alignment.

Suggested change:

```diff
- for col in shuffled_profile_df.columns:
+ feature_columns = [col for col in shuffled_profile_df.columns if col not in metadata_columns]
+ for col in feature_columns:
```

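A sketch of feature-only shuffling under the suggested change, with a hypothetical four-row frame standing in for `aggregate_df`; each feature column is permuted independently while the metadata rows stay in place:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the aggregated profile
aggregate_df = pd.DataFrame(
    {
        "Metadata_Well": ["A01", "A02", "A03", "A04"],
        "feat_1": [1.0, 2.0, 3.0, 4.0],
        "feat_2": [10.0, 20.0, 30.0, 40.0],
    }
)

metadata_columns = [x for x in aggregate_df.columns if "Metadata_" in x]
shuffled_profile_df = aggregate_df.copy()

# Shuffle only the feature columns; metadata alignment is preserved
feature_columns = [c for c in shuffled_profile_df.columns if c not in metadata_columns]
for col in feature_columns:
    shuffled_profile_df[col] = rng.permutation(shuffled_profile_df[col].to_numpy())
```

This keeps the well-to-row mapping intact while destroying any feature-to-label association, which is the usual intent of a shuffled-baseline model.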

Member


There's something really pleasing about the colors and the lines in this plot. Consider making the darker colors more distinguishable somehow - I had trouble seeing the difference between these four in particular (it could just be me!):
[image: screenshot of the plot in question]



```python
model_file_dir = pathlib.Path(
    "../models/multi_regression_model_ntrees_1000.joblib"
)
```
Member


Something about seeing these joblib files made me wonder if it could be helpful to switch to ONNX format sometime and explore the possibilities. You could use sklearn-onnx to convert the models and perhaps onnxruntime to run them. There could be performance and other benefits to making this shift. Hat tip to @MattsonCam, who mentioned ONNX to me a bit ago.

```python
# In[5]:


# if the data_split is train and the time is not 12 then set to non_trained_pair
```
Member


The number 12 here could be important to document (i.e., as an outsider I'm not certain why time 12 is significant).

Comment on lines +111 to +112
```python
for col in metadata_columns:
    predictions_df.insert(0, col, metadata_df[col])
```
Member


Consider concatenating here, if it makes sense, to avoid the loop.

Suggested change:

```diff
- for col in metadata_columns:
-     predictions_df.insert(0, col, metadata_df[col])
+ predictions_df = pd.concat([metadata_df[metadata_columns], predictions_df], axis=1)
```
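A sketch of the suggested `pd.concat` replacement with hypothetical stand-ins for the script's frames; note that the original `insert(0, ...)` loop ends up with the metadata columns in reversed order, while `concat` keeps the order given:

```python
import pandas as pd

# Hypothetical stand-ins for the script's metadata and prediction frames
metadata_df = pd.DataFrame(
    {"Metadata_Well": ["A01", "A02"], "Metadata_Time": [0, 12]}
)
predictions_df = pd.DataFrame({"prediction": [0.1, 0.9]})
metadata_columns = ["Metadata_Well", "Metadata_Time"]

# One concat replaces the insert loop; reset indexes so rows align positionally
predictions_df = pd.concat(
    [
        metadata_df[metadata_columns].reset_index(drop=True),
        predictions_df.reset_index(drop=True),
    ],
    axis=1,
)
```

One caveat with `concat` on `axis=1`: it aligns on the index, so if the two frames carry different indexes the result gains NaN-padded rows; the `reset_index` calls above guard against that.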


2 participants