Evaluate model #17
Conversation
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks. Powered by ReviewNB.
Pull Request Overview
This PR adds a new prediction script to evaluate and predict the annexin V IBP at every timepoint, generating outputs to be used for further analysis (including visualization).
- Added a prediction script that loads both a regular and a shuffled model.
- The script aggregates and processes feature data, applies model predictions, and saves the combined predictions to disk (a rough end-to-end sketch follows below).
Files not reviewed (1)
- 5.bulk_timelapse_model/scripts/4.plot_results.r: Language not supported
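For orientation, the workflow described above could look roughly like the sketch below. This is not the PR's code: the file names, the shuffled-model path, and the output locations are assumptions made for illustration.

```python
import pathlib

import joblib
import pandas as pd

# Illustrative paths -- the real names and locations in the PR may differ.
profile_data_path = pathlib.Path("../data/aggregated_profiles.parquet")
model_path = pathlib.Path("../models/multi_regression_model_ntrees_1000.joblib")
shuffled_model_path = pathlib.Path(
    "../models/multi_regression_model_ntrees_1000_shuffled.joblib"  # assumed name
)

# Load the aggregated profiles and split metadata from feature columns.
aggregate_df = pd.read_parquet(profile_data_path)
metadata_cols = [c for c in aggregate_df.columns if "Metadata" in c]
feature_cols = [c for c in aggregate_df.columns if "Metadata" not in c]

# Apply both the trained model and the shuffled-baseline model.
model = joblib.load(model_path)
shuffled_model = joblib.load(shuffled_model_path)
predictions_df = pd.DataFrame(model.predict(aggregate_df[feature_cols]))
shuffled_predictions_df = pd.DataFrame(shuffled_model.predict(aggregate_df[feature_cols]))

# Carry the metadata along and write the combined predictions to disk.
predictions_df = pd.concat(
    [aggregate_df[metadata_cols].reset_index(drop=True), predictions_df], axis=1
)
shuffled_predictions_df = pd.concat(
    [aggregate_df[metadata_cols].reset_index(drop=True), shuffled_predictions_df], axis=1
)
predictions_df.to_parquet("../results/predictions.parquet")
shuffled_predictions_df.to_parquet("../results/shuffled_predictions.parquet")
```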
df = pd.read_parquet(profile_data_path)
metadata_cols = [cols for cols in df.columns if "Metadata" in cols]
features_cols = [cols for cols in df.columns if "Metadata" not in cols]
features_cols = features_cols
[nitpick] The assignment to features_cols is redundant. Consider removing it to simplify the code.
Suggested change (delete the redundant line):
features_cols = features_cols
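Applying the suggestion, the cell would reduce to something like this (assuming, as in the script, that pandas is imported as pd and profile_data_path is defined earlier):

```python
df = pd.read_parquet(profile_data_path)
metadata_cols = [col for col in df.columns if "Metadata" in col]
features_cols = [col for col in df.columns if "Metadata" not in col]
```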
metadata_columns = [x for x in aggregate_df.columns if "Metadata_" in x]
shuffled_profile_df = aggregate_df.copy()
for col in shuffled_profile_df.columns:
[nitpick] Consider shuffling only the feature columns instead of all columns to avoid potential inconsistencies in the metadata alignment.
Suggested change:
feature_columns = [col for col in shuffled_profile_df.columns if col not in metadata_columns]
for col in feature_columns:
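For context, a complete version of that loop might look like the following sketch. The use of numpy's default_rng, the fixed seed, and the per-column permutation are assumptions about how the shuffling is done, not code taken from this PR:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed; the PR may or may not fix one

# Copy the aggregated profiles, then shuffle only the feature columns so the
# metadata rows stay aligned with their original values.
shuffled_profile_df = aggregate_df.copy()
feature_columns = [
    col for col in shuffled_profile_df.columns if col not in metadata_columns
]
for col in feature_columns:
    shuffled_profile_df[col] = rng.permutation(shuffled_profile_df[col].to_numpy())
```

Shuffling each feature column independently breaks the feature-to-target relationship while leaving the metadata untouched, which is the usual intent of a shuffled baseline.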
model_file_dir = pathlib.Path(
    "../models/multi_regression_model_ntrees_1000.joblib"
Something about seeing these joblib files made me wonder if it could be helpful to switch to ONNX format sometime and explore the possibilities. You could use sklearn-onnx to implement the models and perhaps onnxruntime to run the models. There could be performance and other benefits to making this shift. Hat tip to @MattsonCam, who mentioned ONNX to me a bit ago.
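For anyone curious what that switch could involve, here is a minimal sketch, assuming the joblib file holds a fitted scikit-learn regressor; the n_features placeholder, the sample input, and the output path are illustrative rather than taken from this repository:

```python
import joblib
import numpy as np
import onnxruntime as rt
from skl2onnx import to_onnx

# Load the existing scikit-learn model from its joblib file.
model = joblib.load("../models/multi_regression_model_ntrees_1000.joblib")

# Placeholder sample used to infer the ONNX input signature; replace n_features
# with the model's actual feature count (or pass a real float32 feature matrix).
n_features = 100
X = np.zeros((1, n_features), dtype=np.float32)

# Convert to ONNX and write it alongside the joblib file.
onnx_model = to_onnx(model, X)
onnx_path = "../models/multi_regression_model_ntrees_1000.onnx"
with open(onnx_path, "wb") as f:
    f.write(onnx_model.SerializeToString())

# Run inference with onnxruntime instead of calling model.predict().
sess = rt.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
predictions = sess.run(None, {input_name: X})[0]
```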
# In[5]:

# if the data_split is train and the time is not 12 then set to non_trained_pair |
The number 12 here could be important to document (i.e., as an outsider I'm not certain why time 12 is significant).
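One way to make the value self-documenting is a named, commented constant, sketched below. The constant name, the column names, and the reason given for 12 are placeholders; only the number 12 and the train/non_trained_pair logic come from the comment above:

```python
# Hypothetical illustration: the column names and the reason stated for 12 are
# placeholders; the real rationale should come from the training setup.
TRAINED_TIMEPOINT = 12  # e.g., the single timepoint the model was trained on

mask = (df["Metadata_data_split"] == "train") & (
    df["Metadata_time"] != TRAINED_TIMEPOINT
)
df.loc[mask, "Metadata_data_split"] = "non_trained_pair"
```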
for col in metadata_columns:
    predictions_df.insert(0, col, metadata_df[col])
Consider concatenating here, if it makes sense, to avoid the loop.
Suggested change:
predictions_df = pd.concat([metadata_df[metadata_columns], predictions_df], axis=1)
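One caveat with the concat version, which is my own observation rather than something raised in the review: pd.concat(axis=1) aligns on the index, not by position, so if predictions_df was built from a NumPy array its fresh RangeIndex may not match metadata_df. A defensive variant:

```python
# Reset both indexes so the concatenation is positional, matching what the
# insert loop was doing.
predictions_df = pd.concat(
    [
        metadata_df[metadata_columns].reset_index(drop=True),
        predictions_df.reset_index(drop=True),
    ],
    axis=1,
)
```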
This PR evaluates and predicts the annexin V IBP from every timepoint - warning: cool plots will be found beyond this point.