Evaluate model #17
Conversation
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks. Powered by ReviewNB.
Pull Request Overview
This PR adds a new prediction script to evaluate and predict the annexin V IBP at every timepoint, generating outputs to be used for further analysis (including visualization).
- Added a prediction script that loads both a regular and a shuffled model.
- The script aggregates and processes feature data, applies model predictions, and saves the combined predictions to disk (a rough end-to-end sketch follows below).
Files not reviewed (1)
- 5.bulk_timelapse_model/scripts/4.plot_results.r: Language not supported
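For orientation, the workflow described above could look roughly like the sketch below. This is not the PR's code: the file names, the shuffled-model path, and the output locations are assumptions made for illustration.

```python
import pathlib

import joblib
import pandas as pd

# Illustrative paths -- the real names and locations in the PR may differ.
profile_data_path = pathlib.Path("../data/aggregated_profiles.parquet")
model_path = pathlib.Path("../models/multi_regression_model_ntrees_1000.joblib")
shuffled_model_path = pathlib.Path(
    "../models/multi_regression_model_ntrees_1000_shuffled.joblib"  # assumed name
)

# Load the aggregated profiles and split metadata from feature columns.
aggregate_df = pd.read_parquet(profile_data_path)
metadata_cols = [c for c in aggregate_df.columns if "Metadata" in c]
feature_cols = [c for c in aggregate_df.columns if "Metadata" not in c]

# Apply both the trained model and the shuffled-baseline model.
model = joblib.load(model_path)
shuffled_model = joblib.load(shuffled_model_path)
predictions_df = pd.DataFrame(model.predict(aggregate_df[feature_cols]))
shuffled_predictions_df = pd.DataFrame(shuffled_model.predict(aggregate_df[feature_cols]))

# Carry the metadata along and write the combined predictions to disk.
predictions_df = pd.concat(
    [aggregate_df[metadata_cols].reset_index(drop=True), predictions_df], axis=1
)
shuffled_predictions_df = pd.concat(
    [aggregate_df[metadata_cols].reset_index(drop=True), shuffled_predictions_df], axis=1
)
predictions_df.to_parquet("../results/predictions.parquet")
shuffled_predictions_df.to_parquet("../results/shuffled_predictions.parquet")
```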
df = pd.read_parquet(profile_data_path)
metadata_cols = [cols for cols in df.columns if "Metadata" in cols]
features_cols = [cols for cols in df.columns if "Metadata" not in cols]
features_cols = features_cols
[nitpick] The assignment to features_cols is redundant. Consider removing it to simplify the code.
Suggested change (delete the redundant line):
features_cols = features_cols
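Applying the suggestion, the cell would reduce to something like this (assuming, as in the script, that pandas is imported as pd and profile_data_path is defined earlier):

```python
df = pd.read_parquet(profile_data_path)
metadata_cols = [col for col in df.columns if "Metadata" in col]
features_cols = [col for col in df.columns if "Metadata" not in col]
```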
metadata_columns = [x for x in aggregate_df.columns if "Metadata_" in x]
shuffled_profile_df = aggregate_df.copy()
for col in shuffled_profile_df.columns:
[nitpick] Consider shuffling only the feature columns instead of all columns to avoid potential inconsistencies in the metadata alignment.
Suggested change:
feature_columns = [col for col in shuffled_profile_df.columns if col not in metadata_columns]
for col in feature_columns:
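For context, a complete version of that loop might look like the following sketch. The use of numpy's default_rng, the fixed seed, and the per-column permutation are assumptions about how the shuffling is done, not code taken from this PR:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed; the PR may or may not fix one

# Copy the aggregated profiles, then shuffle only the feature columns so the
# metadata rows stay aligned with their original values.
shuffled_profile_df = aggregate_df.copy()
feature_columns = [
    col for col in shuffled_profile_df.columns if col not in metadata_columns
]
for col in feature_columns:
    shuffled_profile_df[col] = rng.permutation(shuffled_profile_df[col].to_numpy())
```

Shuffling each feature column independently breaks the feature-to-target relationship while leaving the metadata untouched, which is the usual intent of a shuffled baseline.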
model_file_dir = pathlib.Path(
    "../models/multi_regression_model_ntrees_1000.joblib"
Something about seeing these joblib files made me wonder if it could be helpful to switch to ONNX format sometime and explore the possibilities. You could use sklearn-onnx to implement the models and perhaps onnxruntime to run the models. There could be performance and other benefits to making this shift. Hat tip to @MattsonCam, who mentioned ONNX to me a bit ago.
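For anyone curious what that switch could involve, here is a minimal sketch, assuming the joblib file holds a fitted scikit-learn regressor; the n_features placeholder, the sample input, and the output path are illustrative rather than taken from this repository:

```python
import joblib
import numpy as np
import onnxruntime as rt
from skl2onnx import to_onnx

# Load the existing scikit-learn model from its joblib file.
model = joblib.load("../models/multi_regression_model_ntrees_1000.joblib")

# Placeholder sample used to infer the ONNX input signature; replace n_features
# with the model's actual feature count (or pass a real float32 feature matrix).
n_features = 100
X = np.zeros((1, n_features), dtype=np.float32)

# Convert to ONNX and write it alongside the joblib file.
onnx_model = to_onnx(model, X)
onnx_path = "../models/multi_regression_model_ntrees_1000.onnx"
with open(onnx_path, "wb") as f:
    f.write(onnx_model.SerializeToString())

# Run inference with onnxruntime instead of calling model.predict().
sess = rt.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
predictions = sess.run(None, {input_name: X})[0]
```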
# In[5]:

# if the data_split is train and the time is not 12 then set to non_trained_pair |
The number 12 here could be important to document (i.e., as an outsider I'm not certain why time 12 is significant).
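One way to make the value self-documenting is a named, commented constant, sketched below. The constant name, the column names, and the reason given for 12 are placeholders; only the number 12 and the train/non_trained_pair logic come from the comment above:

```python
# Hypothetical illustration: the column names and the reason stated for 12 are
# placeholders; the real rationale should come from the training setup.
TRAINED_TIMEPOINT = 12  # e.g., the single timepoint the model was trained on

mask = (df["Metadata_data_split"] == "train") & (
    df["Metadata_time"] != TRAINED_TIMEPOINT
)
df.loc[mask, "Metadata_data_split"] = "non_trained_pair"
```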
for col in metadata_columns:
    predictions_df.insert(0, col, metadata_df[col])
Consider concatenating here, if it makes sense, to avoid the loop.
Suggested change:
predictions_df = pd.concat([metadata_df[metadata_columns], predictions_df], axis=1)
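One caveat with the concat version, which is my own observation rather than something raised in the review: pd.concat(axis=1) aligns on the index, not by position, so if predictions_df was built from a NumPy array its fresh RangeIndex may not match metadata_df. A defensive variant:

```python
# Reset both indexes so the concatenation is positional, matching what the
# insert loop was doing.
predictions_df = pd.concat(
    [
        metadata_df[metadata_columns].reset_index(drop=True),
        predictions_df.reset_index(drop=True),
    ],
    axis=1,
)
```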
This PR evaluates and predicts the annexin V IBP from every timepoint - warning: cool plots will be found beyond this point.