Param tuning code integration: pca chosen #209

Open
wants to merge 10 commits into
base: master
Conversation

Collaborator

@ntalluri ntalluri commented Mar 3, 2025

No description provided.

@@ -142,8 +142,14 @@ def pca(dataframe: pd.DataFrame, output_png: str, output_var: str, output_coord:
    if not isinstance(labels, bool):
        raise ValueError(f"labels={labels} must be True or False")

    scaler = StandardScaler()
    # TODO: MinMaxScaler changes nothing about the data
    # scaler = MinMaxScaler()
Collaborator Author

@ntalluri ntalluri Mar 10, 2025

How should PCA be done on binary data?

Collaborator Author

@ntalluri ntalluri Mar 13, 2025

Collaborator Author

It might be best to keep these features as one-hot encoded values (which they already are).
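To illustrate the TODO in the diff about MinMaxScaler, here is a minimal pure-Python sketch of per-column min-max scaling and standardization. The helper names `min_max_scale` and `standardize` are hypothetical and are not part of this PR; they only approximate what scikit-learn's `MinMaxScaler` (default range) and `StandardScaler` do for a single column. It assumes a 0/1 column that contains both values.

```python
from statistics import mean, pstdev


def min_max_scale(column):
    """Rescale a column to [0, 1]; a constant column is mapped to 0.0."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
    return [(v - lo) / span for v in column]


def standardize(column):
    """Center a column on its mean and divide by its population std."""
    mu = mean(column)
    sigma = pstdev(column) or 1.0  # avoid dividing by zero for constant columns
    return [(v - mu) / sigma for v in column]


# Hypothetical one-hot style column: 1 if an edge is present in a pathway.
binary_column = [0, 1, 1, 0, 1]

min_max_scaled = min_max_scale(binary_column)
standardized = standardize(binary_column)

print(min_max_scaled == binary_column)  # min-max leaves 0/1 data unchanged
print(standardized)
```

Because the minimum is 0 and the maximum is 1, min-max scaling is the identity on such a column, which matches the observation in the TODO. Standardization does change the values, so whether to apply it to one-hot features is the open question in this thread.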

@ntalluri
Collaborator Author

@agitter Do this PR last.

This will need to be merged with the updated master after #193, #207, and #208 are merged (hopefully this will remove the files repeated throughout the PRs).

  • There will be merge conflicts with the Snakefile, evaluation.py, and the test suite.

Included in this PR:

  1. Update to evaluation.py to include precision_and_recall and the pca chosen pathway (precision_and_recall is also used in "Param tuning code integration: no param tuning" #208)
  2. A new evaluate test suite for the pca chosen pathway
  3. Updates to the Snakefile to run evaluation per dataset and per algorithm-dataset pair for the pca chosen pathway
  4. Update to the ml code for PCA to add centroids
  5. Still need to figure out how to rescale the binary data
  6. Update to the ml test suite for the expected pca coordinates and the ml test code
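As a rough illustration of items 1 and 4, the sketch below computes the centroid of a set of PCA coordinates and picks the point closest to it. This is only one plausible reading of "pca chosen"; the thread does not spell out the selection rule, and the function names, labels, and coordinates here are all hypothetical rather than taken from evaluation.py or the ml code.

```python
from math import dist


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def closest_to_centroid(coords):
    """Return the label whose point lies nearest the centroid.

    Labels are visited in sorted order so that ties resolve deterministically.
    """
    center = centroid(list(coords.values()))
    return min(sorted(coords), key=lambda label: dist(coords[label], center))


# Hypothetical PCA coordinates (PC1, PC2) for four output pathways.
pca_coords = {
    "params-a": (1.0, 0.0),
    "params-b": (0.0, 1.0),
    "params-c": (0.4, 0.4),
    "params-d": (-1.0, -1.0),
}

center = centroid(list(pca_coords.values()))
chosen = closest_to_centroid(pca_coords)
print(center, chosen)
```

Under this assumption, the chosen pathway would then be the one passed to precision_and_recall for evaluation.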

Collaborator Author

@ntalluri ntalluri Apr 19, 2025

For the create palette function, update the code to use a sorted list of unique column headers:

unique_column_names = sorted(set(column_names))
custom_palette = sns.color_palette(palette="tab20c", n_colors=len(unique_column_names))
label_color_map = dict(zip(unique_column_names, custom_palette, strict=True))
return label_color_map
