jessicaw9910 commented Dec 17, 2025

Description

Bugfix and PyMOL update.

Todos

Notable points that this PR has either accomplished or will accomplish.

  • Fix bug in app sequence object
  • Update PyMOL module and CLI
  • Use top 10 mutations to generate stick visualizations
  • Labeling?
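
One possible way to pick the residues for the stick visualizations is sketched below. It assumes the mutations are available as a flat list of residue positions; the function name `top_mutation_selection` and the input shape are hypothetical and not taken from the PyMOL module in this PR. The result is a plain PyMOL selection string (`resi a+b+c`) that could be passed to a command such as `show sticks`.

```python
from collections import Counter


def top_mutation_selection(positions, n=10):
    """Return a PyMOL selection string for the n most frequently mutated residues.

    `positions` is assumed to hold one residue number per observed mutation
    (hypothetical input format, not the PR's actual data model).
    """
    counts = Counter(positions)
    # most_common(n) keeps the n positions with the highest mutation counts.
    top = [pos for pos, _ in counts.most_common(n)]
    # Sort so the selection string is stable between runs.
    return "resi " + "+".join(str(pos) for pos in sorted(top))


selection = top_mutation_selection([315, 315, 253, 253, 253, 317], n=2)
print(f"show sticks, {selection}")
```

The same selection string could later be reused for labeling, so residue choice and labeling stay in sync.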

Questions

  • How to most efficiently add labeling?

Status

  • Ready to go

jessicaw9910 and others added 30 commits December 3, 2024 17:05
jessicaw9910 and others added 9 commits December 11, 2025 17:23
…er and moved to StructureVisualizerVisualizer
* removed extract_tarfiles from mkt.databases.io_utils - now in schema throughout

* black

* pre-commit

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added requirements.txt

* removed erroneously committed VE package data

* pre-commit

* pre-commit

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change if statement order in return_filenotfound_error_if_empty_or_missing; support conditions untar_files_in_memory like list and remove ._ files; make hgnc name key instead of uniprot id

* make hgnc name default key from mkt.databases.kinase_schema instead of uniprot id

* create_tar_without_metadata in io_utils; add tar.gz to generation script - make sure to pass w:gz as the tarfile mode argument; new KinaseInfo.tar.gz with hgnc names as keys
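
A minimal sketch of what a metadata-free archive writer could look like, using only the standard-library `tarfile` module. The function name matches the commit message, but the signature and body here are assumptions rather than the actual `io_utils` implementation. It opens the archive with mode `w:gz` and uses a filter to drop macOS `._` sidecar files and reset owner and timestamp fields.

```python
import os
import tarfile


def _strip_metadata(info: tarfile.TarInfo):
    """Filter for tarfile.add: skip '._' files and reset per-member metadata."""
    # macOS AppleDouble sidecar files (e.g. "._ABL1.json") are excluded.
    if os.path.basename(info.name).startswith("._"):
        return None
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = 0
    return info


def create_tar_without_metadata(dir_in: str, path_out: str) -> None:
    """Write dir_in to a gzip-compressed tar archive (assumed signature)."""
    # "w:gz" selects gzip compression; plain "w" would write an uncompressed tar.
    with tarfile.open(path_out, "w:gz") as tar:
        tar.add(dir_in, arcname=os.path.basename(dir_in), filter=_strip_metadata)
```

Note that this only normalizes the per-member tar headers; the gzip wrapper carries its own timestamp, which this sketch does not touch.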

* interim app update

* changed schema tests to reflect hgnc_name keys

* remove file suffix from list_entries

* black

* ordered list

* trailing whitespace

* temporarily added rotation based on ABL1

* print to logging

* wireframe including sequence, structure, and property panels

* altered structure plot size

* starting to add code to combine old CIF with new coords

* added SequenceAlignment adapted from mkt.databases.plot

* commented out SequenceAlignment in generate_alignments - default to using mkt.databases.plot

* added optional flags for SequenceAlignment to allow to repurpose in app

* removed reverse comment since now optional flag

* using mkt.databases.plot version of SequenceAlignment; using PropertyTables

* added resource links, radio button for structure

* could not get plots version to display toolbar on bottom - reimplementing here for now

* broke down SequenceAlignment into smaller plots

* tried to upgrade bokeh to get plot version of SequenceAlignment class to display toolbar

* added typing to serialization function

* starting alignment algorithm function

* add generate_properties to app

* added try_except_return_none_rgetattr to mkt.databases.utils

* make obj_kinase a PropertyTables property and generate extract_properties on instantiation

* added obj_kinases as property, combined sequence generation and plotting into a single class, make the y-axis labels crimson if no sequence of a given type is found in the data

* cleaned up extraneous arguments now included in radio buttons (programming to come), adjusted display_dashboard function for changes in the generate_ scripts

* bugfixes

* updated for cif inclusion and new schema functionality

* added carryforward flag to codecov

* finalized structural annotations for phosphosites and KLIFS pocket

* removed commented out code no longer in use

* added docstrings for hardcoded resources; added KLIFS annotation to describe stick regions

* incorporating changes trying to switch CIF files to the newly aligned coordinates; will undo with next commit as have introduced an error

* reverting back to old version of KinCore

* fixed ncbi codecov ignore

* upgrade python to <3.13 in pyproject.toml files

* changed flags path structure

* moved constants to a separate file in app

* added docstrings to databases utils

* removed TODO from hinge:linker

* added KLIFS region labeling to x-axis

* move the no KinCore objects error to the beginning; try to specify full width of KinCore active structure

* pinning the package versions that are working locally

* fixed bug by sorting list_intersect in _generate_highlight_idx before using

* added and commented out conserved residues list for later use

* pymol scripts to generate publication quality image from Streamlit app scripts

* import nf-rnaseq in requirements.txt - no real way to initialize nf-rnaseq package in uniprot otherwise (unless move to another module)

* Databases (#182)

* removed comment

* removed kinase_schema.CollectionKinaseInfo

* comment on PRKD2 and AlphaMissense

* temporary scratch for aligning sequences to DiscoverX

* implemented new class ChEMBLMolecule to query for molecule details

* added xlrd to package dependencies to process Davis dataset

* preliminary info for davis harmonization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add check_molecules to ChEMBL; updated wrong ChEMBLMolecule argument

* add check_molecules to ChEMBL; updated wrong ChEMBLMolecule argument

* make rdkit a package dependency

* cli for querying ChEMBL for dataset preprocessing

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* moved davis and pkis2 modules to datasets

* changed error message for maybe_get_symbol_from_hgnc_search if custom_field provided

* updates to pkis2 and davis datasets modules

* removed commented out PR CIs for databases and schema

* fixed chembl search error - default empty list not None

* added adjudicate_kd_start and adjudicate_kd_end for dataset incorporation purposes

* added docstring for bool_offset

* allow for str_fasta to be used if need to hardcode for errors

* removed pytest.mark.skip as NCBI API is currently running

* added function to check if lipid kinase

* specified input_is_hgnc_symbol default in docstring

* added Pfam docstring

* UniProtRefSeqProteinGET and query_uniprotbulk_api to uniprot module; modifies nf-rnaseq package tooling

* fully working initial commit of discoverx module; construct to KD/KLIFS mapping outstanding

* added verbose flag to the KinaseInfo functions rather than logging by default

* added verbose flags

* added and commented out pip install nf-rnaseq from github; uncomment for testing if in use

* import only UniProtFASTA rather than entire uniprot module to avoid nf-rnaseq import errors; fix if want to test this functionality

* uncommented nf-rnaseq

* in progress datasets commit

* used verbose flag for caplog tests

* dict_refseq_indices working correctly

* dict_construct_sequences finalized - use this to generate harmonized representations

* generate the dataset csv files

* process now contains all code necessary to generate different aligned input sequences

* conformed to latest process module structure

* added dataset csv CLI to pyproject.toml

* added plotting functions for discoverx

* upgrades for discoverx plotting

* CLI script to generate poster dataset plots

* plot both svg and PNG formats for all

* added plot dynamic range to the plotting CLI, need to fix font size

* fixed svg in plot_dynamic_range - font still looks a little off; added docstrings and fixed comment format

* Fix test_pfam and test_ncbi to handle API 500 errors gracefully

Handle RetryError exceptions when external APIs return 500 errors by
skipping tests instead of failing. This prevents CI failures due to
unpredictable external API availability.

Changes:
- Wrap test_pfam API calls in try-except block
- Wrap test_ncbi API calls in try-except block
- Skip tests with informative messages when 500 errors occur
- Re-raise other exceptions to catch real issues
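
The skip-on-500 pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the code in test_pfam or test_ncbi: `RetryError` is a local stand-in for whichever retry exception the HTTP client raises, `call_or_skip` is a hypothetical helper, and `unittest.SkipTest` is used so the sketch needs only the standard library (in a pytest suite, `pytest.skip` would play the same role).

```python
import unittest


class RetryError(Exception):
    """Stand-in for the retry exception raised by the HTTP client (assumption)."""


def call_or_skip(fetch, service):
    """Run fetch(); skip on a 500-driven RetryError, re-raise everything else."""
    try:
        return fetch()
    except RetryError as exc:
        # Only server-side failures are treated as "external API unavailable".
        if "500" in str(exc):
            raise unittest.SkipTest(f"{service} API returned 500 errors: {exc}")
        raise
```

Matching on the message text is a simplification; a real client may expose the status code on the exception or its cause.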

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Refactor plotting code and fix SVG font rendering issues

This commit improves the plotting functionality by:
1. Creating a reusable save_plot() helper function to reduce code duplication
2. Fixing SVG font rendering issues by converting text to paths
3. Improving mathtext rendering for subscripts (K_d, log_10)

Changes:
- Add save_plot() function to handle saving both SVG and PNG formats
- Replace repetitive save code in all 5 plotting functions
- Change svg.fonttype from "none" to "path" for consistent rendering
- Update mathtext from \mathregular to \mathrm for proper subscript rendering
- Ensure plots render consistently in browsers, VS Code, and vector editors

Benefits:
- SVG files now render perfectly in all viewers without spacing/kerning issues
- Reduced code duplication by ~60 lines
- Easier maintenance with centralized save logic
- Consistent behavior across all plotting functions
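
A rough sketch of the helper described in this commit message, assuming Matplotlib is installed. The name `save_plot` comes from the message, but the signature, defaults, and demo figure here are assumptions. It sets `svg.fonttype` to `"path"` so text is drawn as outlines, and uses `\mathrm` in the axis labels.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Render SVG text as paths so output does not depend on the viewer's fonts.
plt.rcParams["svg.fonttype"] = "path"


def save_plot(fig, output_dir, stem, dpi=300):
    """Save fig as both SVG and PNG and return the two paths (assumed signature)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = [out / f"{stem}.{ext}" for ext in ("svg", "png")]
    for path in paths:
        # The file format is inferred from the file extension.
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
    return paths


fig, ax = plt.subplots()
ax.plot([1, 10, 100], [9, 8, 7])
ax.set_xscale("log")
ax.set_xlabel(r"$\mathrm{K}_{\mathrm{d}}$ (nM)")
ax.set_ylabel(r"$-\log_{10}(\mathrm{K}_{\mathrm{d}})$")
```

Converting text to paths trades editability for consistency: the labels can no longer be edited as text in a vector editor.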

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* absolute filepath for cwd instead of '.'

* fixed KinaseMissenseMutations.dict_replace - only do this if key in original dataset

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude <[email protected]>

* Databases (#183)

* make the checks and post_init optional in case loading from a CSV file for a cohort that requires a VPN - logger errors are now warnings; allow load_from_csv from an input str if loading from multiple dataframes (e.g., KinaseMissenseMutations ._df and ._df_filter); added pathfile_filter to KinaseMissenseMutations

* updated databases for kw_only arg study_id in Mutations

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude <[email protected]>

* Databases (#184)

* fixed bug in dict_kinase_cbio in get_kinase_missense_mutations function - need to check if mkt_name is in dict_kinase_cbio rather than cbio_name

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude <[email protected]>

* Databases (#185)

* changed HGNC name and mismatch error logging

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude <[email protected]>

* Two minor logger formatting tweaks (#186)

Linebreaks and spacing for canonical mismatch errors

* Databases (#187)

* only log query errors if present

* moved classes from app to mkt.databases.app since need to use extensibly in other places (mkt_impact); simplified names for relevant app modules since no longer scripts importing locally; remove py3dmol and streamlit/bokeh related plotting functions to standalone visualization script in app; created pymol module and moved CLI script to mkt.databases; added webcolors to pyproject.toml dependencies

* removed all plotting - keep this in standalone app

* removed all plotting - keep this in standalone app

* updated imports for new app structure

* changed imports in app script

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude <[email protected]>

* Databases (#188)

* removed comment

* removed kinase_schema.CollectionKinaseInfo

* comment on PRKD2 and AlphaMissense

* temporary scratch for aligning sequences to DiscoverX

* implemented new class ChEMBLMolecule to query for molecule details

* added xlrd to package dependencies to process Davis dataset

* preliminary info for davis harmonization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add check_molecules to ChEMBL; updated wrong ChEMBLMolecule argument

* add check_molecules to ChEMBL; updated wrong ChEMBLMolecule argument

* make rdkit a package dependency

* cli for querying ChEMBL for dataset preprocessing

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* moved davis and pkis2 modules to datasets

* changed error message for maybe_get_symbol_from_hgnc_search if custom_field provided

* updates to pkis2 and davis datasets modules

* removed commented out PR CIs for databases and schema

* fixed chembl search error - default empty list not None

* added adjudicate_kd_start and adjudicate_kd_end for dataset incorporation purposes

* added docstring for bool_offset

* allow for str_fasta to be used if need to hardcode for errors

* removed pytest.mark.skip as NCBI API is currently running

* added function to check if lipid kinase

* specified input_is_hgnc_symbol default in docstring

* added Pfam docstring

* UniProtRefSeqProteinGET and query_uniprotbulk_api to uniprot module; modifies nf-rnaseq package tooling

* fully working initial commit of discoverx module; construct to KD/KLIFS mapping outstanding

* added verbose flag to the KinaseInfo functions rather than logging by default

* added verbose flags

* added and commented out pip install nf-rnaseq from github; uncomment for testing if in use

* import only UniProtFASTA rather than entire uniprot module to avoid nf-rnaseq import errors; fix if want to test this functionality

* uncommented nf-rnaseq

* in progress datasets commit

* used verbose flag for caplog tests

* dict_refseq_indices working correctly

* dict_construct_sequences finalized - use this to generate harmonized representations

* generate the dataset csv files

* process now contains all code necessary to generate different aligned input sequences

* conformed to latest process module structure

* added dataset csv CLI to pyproject.toml

* added plotting functions for discoverx

* upgrades for discoverx plotting

* CLI script to generate poster dataset plots

* plot both svg and PNG formats for all

* added plot dynamic range to the plotting CLI, need to fix font size

* fixed svg in plot_dynamic_range - font still looks a little off; added docstrings and fixed comment format

* Fix test_pfam and test_ncbi to handle API 500 errors gracefully

Handle RetryError exceptions when external APIs return 500 errors by
skipping tests instead of failing. This prevents CI failures due to
unpredictable external API availability.

Changes:
- Wrap test_pfam API calls in try-except block
- Wrap test_ncbi API calls in try-except block
- Skip tests with informative messages when 500 errors occur
- Re-raise other exceptions to catch real issues

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
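
The skip-on-500 pattern described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's test code: `RetryError` is a local stand-in for whichever retry exception the HTTP client raises, `call_or_skip` and the three example callables are hypothetical, and `unittest.SkipTest` is used so the sketch runs without pytest (in a pytest suite, `pytest.skip(...)` would play the same role).

```python
import unittest


class RetryError(Exception):
    """Stand-in for the exception raised after repeated HTTP 500 responses."""


def call_or_skip(api_call, *args, **kwargs):
    """Run an external API call; skip the test if the API keeps failing."""
    try:
        return api_call(*args, **kwargs)
    except RetryError as exc:
        # The external service is unavailable: skip rather than fail CI.
        raise unittest.SkipTest(f"External API returned repeated 500 errors: {exc}")
    # Any other exception propagates unchanged, so real bugs still fail the test.


def flaky_api():
    """Hypothetical call that exhausts its retries."""
    raise RetryError("too many 500 error responses")


def buggy_api():
    """Hypothetical call that fails for a reason unrelated to availability."""
    raise ValueError("unexpected payload")


def healthy_api():
    """Hypothetical call that succeeds."""
    return {"status": "ok"}
```

Only the availability failure is converted into a skip; the `ValueError` from `buggy_api` still surfaces, which matches the "re-raise other exceptions" point in the change list.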

* Refactor plotting code and fix SVG font rendering issues

This commit improves the plotting functionality by:
1. Creating a reusable save_plot() helper function to reduce code duplication
2. Fixing SVG font rendering issues by converting text to paths
3. Improving mathtext rendering for subscripts (K_d, log_10)

Changes:
- Add save_plot() function to handle saving both SVG and PNG formats
- Replace repetitive save code in all 5 plotting functions
- Change svg.fonttype from "none" to "path" for consistent rendering
- Update mathtext from \mathregular to \mathrm for proper subscript rendering
- Ensure plots render consistently in browsers, VS Code, and vector editors

Benefits:
- SVG files now render perfectly in all viewers without spacing/kerning issues
- Reduced code duplication by ~60 lines
- Easier maintenance with centralized save logic
- Consistent behavior across all plotting functions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
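
A centralized save helper of the kind this commit describes might look like the sketch below. The name `save_plot` comes from the commit message, but its signature, the `formats` and `dpi` parameters, and the example figure are assumptions made for illustration. The `svg.fonttype` setting and the `\mathrm` mathtext command are standard matplotlib features.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Convert text to paths so SVG output does not depend on the viewer's fonts.
plt.rcParams["svg.fonttype"] = "path"


def save_plot(fig, output_dir, stem, formats=("svg", "png"), dpi=300):
    """Save one figure in every requested format and return the written paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for fmt in formats:
        path = os.path.join(output_dir, f"{stem}.{fmt}")
        fig.savefig(path, format=fmt, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths


# Example figure with mathtext subscripts rendered via \mathrm.
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [9.0, 7.5, 6.0])
ax.set_xlabel(r"$\mathrm{log}_{10}$ concentration")
ax.set_ylabel(r"$\mathrm{K}_{\mathrm{d}}$ (nM)")
written = save_plot(fig, tempfile.mkdtemp(), "example_plot")
plt.close(fig)
```

Setting `svg.fonttype` to `"path"` trades editable text in the SVG for rendering that looks the same in every viewer, which is the trade-off the commit opts for.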

* absolute filepath for cwd instead of '.'

* fixed KinaseMissenseMutations.dict_replace - only do this if key in original dataset

* make the checks and post_init optional in case loading from a CSV file for a cohort that requires a VPN - logger errors are now warnings; allow load_from_csv from an input str if loading from multiple dataframes (e.g., KinaseMissenseMutations ._df and ._df_filter); added pathfile_filter to KinaseMissenseMutations

* updated databases for kw_only arg study_id in Mutations

* fixed bug in dict_kinase_cbio in get_kinase_missense_mutations function - need to check if mkt_name is in dict_kinase_cbio rather than cbio_name

* changed HGNC name and mismatch error logging

* Two minor logger formatting tweaks (#186)

Linebreaks and spacing for canonical mismatch errors

* only log query errors if present

* moved classes from app to mkt.databases.app since need to use extensibly in other places (mkt_impact); simplified names for relevant app modules since no longer scripts importing locally; remove py3dmol and streamlit/bokeh related plotting functions to standalone visualization script in app; created pymol module and moved CLI script to mkt.databases; added webcolors to pyproject.toml dependencies

* removed all plotting - keep this in standalone app

* updated imports for new app structure

* changed imports in app script

* PyMOL module and CLI

* removed self.html = self.visualize_structure() from StructureVisualizer and moved to StructureVisualizerVisualizer

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude <[email protected]>

* use StructureVisualizerVisualizer rather than StructureVisualizer

* renamed Visualizers as Generators to prevent clunky nomenclature for StructureVisualizer

* fixed __post_init__ error in StructureVisualizerGenerator

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claude <[email protected]>
…ilepath_dict functions; added generate_instructions to create an instructions txt file in addition to logging instructions

codecov bot commented Dec 17, 2025

Codecov Report

❌ Patch coverage is 1.58730% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.02%. Comparing base (89cb98b) to head (dc56955).

Additional details and impacted files
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| databases | 45.52% <1.58%> (-0.26%) ⬇️ | |
| schema | 85.71% <ø> (ø) | Carriedforward from d3ffdc6 |

*This pull request uses carry forward flags.


…tations - need to implement; changed unused variables in list comprehension to _
…t in CLI rather than structure and sequence classes
…rative in py3Dmol viewer not pymol pdb generator
…urated residues as sticks; piecewise interpolate colors for mutations; add stick options for top 10 mutations (allowing for ties)
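
The last commit title mentions piecewise color interpolation and a top-10 selection that allows ties. A rough sketch of both ideas follows; the function names, the residue numbers, the counts, and the color stops are hypothetical and are not taken from the PyMOL module itself.

```python
def top_n_with_ties(counts, n=10):
    """Return keys whose count is at least the n-th largest count (ties kept)."""
    if not counts:
        return []
    ranked = sorted(counts.values(), reverse=True)
    cutoff = ranked[min(n, len(ranked)) - 1]
    selected = (key for key, value in counts.items() if value >= cutoff)
    # Highest count first; break ties by key for a stable order.
    return sorted(selected, key=lambda key: (-counts[key], key))


def interpolate_color(value, stops):
    """Piecewise-linear interpolation between (position, (r, g, b)) stops."""
    stops = sorted(stops)
    if value <= stops[0][0]:
        return stops[0][1]
    if value >= stops[-1][0]:
        return stops[-1][1]
    for (x0, c0), (x1, c1) in zip(stops, stops[1:]):
        if x0 <= value <= x1:
            t = (value - x0) / (x1 - x0)
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))


# Hypothetical mutation counts keyed by residue number.
mutation_counts = {315: 12, 253: 9, 255: 9, 396: 4}
stick_residues = top_n_with_ties(mutation_counts, n=2)

# Hypothetical color ramp: white -> red -> dark red.
color_stops = [(0.0, (255, 255, 255)), (0.5, (255, 0, 0)), (1.0, (128, 0, 0))]
light_red = interpolate_color(0.2, color_stops)
```

With `n=2`, the cutoff is the second-largest count, so both residues tied at that count are kept and the selection can exceed `n` entries.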