Modeling #167

jessicaw9910 · 2025-08-20T14:51:22Z

Description

Provide a brief description of the PR's purpose here.

Todos

Notable points that this PR has either accomplished or will accomplish.

TODO 1

Questions

Question1

Status

Ready to go

…of cif, used load_raw instead of load for cif files

for more information, see https://pre-commit.ci

…ase-toolkit into modeling

for more information, see https://pre-commit.ci

* initial implementation
* added ChEMBL testing
* added importlib-resources to dependencies
* pinned importlib-resources version
* add importlib-resources to devtools env file and removed from pyproject.toml
* support for MoleculeSearch, MoleculeExact, and MoleculePreferred
* add verbose flagging to ChEMBL; return_chembl_id to make queries hierarchically
* docstrings for existing API clients; initial implementation of GraphQLClient
* opentargets module initial implementation
* test for open_targets
* updated tests with Search, Exact, and Preferred for ChEMBL
* notebooks for diffusion modeling group

* initial implementation
* added ChEMBL testing
* added importlib-resources to dependencies
* pinned importlib-resources version
* add importlib-resources to devtools env file and removed from pyproject.toml
* support for MoleculeSearch, MoleculeExact, and MoleculePreferred
* add verbose flagging to ChEMBL; return_chembl_id to make queries hierarchically
* docstrings for existing API clients; initial implementation of GraphQLClient
* opentargets module initial implementation
* test for open_targets
* updated tests with Search, Exact, and Preferred for ChEMBL
* notebooks for diffusion modeling group
* renamed dir
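Several of these commits add `importlib-resources` as a dependency, and later commits ship JSON files inside the package rather than in a separate data directory. The helper below is a minimal sketch of reading a bundled file with the standard-library `importlib.resources.files()` API. The function name `read_package_text` is hypothetical, and the stdlib `json` package stands in for the project's own package so the example runs anywhere.

```python
from importlib.resources import files


def read_package_text(package: str, resource: str) -> str:
    """Return the text of a file that ships inside an importable package.

    files() resolves the resource relative to the installed package,
    so no repository-relative data path is needed.
    """
    return files(package).joinpath(resource).read_text(encoding="utf-8")


# A stdlib package keeps the example self-contained; a project would pass its
# own package name and a bundled resource such as a JSON file instead.
source = read_package_text("json", "__init__.py")
print("def dumps" in source)
```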

* changed schema namespace and included json files with package rather than in data directory
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* flake8 fixes in io_utils
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* updated schema_demo notebook for sub-package changes
* updated schema demo notebook for latest changes
* updated KinaseInfo json files with new field names
* revised schema notebook for latest changes
* now using dict to limit the possible serialization/deserialization functions; support for json, yaml, toml
* using ConfigDict(use_enum_values=True) for Enum to enable serialization; default values for fields where None allowed to None; validating KLIFS2UniProt dicts to default to None if missing given toml doesn't save None entries
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* unmarked files as executable
* removed unused Callable import for flake8 compatibility pull to merge
* removed unused Optional import for flake8 compatibility pull to merge
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* added flags to codecov.yml
* added schema dependencies to test_env.yaml
* schema tests
* added CI yaml for schema sub-package
* updated schema notebook for increased standardization and typos
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* flake8 modifications
* added -e flag to pip install - hope to resolve 'No package metadata was found' error
* broke out dependencies specific to mkt.schema separately
* use schema_test_env.yaml instead to try to resolve 'mkt-schema' requires a different Python: 3.13.2 not in '<3.12,>=3.9'
* removed sub-package specific environments
* conformed env and ci files to match asap
* added tqdm to env and toml for schema
* fixed kinhub.name to kinhub.kinase_name
* added encoding='utf-8' to serialization function for Windows compatibility with TOML
* added carryforwards to flags
* utf-8 encoding causing Windows CI to lag - will not support Windows TOML formatting as a result
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* switched logging error to info for Windows and TOML serialization
* updated schema demo notebook for latest changes merge to pull
* moved constants to separate file
* added indent=4 for jsons in package
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* added Colab instructions
* updated documentation
* namespace changed to mkt.databases; updated toml and removed poetry; added LICENSE and MANIFEST.in; added KinoML to acknowledgements
* >= packages rather than pinning exact versions; added gitpython
* changed to reflect namespace update
* updated for namespace changes; for kinase_schema, also changed field names to reflect changes in mkt.schema package
* updated notebooks for databases namespace changes
* added channels to correct yaml formatting error when installing
* added mkt.schema installation
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* flake8 changes
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed unused import
* updated databases ci for namespace changes
* removed init file from databases tests
* added HTTPError exception handling for KLIFS and updated tests to reflect; separated KLIFS and KinCore tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* re-integrated kincore test into klifs - need it for KD indices
* not testing NCBI; 404 error currently
* commenting out carryforward for the moment
* removed reference to Poetry in getting started docs
* updated path changes and package description
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed executable mark from notebooks/schema_demo.ipynb
* fixed schema-ci badge and typo
* codecov ignoring ncbi in mkt.databases and skipping corresponding test
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed git dependency from mkt.schema pyproject.toml
* added panel wildcard to prevent from using 1.6 which requires python3.13
* panel == instead of >=
* made mkt.databases a dependency of mkt.schema; databases now has a KinaseInfoGenerator that inherits from KinaseInfo
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed unused imports
* pre-commit hook
* pre-commit hook fix
* updated notebooks for UniProtFASTA
* removed extraneous comments and added APIQuery superclass
* fixed get_repo_root import statement
* changed get_repo_root reference to io_utils in KinCore; updated uniprot for UniProtFASTA in tests
* 404 not throwing an exception - added a dict_kinase_info = None option in if statement to accommodate
* exception handling if no FASTA file downloaded
* commented out line in pkis2 - TODO: further updates
* fixed cache error
* added UniProtFASTA vs. UniProtJSON (in progress)
* added instructions for loading in colab
* updated databases notebook to incorporate all recent changes
* added functions to extract sequence from kincore cif and an opinionated method to adjudicate kinase domain sequence (kincore cif > kincore fasta > Pfam > None)
* corrected Pfam > pfam, if self.kincore.cf is None return None
* use extract_sequence_from_cif rather than manual
* added testing for new adjudicate functions
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils and rgetattr to mkt.schema.utils
* moved get_repo_root to mkt.schema.io_utils and removed modeling get_repo_root
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* added git to requirements
* rgetattr from schema instead of databases
* rgetattr from schema instead of databases
* rgetattr from schema instead of databases
* rgetattr from schema instead of databases
* removed rgetattr, rsetattr, and try_except_return_none_rgetattr from mkt.databases.utils
* removed rgetattr and random_uuid from mkt.ml.utils
* TestSchema.test_utils, changed serialization to serde, altered dictionary import
* added mkt.schema.utils

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jessicawhite <[email protected]>
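One commit in this list switches to a dict so that only a fixed set of serialization/deserialization functions can be selected (JSON, YAML, TOML), and others add `indent=4` and explicit UTF-8 encoding. Below is a minimal sketch of that dispatch pattern. The names `SERDE`, `serialize`, and `deserialize` are hypothetical, and only JSON is registered so the sketch needs nothing beyond the standard library.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

Dump = Callable[[Any, Path], None]
Load = Callable[[Path], Any]


def _dump_json(obj: Any, path: Path) -> None:
    # indent=4 and explicit UTF-8, as described in the commit notes
    path.write_text(json.dumps(obj, indent=4), encoding="utf-8")


def _load_json(path: Path) -> Any:
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical registry: only formats listed here can ever be dispatched to.
# YAML/TOML entries would need third-party libraries and are omitted.
SERDE: Dict[str, Tuple[Dump, Load]] = {
    "json": (_dump_json, _load_json),
}


def _lookup(fmt: str) -> Tuple[Dump, Load]:
    try:
        return SERDE[fmt]
    except KeyError:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(SERDE)}") from None


def serialize(obj: Any, path: Path, fmt: str = "json") -> None:
    dump, _ = _lookup(fmt)
    dump(obj, path)


def deserialize(path: Path, fmt: str = "json") -> Any:
    _, load = _lookup(fmt)
    return load(path)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "ABL1.json"
    serialize({"hgnc_name": "ABL1"}, out)  # illustrative record only
    print(deserialize(out))
```

Unknown formats fail fast with a `ValueError` instead of falling through to an arbitrary function; YAML or TOML support would be one more dict entry each.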

* changed schema namespace and included json files with package rather than in data directory
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* flake8 fixes in io_utils
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* updated schema_demo notebook for sub-package changes
* updated schema demo notebook for latest changes
* updated KinaseInfo json files with new field names
* revised schema notebook for latest changes
* now using dict to limit the possible serialization/deserialization functions; support for json, yaml, toml
* using ConfigDict(use_enum_values=True) for Enum to enable serialization; default values for fields where None allowed to None; validating KLIFS2UniProt dicts to default to None if missing given toml doesn't save None entries
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* unmarked files as executable
* removed unused Callable import for flake8 compatibility pull to merge
* removed unused Optional import for flake8 compatibility pull to merge
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* added flags to codecov.yml
* added schema dependencies to test_env.yaml
* schema tests
* added CI yaml for schema sub-package
* updated schema notebook for increased standardization and typos
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* flake8 modifications
* added -e flag to pip install - hope to resolve 'No package metadata was found' error
* broke out dependencies specific to mkt.schema separately
* use schema_test_env.yaml instead to try to resolve 'mkt-schema' requires a different Python: 3.13.2 not in '<3.12,>=3.9'
* removed sub-package specific environments
* conformed env and ci files to match asap
* added tqdm to env and toml for schema
* fixed kinhub.name to kinhub.kinase_name
* added encoding='utf-8' to serialization function for Windows compatibility with TOML
* added carryforwards to flags
* utf-8 encoding causing Windows CI to lag - will not support Windows TOML formatting as a result
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* switched logging error to info for Windows and TOML serialization
* updated schema demo notebook for latest changes merge to pull
* moved constants to separate file
* added indent=4 for jsons in package
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* added Colab instructions
* updated documentation
* namespace changed to mkt.databases; updated toml and removed poetry; added LICENSE and MANIFEST.in; added KinoML to acknowledgements
* >= packages rather than pinning exact versions; added gitpython
* changed to reflect namespace update
* updated for namespace changes; for kinase_schema, also changed field names to reflect changes in mkt.schema package
* updated notebooks for databases namespace changes
* added channels to correct yaml formatting error when installing
* added mkt.schema installation
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* flake8 changes
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed unused import
* updated databases ci for namespace changes
* removed init file from databases tests
* added HTTPError exception handling for KLIFS and updated tests to reflect; separated KLIFS and KinCore tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* re-integrated kincore test into klifs - need it for KD indices
* not testing NCBI; 404 error currently
* commenting out carryforward for the moment
* removed reference to Poetry in getting started docs
* updated path changes and package description
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed executable mark from notebooks/schema_demo.ipynb
* fixed schema-ci badge and typo
* codecov ignoring ncbi in mkt.databases and skipping corresponding test
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed git dependency from mkt.schema pyproject.toml
* added panel wildcard to prevent from using 1.6 which requires python3.13
* panel == instead of >=
* made mkt.databases a dependency of mkt.schema; databases now has a KinaseInfoGenerator that inherits from KinaseInfo
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed unused imports
* pre-commit hook
* pre-commit hook fix
* updated notebooks for UniProtFASTA
* removed extraneous comments and added APIQuery superclass
* fixed get_repo_root import statement
* changed get_repo_root reference to io_utils in KinCore; updated uniprot for UniProtFASTA in tests
* 404 not throwing an exception - added a dict_kinase_info = None option in if statement to accommodate
* exception handling if no FASTA file downloaded
* commented out line in pkis2 - TODO: further updates
* fixed cache error
* added UniProtFASTA vs. UniProtJSON (in progress)
* added instructions for loading in colab
* updated databases notebook to incorporate all recent changes
* added functions to extract sequence from kincore cif and an opinionated method to adjudicate kinase domain sequence (kincore cif > kincore fasta > Pfam > None)
* corrected Pfam > pfam, if self.kincore.cf is None return None
* use extract_sequence_from_cif rather than manual
* added testing for new adjudicate functions
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils and rgetattr to mkt.schema.utils
* moved get_repo_root to mkt.schema.io_utils and removed modeling get_repo_root
* moved get_repo_root to mkt.schema.io_utils
* moved get_repo_root to mkt.schema.io_utils
* added git to requirements
* rgetattr from schema instead of databases
* rgetattr from schema instead of databases
* rgetattr from schema instead of databases
* rgetattr from schema instead of databases
* removed rgetattr, rsetattr, and try_except_return_none_rgetattr from mkt.databases.utils
* removed rgetattr and random_uuid from mkt.ml.utils
* TestSchema.test_utils, changed serialization to serde, altered dictionary import
* added mkt.schema.utils
* moved gitpython out of mkt.databases and into mkt.schema

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jessicawhite <[email protected]>
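The commits above describe an "opinionated" rule for choosing the kinase domain sequence: KinCore CIF first, then KinCore FASTA, then Pfam, otherwise `None`. Below is a sketch of that priority fallback. It uses a hypothetical `KinaseDomainSources` container and `adjudicate_kd_sequence` function, not the actual `mkt.databases` classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KinaseDomainSources:
    """Hypothetical container; field names are illustrative only."""

    kincore_cif_seq: Optional[str] = None
    kincore_fasta_seq: Optional[str] = None
    pfam_seq: Optional[str] = None


def adjudicate_kd_sequence(sources: KinaseDomainSources) -> Optional[str]:
    """Return the first available kinase domain sequence.

    Priority: KinCore CIF > KinCore FASTA > Pfam > None.
    """
    for candidate in (sources.kincore_cif_seq, sources.kincore_fasta_seq, sources.pfam_seq):
        if candidate:  # treats both None and "" as missing
            return candidate
    return None


print(adjudicate_kd_sequence(KinaseDomainSources(kincore_fasta_seq="fasta-seq", pfam_seq="pfam-seq")))
```

Treating an empty string like a missing value is a design choice of this sketch; the project's implementation may only check for `None`.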

* removed extract_tarfiles from mkt.databases.io_utils - now in schema throughout
* black
* pre-commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* added requirements.txt
* removed erroneously committed VE package data
* pre-commit
* pre-commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* change if statement order return_filenotfound_error_if_empty_or_missing; support conditions untar_files_in_memory like list and remove ._ files; make hgnc name key instead of uniprot id
* make hgnc name default key from mkt.databases.kinase_schema instead of uniprot id
* create_tar_without_metadata in io_utils; add tar.gz to generation script - make sure to use w:gz to tarfile argument; new KinaseInfo.tar.gz with hgnc names as keys
* interim app update
* changed schema tests to reflect hgnc_name keys
* remove file suffix from list_entries
* black
* ordered list
* trailing whitespace
* temporarily added rotation based on ABL1
* print to logging
* wireframe including sequence, structure, and property panels
* altered structure plot size
* starting to add code to combine old CIF with new coords
* added SequenceAlignment adapted from mkt.databases.plot
* commented out SequenceAlignment in generate_alignments - default to using mkt.databases.plot
* added optional flags for SequenceAlignment to allow to repurpose in app
* removed reverse comment since now optional flag
* using mkt.databases.plot version of SequenceAlignment; using PropertyTables
* added resource links, radio button for structure
* could not get plots version to display toolbar on bottom - reimplementing here for now
* broke down SequenceAlignment into smaller plots
* tried to upgrade bokeh to get plot version of SequenceAlignment class to display toolbar
* added typing to serialization function
* starting alignment algorithm function
* add generate_properties to app
* added try_except_return_none_rgetattr to mkt.databases.utils
* make obj_kinase a PropertyTables property and generate extract_properties on instantiation
* added obj_kinases as property, combined sequence generation and plotting into a single class, make the y-axis labels crimson if no sequence of a given type is found in the data
* cleaned up extraneous arguments now included in radio buttons (programming to come), adjusted display_dashboard function for changes in the generate_ scripts
* bugfixes
* updated for cif inclusion and new schema functionality
* added carryforward flag to codecov
* finalized structural annotations for phosphosites and KLIFS pocket
* removed commented out code no longer in use
* added docstrings for hardcoded resources; added KLIFS annotation to describe stick regions
* incorporating changes trying to switch CIF files to the newly aligned coordinates; will undo with next commit as have introduced an error
* reverting back to old version of KinCore
* fixed ncbi codecov ignore
* upgrade python to <3.13 in pyproject.toml files
* changed flags path structure
* moved constants to a separate file in app
* added docstrings to databases utils
* removed TODO from hinge:linker
* added KLIFS region labeling to x-axis
* move the no KinCore objects error to the beginning; try to specify full width of KinCore active structure

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
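`rgetattr` and `try_except_return_none_rgetattr` move between `mkt.databases.utils` and `mkt.schema.utils` in these commits, but their bodies are not shown. The block below uses the common `functools.reduce` recipe for a dotted-path `getattr`. The signatures are assumptions, not the package's API, and the `klifs.pocket_seq` path in the usage line is hypothetical.

```python
import functools
from types import SimpleNamespace
from typing import Any


def rgetattr(obj: Any, attr: str, *default: Any) -> Any:
    """getattr for dotted paths such as "kinhub.kinase_name".

    An optional default is forwarded to every getattr call, as with the built-in.
    """
    def _step(current: Any, name: str) -> Any:
        return getattr(current, name, *default)

    return functools.reduce(_step, attr.split("."), obj)


def try_except_return_none_rgetattr(obj: Any, attr: str) -> Any:
    """Return None instead of raising AttributeError when any link is missing."""
    try:
        return rgetattr(obj, attr)
    except AttributeError:
        return None


kinase = SimpleNamespace(kinhub=SimpleNamespace(kinase_name="ABL1"), klifs=None)
print(rgetattr(kinase, "kinhub.kinase_name"))
print(try_except_return_none_rgetattr(kinase, "klifs.pocket_seq"))
```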

* removed extract_tarfiles from mkt.databases.io_utils - now in schema throughout
* black
* pre-commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* added requirements.txt
* removed erroneously committed VE package data
* pre-commit
* pre-commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* change if statement order return_filenotfound_error_if_empty_or_missing; support conditions untar_files_in_memory like list and remove ._ files; make hgnc name key instead of uniprot id
* make hgnc name default key from mkt.databases.kinase_schema instead of uniprot id
* create_tar_without_metadata in io_utils; add tar.gz to generation script - make sure to use w:gz to tarfile argument; new KinaseInfo.tar.gz with hgnc names as keys
* interim app update
* changed schema tests to reflect hgnc_name keys
* remove file suffix from list_entries
* black
* ordered list
* trailing whitespace
* temporarily added rotation based on ABL1
* print to logging
* wireframe including sequence, structure, and property panels
* altered structure plot size
* starting to add code to combine old CIF with new coords
* added SequenceAlignment adapted from mkt.databases.plot
* commented out SequenceAlignment in generate_alignments - default to using mkt.databases.plot
* added optional flags for SequenceAlignment to allow to repurpose in app
* removed reverse comment since now optional flag
* using mkt.databases.plot version of SequenceAlignment; using PropertyTables
* added resource links, radio button for structure
* could not get plots version to display toolbar on bottom - reimplementing here for now
* broke down SequenceAlignment into smaller plots
* tried to upgrade bokeh to get plot version of SequenceAlignment class to display toolbar
* added typing to serialization function
* starting alignment algorithm function
* add generate_properties to app
* added try_except_return_none_rgetattr to mkt.databases.utils
* make obj_kinase a PropertyTables property and generate extract_properties on instantiation
* added obj_kinases as property, combined sequence generation and plotting into a single class, make the y-axis labels crimson if no sequence of a given type is found in the data
* cleaned up extraneous arguments now included in radio buttons (programming to come), adjusted display_dashboard function for changes in the generate_ scripts
* bugfixes
* updated for cif inclusion and new schema functionality
* added carryforward flag to codecov
* finalized structural annotations for phosphosites and KLIFS pocket
* removed commented out code no longer in use
* added docstrings for hardcoded resources; added KLIFS annotation to describe stick regions
* incorporating changes trying to switch CIF files to the newly aligned coordinates; will undo with next commit as have introduced an error
* reverting back to old version of KinCore
* fixed ncbi codecov ignore
* upgrade python to <3.13 in pyproject.toml files
* changed flags path structure
* moved constants to a separate file in app
* added docstrings to databases utils
* removed TODO from hinge:linker
* added KLIFS region labeling to x-axis
* move the no KinCore objects error to the beginning; try to specify full width of KinCore active structure
* pinning the package versions that are working locally

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
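These commits add `create_tar_without_metadata` (writing with the `w:gz` mode) and make `untar_files_in_memory` drop `._` files. The log does not say what "without metadata" covers, so this sketch assumes two things: excluding macOS `._` AppleDouble files, and zeroing each member's owner and timestamp fields. The function bodies, parameters, and the dict return type are illustrative only.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Dict


def _is_appledouble(name: str) -> bool:
    # macOS tar can add "._<name>" resource-fork files next to real files
    return Path(name).name.startswith("._")


def _strip_metadata(info: tarfile.TarInfo) -> tarfile.TarInfo:
    # Drop owner and timestamp details that would otherwise come from the local machine
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = 0
    return info


def create_tar_without_metadata(src_dir: Path, out_path: Path) -> None:
    """Write every regular file under src_dir to a gzip-compressed tarball ("w:gz")."""
    with tarfile.open(out_path, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file() and not _is_appledouble(path.name):
                arcname = path.relative_to(src_dir).as_posix()
                tar.add(path, arcname=arcname, filter=_strip_metadata)


def untar_files_in_memory(tar_path: Path) -> Dict[str, str]:
    """Return {member name: UTF-8 text} for regular files, skipping "._" files."""
    contents: Dict[str, str] = {}
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile() or _is_appledouble(member.name):
                continue
            fileobj = tar.extractfile(member)
            if fileobj is not None:
                contents[member.name] = fileobj.read().decode("utf-8")
    return contents


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "kinases"
    src.mkdir()
    (src / "ABL1.json").write_text('{"hgnc_name": "ABL1"}', encoding="utf-8")
    (src / "._ABL1.json").write_text("resource fork", encoding="utf-8")
    archive = Path(tmp) / "KinaseInfo.tar.gz"
    create_tar_without_metadata(src, archive)
    print(untar_files_in_memory(archive))
```

Zeroing `mtime` only affects the tar members; the gzip layer written by `w:gz` records its own header, so archive bytes are not guaranteed to be identical across runs.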

* removed extract_tarfiles from mkt.databases.io_utils - now in schema throughout
* black
* pre-commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* added requirements.txt
* removed erroneously committed VE package data
* pre-commit
* pre-commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* change if statement order return_filenotfound_error_if_empty_or_missing; support conditions untar_files_in_memory like list and remove ._ files; make hgnc name key instead of uniprot id
* make hgnc name default key from mkt.databases.kinase_schema instead of uniprot id
* create_tar_without_metadata in io_utils; add tar.gz to generation script - make sure to use w:gz to tarfile argument; new KinaseInfo.tar.gz with hgnc names as keys
* interim app update
* changed schema tests to reflect hgnc_name keys
* remove file suffix from list_entries
* black
* ordered list
* trailing whitespace
* temporarily added rotation based on ABL1
* print to logging
* wireframe including sequence, structure, and property panels
* altered structure plot size
* starting to add code to combine old CIF with new coords
* added SequenceAlignment adapted from mkt.databases.plot
* commented out SequenceAlignment in generate_alignments - default to using mkt.databases.plot
* added optional flags for SequenceAlignment to allow to repurpose in app
* removed reverse comment since now optional flag
* using mkt.databases.plot version of SequenceAlignment; using PropertyTables
* added resource links, radio button for structure
* could not get plots version to display toolbar on bottom - reimplementing here for now
* broke down SequenceAlignment into smaller plots
* tried to upgrade bokeh to get plot version of SequenceAlignment class to display toolbar
* added typing to serialization function
* starting alignment algorithm function
* add generate_properties to app
* added try_except_return_none_rgetattr to mkt.databases.utils
* make obj_kinase a PropertyTables property and generate extract_properties on instantiation
* added obj_kinases as property, combined sequence generation and plotting into a single class, make the y-axis labels crimson if no sequence of a given type is found in the data
* cleaned up extraneous arguments now included in radio buttons (programming to come), adjusted display_dashboard function for changes in the generate_ scripts
* bugfixes
* updated for cif inclusion and new schema functionality
* added carryforward flag to codecov
* finalized structural annotations for phosphosites and KLIFS pocket
* removed commented out code no longer in use
* added docstrings for hardcoded resources; added KLIFS annotation to describe stick regions
* incorporating changes trying to switch CIF files to the newly aligned coordinates; will undo with next commit as have introduced an error
* reverting back to old version of KinCore
* fixed ncbi codecov ignore
* upgrade python to <3.13 in pyproject.toml files
* changed flags path structure
* moved constants to a separate file in app
* added docstrings to databases utils
* removed TODO from hinge:linker
* added KLIFS region labeling to x-axis
* move the no KinCore objects error to the beginning; try to specify full width of KinCore active structure
* pinning the package versions that are working locally
* fixed bug by sorting list_intersect in _generate_highlight_idx before using

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
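The final commit in this list fixes a bug by sorting `list_intersect` inside `_generate_highlight_idx`. Iteration order of a Python `set` is an implementation detail, so an intersection built from sets should be sorted before it is used to index a sequence. Below is a minimal sketch with a hypothetical public `generate_highlight_idx`; the inputs (residue numbers in increasing order) are assumptions, since the real signature is not shown.

```python
from typing import List


def generate_highlight_idx(positions: List[int], to_highlight: List[int]) -> List[int]:
    """Hypothetical helper: indices into `positions` for residues to highlight.

    Assumes `positions` holds residue numbers in increasing order.
    """
    # Set intersection has no guaranteed iteration order, so sort before indexing
    list_intersect = sorted(set(positions) & set(to_highlight))
    index_of = {pos: i for i, pos in enumerate(positions)}
    return [index_of[pos] for pos in list_intersect]


print(generate_highlight_idx([10, 11, 12, 13, 14], [14, 11, 99]))
```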

* removed src folder and put latest ESM2 analysis in outer dir
* initial package infrastructure
* moved previous esm2 analysis to an alternative outer folder
* preliminary mkt.ml components
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* flake8 adjustments rebase to pull
* updated after rebase
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Schema (#83)
  * changed schema namespace and included json files with package rather than in data directory
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * flake8 fixes in io_utils
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * updated schema_demo notebook for sub-package changes
  * updated schema demo notebook for latest changes
  * updated KinaseInfo json files with new field names
  * revised schema notebook for latest changes
  * now using dict to limit the possible serialization/deserialization functions; support for json, yaml, toml
  * using ConfigDict(use_enum_values=True) for Enum to enable serialization; default values for fields where None allowed to None; validating KLIFS2UniProt dicts to default to None if missing given toml doesn't save None entries
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * unmarked files as executable
  * removed unused Callable import for flake8 compatibility pull to merge
  * removed unused Optional import for flake8 compatibility pull to merge
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * added flags to codecov.yml
  * added schema dependencies to test_env.yaml
  * schema tests
  * added CI yaml for schema sub-package
  * updated schema notebook for increased standardization and typos
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * flake8 modifications
  * added -e flag to pip install - hope to resolve 'No package metadata was found' error
  * broke out dependencies specific to mkt.schema separately
  * use schema_test_env.yaml instead to try to resolve 'mkt-schema' requires a different Python: 3.13.2 not in '<3.12,>=3.9'
  * removed sub-package specific environments
  * conformed env and ci files to match asap
  * added tqdm to env and toml for schema
  * fixed kinhub.name to kinhub.kinase_name
  * added encoding='utf-8' to serialization function for Windows compatibility with TOML
  * added carryforwards to flags
  * utf-8 encoding causing Windows CI to lag - will not support Windows TOML formatting as a result
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * switched logging error to info for Windows and TOML serialization
  * updated schema demo notebook for latest changes merge to pull
  * moved constants to separate file
  * added indent=4 for jsons in package
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * added Colab instructions
  * updated documentation
  * namespace changed to mkt.databases; updated toml and removed poetry; added LICENSE and MANIFEST.in; added KinoML to acknowledgements
  * >= packages rather than pinning exact versions; added gitpython
  * changed to reflect namespace update
  * updated for namespace changes; for kinase_schema, also changed field names to reflect changes in mkt.schema package
  * updated notebooks for databases namespace changes
  * added channels to correct yaml formatting error when installing
  * added mkt.schema installation
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * flake8 changes
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * removed unused import
  * updated databases ci for namespace changes
  * removed init file from databases tests
  * added HTTPError exception handling for KLIFS and updated tests to reflect; separated KLIFS and KinCore tests
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * re-integrated kincore test into klifs - need it for KD indices
  * not testing NCBI; 404 error currently
  * commenting out carryforward for the moment
  * removed reference to Poetry in getting started docs
  * updated path changes and package description
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * removed executable mark from notebooks/schema_demo.ipynb
  * fixed schema-ci badge and typo
  * codecov ignoring ncbi in mkt.databases and skipping corresponding test
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * removed git dependency from mkt.schema pyproject.toml
  * added panel wildcard to prevent from using 1.6 which requires python3.13
  * panel == instead of >=
  * made mkt.databases a dependency of mkt.schema; databases now has a KinaseInfoGenerator that inherits from KinaseInfo
  * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
  * removed unused imports
  * pre-commit hook
  * pre-commit hook fix
  * updated notebooks for UniProtFASTA
  * removed extraneous comments and added APIQuery superclass
  * fixed get_repo_root import statement
  * changed get_repo_root reference to io_utils in KinCore; updated uniprot for UniProtFASTA in tests
  * 404 not throwing an exception - added a dict_kinase_info = None option in if statement to accommodate
  * exception handling if no FASTA file downloaded
  * commented out line in pkis2 - TODO: further updates
  * fixed cache error
  * added UniProtFASTA vs. UniProtJSON (in progress)
  * added instructions for loading in colab
  * updated databases notebook to incorporate all recent changes

  ---------

  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: jessicawhite <[email protected]>
* updated for flake8 rebase to merge
* ignoring unused imports for the moment
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* latest models
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* flake8 updates
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* plotting for kinase embeddings - for now kinase group generation is hardcoded
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* models flake8 fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* moved clustering and plotting to separate modules
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* plotting working, added cli
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update in wip model
* added TDC
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* switched from pkg_resources to importlib since fully deprecated in 3.12
* 4/18 update
* Update requirements.yaml (#114)
  Try pinning Python version (python >=3.10,<3.11)
* Update conf.py (#115)
  Fixed import statements
* Update index.rst (#116)
  Increased max depth from 3 to 5
* Update conf.py (#117)
  changing sys.path
* Update api.rst (#118)
  Using sub-modules in API
* Update api.rst (#119)
  Removing api from submodules
* Update api.rst (#120)
  Removing sub-modules
* Update index.rst (#121)
  Changed max depth to 6
* reproducible conversion of PKIS accession to UniProt ID
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* script to generate harmonized PKIS2 data
* added get_repo_root to utils
* updated pkis2 annotated file for latest reconciliation
* created a FineTuneDataset class to load data from csv files, split on kinase group, transform data with StandardScaler, and tokenize; also created a more specific PKIS2Dataset class
* training scripts and initial pooling model
* added run_trainer as a CLI
* log_config for mkt.ml
* clustering module
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* keep only latest
* uncomment run_pipeline_with_wandb for pre-commit CI
* commenting out all of models file - no longer in use but preserve for posterity
* pre-commit ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* updated typo in PKIS2Dataset import statement; specified scaler object in dataset_pkis2
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* global ordering drug before kinase; more efficient max_length calc; revised Dataset generation
* global drug before kinase
* fixed collate_fn in dataloader; global drug before kinase; fixed model arguments
* run interactively line by line to debug
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* finished merge
* commented out device
* commented out device import
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* global reorder drug before kinase
* commented out interactive lines
* sample-wise dot product instead of matrix multiplication
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* docstrings
* make wandb a bool param rather than separate function; make training logging a moving average instead
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed reference to wandb trainer function
* log eval to wandb; val loss every 1000 steps + end of epoch; plot val stats real-time; keep only best 5 models
* added entity name option to setup_wandb
* added entity name option to trainer
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed unused imports and variables from trainer
* updated trainer to better log validation data
* added separate freeze arguments for drug and kinase model
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* preliminary config logic
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* more info on supported models
* added ChemBERTa
* provided more detail on how to implement additional model support
* have made all models Enum/StrEnum for validation purposes
* using ABC and abstract method for split; added support for CV
* support for ABC finetuning module so separate cross-fold and splits
* set_seed
* self.seed = self.config[seed]
* separate bool_freeze into drug and kinase options
* removed extraneous import
* update comments
* added ExperimentFactory support; TODO - pass to run_pipeline_with_wandb and trainer configs
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* pre-commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* removed unused field
* added dict_trainer_configs return and informative model name
* conform run_pipeline_with_wandb args to train_model args; added plot dir argument throughout; kwargs from dict in run_pipeline_with_wandb; remove default args from train_model to prevent unintentional overwriting
* logging > logger
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, 
see https://pre-commit.ci * pre-commit CI fixes: unused imports, dup plot_dir, undefined bool_wandb and eval_results * removed previous defaultdict architecture * added support to batch cross-validation scripts * removed drug and kinase default models, added fold_idx logic to allow for batching cross-validation folds * added fold_idx to PKIS2CrossValidation * added dict_trainer_configs to None output, typing and docstrings; improved model naming convention; convert learning_rate to float * utils_trainer to create SLURM scripts for cross-validation * make fold sub-directory and cd for purposes of output * make train_test/datetime sub-directory for train_test split * log config file as json * train_step * 10 global steps, import json * changed script_dir/out_dir configuration, removed non-existent import * moved the directory generation step to batch_submit_folds so no datetime difference in submitted jobs * corrected description * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed unused imports and fstring for pre-commit CI * fixed wandb.Images logging error; reverted back to raw global loss for train steps * removed unused, commented out arguments in batch_submit_folds * added instructions about running on cluster * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added details in configs about what needs to be updated * added environment.yml * added environment set-up instructions in README * removed pip installed packages from env yml - pip install toml for dependencies instead * use dict for class-specific kwargs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed trailing whitespace * added tdc support for davis * confirmed dict formulation worked and removed commented out args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * created 
KinaseGroupSource * standardized docstrings in FineTuneDataset * added rgetattr with exception handling * removed unused imports * mkt.ml.datasets.process module to create datasets * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add kincore kd columns * harmonized config and process classes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed unused dataclass imports * add source column at the end * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unnecessary comments, add Davis TODO * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added generate_datasets to cli for ml * convert Davis y to pKd in micromolar before z-score conversion so higher values denote more potent * comment out drop na since now do in cli more rigorously * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * switched flag from drop to keep * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * generate pseudo random uuids * add uuids at the level of the original data processing * update nf pipeline to reflect uuid added at the level of the data generation * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use random_uuid instead of uuid4 * added attribution * remove unused uuid import * clarified comments * create a BaseCombinedModel that loads models from pretrained and compute_similarity; AbstractTransformModel now has transform_drug and transform_kinase along with forward pass where the transform methods are abstract and instantiated elsewhere (e.g., pooling vs. 
attention * instantiate transformation functions and layer names at the level of CombinedPoolingModel * KD TODO clarified * use adjudicate_kd_sequence to annotate * change PKIS to % inhibition; add KD column algorithm using adjudicate KD sequence * import rgetattr from mkt.schema instead of mkt.databases * simplified adjudicate_group for readability * KinaseKDSequenceSource TODO * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jessicawhite <[email protected]>
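One commit in the squashed message replaces matrix multiplication with a "sample-wise dot product". The sketch below illustrates the difference in plain Python, assuming each batch row holds one matched drug-kinase pair; the function names and numbers are hypothetical and not taken from the repository. The full product scores every drug against every kinase in the batch (a B x B matrix), while the sample-wise version keeps only the matched pairs, i.e. the diagonal. If the model is written in PyTorch, the two variants would typically correspond to `drug @ kinase.T` and `(drug * kinase).sum(dim=-1)`.

```python
from typing import List

# Hypothetical illustration: row i of drug_emb and row i of kinase_emb
# belong to the same drug-kinase pair in a batch of size B.
Vector = List[float]


def matmul_scores(drug_emb: List[Vector], kinase_emb: List[Vector]) -> List[List[float]]:
    """Full matrix product drug_emb @ kinase_emb.T: B x B scores (every drug vs every kinase)."""
    return [[sum(d * k for d, k in zip(drug, kin)) for kin in kinase_emb] for drug in drug_emb]


def samplewise_scores(drug_emb: List[Vector], kinase_emb: List[Vector]) -> List[float]:
    """Sample-wise dot product: one score per matched pair (the diagonal of matmul_scores)."""
    if len(drug_emb) != len(kinase_emb):
        raise ValueError("expected one kinase embedding per drug embedding")
    return [sum(d * k for d, k in zip(drug, kin)) for drug, kin in zip(drug_emb, kinase_emb)]


# Toy embeddings (made-up values) for a batch of three pairs.
drug = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
kinase = [[2.0, 1.0], [4.0, 2.0], [1.0, 1.0]]

full = matmul_scores(drug, kinase)
paired = samplewise_scores(drug, kinase)
print(paired)  # [4.0, 0.0, 3.0]
```

The sample-wise form avoids computing the B x B off-diagonal scores, which are cross-pair comparisons that a per-pair regression target does not use.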


codecov · 2025-08-20T14:56:05Z

Codecov Report

❌ Patch coverage is 82.35294% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.48%. Comparing base (138b08a) to head (69bb125).
⚠️ Report is 16 commits behind head on main.

Additional details and impacted files

Flag	Coverage Δ
databases	`100.00% <ø> (ø)`
schema	`90.43% <82.35%> (-0.95%)`	⬇️



jessicaw9910 and others added 30 commits April 10, 2025 14:20

pc hooks

147c974

pc hooks

48fad29

remove remnant filepath info from str_filename, saved as pdb instead of cif, used load_raw instead of load for cif files

ae4299b

aligned PDB structures

a651b85

Merge branch 'main' into modeling

10b6bf2

saved as CIF instead of PDB files

9380a48

merge main to push

6948191

Merge branch 'main' into modeling

5318940

package set-up for mkt.modeling

d2af8e4

initial modeling README with instructions on how to set up environments

1c0b347

added prolif MDAnalysis rdkit

547fe6d

specify boltz[cuda] and don't check if in environment

5b74f8e

[pre-commit.ci] auto fixes from pre-commit.com hooks

c73dfd4

for more information, see https://pre-commit.ci

Merge branch 'main' into modeling

c69cfc8

nextflow for boltz

a42aaa8

Merge branch 'modeling' of https://github.com/choderalab/missense-kinase-toolkit into modeling

32cf0ab

[pre-commit.ci] auto fixes from pre-commit.com hooks

16a42c9

for more information, see https://pre-commit.ci

aggregate successful runs into a single csv

ff518fd

add csv creation to workflow; make BOLTZ_CSV an entry point

02799da

fixed duplication in create_yaml.py

0cb7372

separate out by row and aggregation processes

a96ae6b

updated for CSV writing process

c47cbf3

Merge branch 'main' into modeling

8cc1646

[pre-commit.ci] auto fixes from pre-commit.com hooks

da32268

for more information, see https://pre-commit.ci

added flag to output pdb format instead of cif

f60c48d

Merge branch 'main' into modeling

9ec6ce2

Merge branch 'main' into modeling

15a0e9c

allow cif or pdb output

4bb04ac

Merge branch 'main' into modeling

e27dbf0

jessicaw9910 and others added 9 commits August 20, 2025 10:50

Delete notebooks/diffusion_group directory (#160)

269cca4

remove Linux package references from env for Mac compatibility; added ipython, jupyter, and mol2grid to requirements

69bb125

jessicaw9910 and others added 4 commits September 25, 2025 16:50

Merge branch 'main' into modeling

c469243

preliminary code for generating a KLIFS ligand interaction fingerprint

7e880a0

pull to push

0f22ca9

[pre-commit.ci] auto fixes from pre-commit.com hooks

4f54c32

for more information, see https://pre-commit.ci
