jessicaw9910 (Collaborator) commented Nov 11, 2025

Description

Adds a sub-package for ML; currently sequence-focused.

Todos

Notable points that this PR has either accomplished or will accomplish.

  • Process raw PKIS2, Davis data
  • Load PKIS2 data
  • Load Davis data
  • Drug models
  • Kinase models
  • MLP on pooling layer
  • Cross-attention on full latent embedding (IN PROGRESS)
  • Factor model for mutational data (IN PROGRESS)
  • Factor model
  • Tokenizers
  • Configs
  • Update Davis instead of using TDC
  • Standardize SMILES strings
  • Use _deserialization_cache method for DICT_KINASE
  • Use PyTorch CUDA expandable segments
  • Absolute not relative paths in batch scripts
  • Cluster kinases and drugs for splits
  • Docs
  • Tests
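For the "Use PyTorch CUDA expandable segments" item, a minimal sketch of one way to switch the allocator setting on from Python. `PYTORCH_CUDA_ALLOC_CONF` and its `expandable_segments:True` option are PyTorch's documented allocator settings; the helper name `enable_expandable_segments` is hypothetical, and this PR may instead export the variable in its batch scripts.

```python
import os


def enable_expandable_segments() -> str:
    """Add expandable_segments:True to PYTORCH_CUDA_ALLOC_CONF if it is not set.

    Hypothetical helper (not from this PR). Existing allocator options in the
    variable are preserved; the setting is only appended when missing.
    """
    key = "PYTORCH_CUDA_ALLOC_CONF"
    current = os.environ.get(key, "")
    # The variable holds comma-separated "option:value" pairs.
    options = [opt for opt in current.split(",") if opt]
    if not any(opt.startswith("expandable_segments:") for opt in options):
        options.append("expandable_segments:True")
    os.environ[key] = ",".join(options)
    return os.environ[key]


# Call before the first CUDA allocation; doing it before `import torch`
# is the safest ordering.
enable_expandable_segments()
```

Setting the variable in the job script (`export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`) has the same effect and keeps the Python code unchanged.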

Questions

  • How to perform a per-vector dot product rather than a matrix multiplication? Answer: use einsum.
  • How to batch jobs when using the CV dataset from Python? Answer: set up a utils_trainer.py script that batches cross-validation jobs.
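On the einsum question, a minimal sketch of the per-vector (row-wise) dot product, written with NumPy so it runs without a GPU; `torch.einsum` accepts the same subscript string. The function name `rowwise_dot` and the toy arrays are illustrative assumptions, not code from this PR.

```python
import numpy as np


def rowwise_dot(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return out[i] = dot(a[i], b[i]) for two arrays of shape (batch, dim)."""
    # "bd,bd->b": multiply matching entries and sum over d, keeping the batch axis.
    return np.einsum("bd,bd->b", a, b)


# Toy stand-ins for pooled drug and kinase embeddings (batch=2, dim=3).
drug = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]])
kinase = np.array([[1.0, 0.0, 1.0], [2.0, 5.0, 2.0]])

scores = rowwise_dot(drug, kinase)  # shape (2,), one score per pair
```

A full matrix multiplication (`drug @ kinase.T`) would compute every drug-kinase combination and return a (batch, batch) matrix; the einsum form computes only its diagonal, which is what paired samples need.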

Status

  • Ready to go

jessicaw9910 and others added 30 commits April 17, 2025 15:22
Try pinning Python version (python >=3.10,<3.11)
Fixed import statements
Increased max depth from 3 to 5
changing sys.path
Using sub-modules in API
Removing api from submodules
Removing sub-modules
Changed max depth to 6
… kinase group, transform data with StandardScaler, and tokenize; also created a more specific PKIS2Dataset class
jessicaw9910 and others added 29 commits October 8, 2025 11:17
… mkt.databases.datasets; fintune now contains KinaseSplit and CrossValidation classes
… paths in bash script and check script_dir exists so can resume run anywhere; lowercase comments
jessicaw9910 changed the title from "Ml" to "ML" on Nov 12, 2025
2 participants