
Attention-based deep learning for analysis of pathology images and gene expression data in lung squamous premalignant lesions

Introduction

We propose a missing-data-friendly framework that integrates histologic and transcriptomic data to discriminate dysplastic from non-dysplastic lesions and enable stratification of early lung lesions. The study leverages H&E whole slide images (WSIs) of endobronchial biopsies and bulk gene expression (GE) data derived from endobronchial biopsies and brushings from six previously published studies and ongoing lung precancer atlas efforts, all obtained from patients at high risk for lung cancer.

Training

1. WSI feature extraction

WSI features are obtained through binary mask-based tessellation followed by patch feature extraction with UNI2-h [1, 2]. Related scripts can be found in ./preprocessing/WSI/, with example invocations in ./scripts. Example workflow:

cd scripts
bash compute_mask.sh
bash tile_WSI.sh
bash extract_features.sh
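As a sketch of the mask-based tessellation step above: only tiles whose footprint on the binary tissue mask exceeds a coverage threshold are kept for feature extraction. The function name, tile size, and tissue-fraction threshold below are illustrative assumptions, not the repository's actual parameters (see ./preprocessing/WSI/ for the real scripts):

```python
import numpy as np

def tile_coordinates(mask: np.ndarray, tile_size: int, min_tissue_frac: float = 0.5):
    """Return (row, col) pixel origins of non-overlapping tiles whose
    tissue fraction in the binary mask exceeds min_tissue_frac."""
    coords = []
    h, w = mask.shape
    for r in range(0, h - tile_size + 1, tile_size):
        for c in range(0, w - tile_size + 1, tile_size):
            window = mask[r:r + tile_size, c:c + tile_size]
            if window.mean() >= min_tissue_frac:
                coords.append((r, c))
    return coords

# Toy example: tissue occupies the left half of an 8x8 mask.
mask = np.zeros((8, 8), dtype=float)
mask[:, :4] = 1.0
print(tile_coordinates(mask, tile_size=4))  # [(0, 0), (4, 0)]
```

The retained coordinates are then read back from the WSI at the desired magnification and passed through the UNI2-h patch encoder.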

2. Transcriptomic feature harmonization

Gene features associated with bronchial dysplasia are selected using a linear mixed-effects model applied to normalized, batch-corrected bulk RNA-seq data. Related scripts can be found in ./preprocessing/gene/.
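To illustrate the selection step, the sketch below fits a linear mixed-effects model with a per-patient random intercept to one synthetic gene using statsmodels; the column names (`expr`, `dysplasia`, `patient`), the synthetic data, and the single-gene setup are assumptions for illustration, not the repository's actual pipeline:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_patients, n_samples = 10, 4  # repeated biopsies/brushings per patient

df = pd.DataFrame({
    "patient": np.repeat(np.arange(n_patients), n_samples),
    "dysplasia": rng.integers(0, 2, n_patients * n_samples),
})
# Synthetic normalized expression for one gene with a patient random effect.
patient_effect = rng.normal(0, 1, n_patients)[df["patient"]]
df["expr"] = 0.8 * df["dysplasia"] + patient_effect + rng.normal(0, 0.5, len(df))

# Random intercept per patient accounts for correlated repeated samples.
fit = smf.mixedlm("expr ~ dysplasia", df, groups=df["patient"]).fit()
pval = fit.pvalues["dysplasia"]
print(f"dysplasia coefficient p-value: {pval:.3g}")
```

In practice such a model would be fit per gene on the batch-corrected expression matrix, and genes passing a significance threshold would be retained as features.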

3. Training single- and dual-modality models (Models a, b, c, d)

A single-modality model is trained and validated on samples with WSI only (Model a) or GE only (Model c). A dual-modality model is trained on samples with both WSI and GE and validated on samples with WSI only (Model b) or GE only (Model d).

  • Implementation details: ./src/model/training1.py.
    • The presence or absence of each modality is controlled by the --mask-gene and --mask-wsi flags.
  • Example: ./scripts/training1.sh.
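A minimal sketch of how per-modality masking could work, assuming a learned placeholder embedding is substituted for a masked modality; the class name, feature dimensions (1536 for UNI2-h patch features, 200 genes), and the placeholder mechanism are illustrative assumptions, not the implementation in training1.py:

```python
import torch
import torch.nn as nn

class DualModalityClassifier(nn.Module):
    """Hypothetical sketch: encode each modality separately and replace a
    masked modality with a learned placeholder before fusion."""
    def __init__(self, wsi_dim=1536, gene_dim=200, hidden=64):
        super().__init__()
        self.wsi_enc = nn.Linear(wsi_dim, hidden)
        self.gene_enc = nn.Linear(gene_dim, hidden)
        self.wsi_placeholder = nn.Parameter(torch.zeros(hidden))
        self.gene_placeholder = nn.Parameter(torch.zeros(hidden))
        self.head = nn.Linear(2 * hidden, 2)  # dysplastic vs. non-dysplastic

    def forward(self, wsi, gene, mask_wsi=False, mask_gene=False):
        b = wsi.shape[0] if wsi is not None else gene.shape[0]
        h_wsi = (self.wsi_placeholder.expand(b, -1) if mask_wsi
                 else self.wsi_enc(wsi))
        h_gene = (self.gene_placeholder.expand(b, -1) if mask_gene
                  else self.gene_enc(gene))
        return self.head(torch.cat([h_wsi, h_gene], dim=1))

model = DualModalityClassifier()
wsi, gene = torch.randn(3, 1536), torch.randn(3, 200)
print(model(wsi, gene).shape)                  # torch.Size([3, 2])
print(model(None, gene, mask_wsi=True).shape)  # torch.Size([3, 2])
```

Under this reading, `--mask-wsi` trains and evaluates the architecture as a GE-only model (Models c/d) and `--mask-gene` as a WSI-only model (Models a/b), without changing the network definition.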

4. Training the flexible fusion model

The flexible fusion model relaxes the requirement of paired WSI and GE data and allows the training data to include either WSI or GE or both.

  • Implementation details: ./src/model/training2.py.
  • Example: ./scripts/training2.sh.
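One way such mixed-availability training can work is to gate and average whichever modality encodings are present for each sample, so a single loss covers WSI-only, GE-only, and paired samples in the same batch. The averaging fusion and per-sample availability masks below are assumptions for illustration, not necessarily the mechanism in training2.py:

```python
import torch
import torch.nn as nn

# Hypothetical mixed-availability batch: per-sample availability flags;
# missing modalities carry placeholder values that the gates zero out.
wsi = torch.randn(4, 16)    # WSI features
gene = torch.randn(4, 8)    # gene expression features
has_wsi = torch.tensor([1., 1., 0., 1.])
has_gene = torch.tensor([0., 1., 1., 1.])
labels = torch.tensor([0, 1, 1, 0])

enc_w, enc_g = nn.Linear(16, 32), nn.Linear(8, 32)
head = nn.Linear(32, 2)

# Gated fusion: average the encodings of whichever modalities are present.
h = enc_w(wsi) * has_wsi[:, None] + enc_g(gene) * has_gene[:, None]
h = h / (has_wsi + has_gene).clamp(min=1)[:, None]
loss = nn.functional.cross_entropy(head(h), labels)
loss.backward()
print(f"loss: {loss.item():.3f}")
```

Because the gates zero out absent modalities, gradients only flow through encoders that actually received data for a given sample.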

Evaluation

  • To evaluate a single- or dual-modality model, run ./src/model/inference1.py. Example: ./scripts/inference1.sh.
  • To evaluate the flexible fusion model, run ./src/model/inference2.py. Example: ./scripts/inference2.sh.

References

[1] Chen, R.J., Ding, T., Lu, M.Y., Williamson, D.F., Jaume, G., Song, A.H., Chen, B., Zhang, A., Shao, D., Shaban, M. and Williams, M., 2024. Towards a general-purpose foundation model for computational pathology. Nature Medicine, 30(3), pp.850-862.

[2] MahmoodLab. UNI2-h. Hugging Face model repository. https://huggingface.co/MahmoodLab/UNI2-h (CC-BY-NC-ND-4.0), 2025.
