A versatile information retrieval framework for evaluating profile strength and similarity

carpenter-singh-lab/2025_Kalinin_mAP

This repository contains the source code to reproduce the results in the paper: "A versatile information retrieval framework for evaluating profile strength and similarity".

Getting started

System requirements

This repository supports Python 3.11 and should work on all modern operating systems (tested on macOS 14.5 and Ubuntu 18.04).

Dependencies

This code depends on widely used Python packages:

  • numpy
  • scipy
  • pandas
  • jupyter
  • seaborn
  • networkx
  • umap-learn
  • scikit-learn

It also uses pycytominer for profiling data preprocessing and copairs for profile grouping and mAP calculations.
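
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of average precision (AP) and mean average precision (mAP) in their standard information-retrieval form. It is an illustration only and is not the copairs API: the function names `average_precision` and `mean_average_precision` are hypothetical, and copairs may differ in details such as how profile pairs are formed, how ties are handled, and how significance is assessed.

```python
import math


def average_precision(relevance):
    """AP for one ranked list.

    relevance[i] is 1 (or True) if the item at rank i + 1 is a true match
    for the query, and 0 (or False) otherwise.
    """
    n_pos = sum(1 for r in relevance if r)
    if n_pos == 0:
        # AP is undefined when the list contains no true matches.
        return math.nan
    hits = 0
    total = 0.0
    for rank, is_match in enumerate(relevance, start=1):
        if is_match:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


def mean_average_precision(relevance_lists):
    """Mean of the per-query AP values, skipping undefined (NaN) ones."""
    scores = [average_precision(r) for r in relevance_lists]
    scores = [s for s in scores if not math.isnan(s)]
    return sum(scores) / len(scores) if scores else math.nan
```

For example, a ranked list with matches at ranks 1 and 3 out of 4 has precision 1 at rank 1 and 2/3 at rank 3, so its AP is (1 + 2/3) / 2 = 5/6.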

Installation

We suggest using Conda for environment management. The following commands create the environment from scratch and install the required packages.

conda create -n map_eval "python==3.11"
conda activate map_eval
pip install .
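
As an optional sanity check after installation, the sketch below reports which of the dependencies listed above cannot be found in the active environment. It is not part of the repository: the helper name `missing_modules` is hypothetical, and the list uses import names, assuming that umap-learn imports as `umap` and scikit-learn as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed under Dependencies (assumed mapping:
# umap-learn -> "umap", scikit-learn -> "sklearn").
REQUIRED_MODULES = [
    "numpy",
    "scipy",
    "pandas",
    "seaborn",
    "networkx",
    "umap",
    "sklearn",
    "pycytominer",
    "copairs",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

Run it with the `map_eval` environment activated; an empty result suggests that `pip install .` resolved everything on the list.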

R installation

Preprocessing of Perturb-seq data requires creating a separate R environment:

conda env create -f perturbseq_processing_environment.yml

Contents

Results are organized per dataset in the experiments subdirectory.

Each experiment directory includes a brief description of the dataset, along with scripts and/or Jupyter notebooks to download and preprocess the data, calculate metrics, and generate figures for the paper.

Experiments

  1. Simulations (Figures 2, S2-5)
  2. CellHealth data (Figures 3, S6)
  3. cpg0004 data (Figures S7A, S7C)
  4. cpg0016orf data (Figures S7B, S7D)
  5. nELISA data (Figures 4A-B)
  6. Perturb-seq data (Figures 4C-D, S5A-B, S8)
  7. Mitocheck data (Figures 5C-D, S9-10)

To reproduce the results on all real-world datasets (experiments 2-7), run:

bash run_all_data.sh

Citation

@article {kalinin2024versatile,
        author = {Kalinin, Alexandr A. and Arevalo, John and Vulliard, Loan and Serrano, Erik and Tsang, Hillary and Bornholdt, Michael and Muñoz, Alán F. and Sivagurunathan, Suganya and Rajwa, Bartek and Carpenter, Anne E. and Way, Gregory P. and Singh, Shantanu},
        title = {A versatile information retrieval framework for evaluating profile strength and similarity},
        year = {2025},
        doi = {10.1101/2024.04.01.587631},
        publisher = {Cold Spring Harbor Laboratory},
        URL = {https://doi.org/10.1101/2024.04.01.587631},
        journal = {bioRxiv}
}