scMDCF is a Python package for clustering single-cell multi-omics data. It uses cross-modality contrastive learning to learn a common latent representation and assign cluster labels.
Single-cell multi-omics (scMulti-omics) technologies have revolutionized our understanding of cellular functions and interactions by enabling the simultaneous measurement of multiple cellular modalities. However, the inherent complexity, high dimensionality, and heterogeneity of these datasets pose substantial challenges for integration and analysis across modalities. To address these challenges, we develop scMDCF, a single-cell multi-omics deep learning model based on contrastive learning, tailored for the efficient characterization and integration of scMulti-omics data. scMDCF features a cross-modality contrastive learning module that harmonizes data representations across omics types, enforcing consistency while incorporating conditional entropy to preserve data heterogeneity. In addition, a cross-modality feature fusion module extracts common low-dimensional latent representations of scMulti-omics data, balancing the characteristics of the different omics types. Extensive empirical studies demonstrate that scMDCF outperforms existing state-of-the-art scMulti-omics models across various types of scMulti-omics data. In particular, scMDCF shows strong capability in extracting cell-type-specific peak-gene associations and cis-regulatory elements from SNARE-seq data, and in elucidating immune regulation from CITE-seq data. Finally, on a post-BNT162b2 mRNA SARS-CoV-2 vaccination dataset, scMDCF successfully annotates specific vaccine-induced B cell subpopulations through integrative multimodal analysis, uncovering dynamic interactions and regulatory mechanisms within the immune system after vaccination.
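
To make the idea of cross-modality contrastive learning more concrete, here is a minimal NumPy sketch of a generic symmetric InfoNCE-style loss between two modality embeddings of the same cells. It is an illustration only, not the loss implemented in scMDCF (which, as described above, also involves a conditional entropy term). The function name `cross_modal_info_nce`, the `temperature` value, and the toy data are all assumptions made for this example.

```python
import numpy as np

def cross_modal_info_nce(z_a, z_b, temperature=0.5):
    """Generic symmetric InfoNCE loss; row i of z_a and z_b is the same cell.

    Hypothetical helper for illustration, not part of the scMDCF API.
    """
    # L2-normalise rows so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # shape: (n_cells, n_cells)

    def diagonal_cross_entropy(m):
        # Numerically stable log-softmax over each row; the matching cell
        # (the diagonal entry) is treated as the positive pair.
        m = m - m.max(axis=1, keepdims=True)
        log_prob = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the two directions (modality A -> B and B -> A).
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

# Toy data: 8 cells, 16-dimensional embeddings (random, not real omics data).
rng = np.random.default_rng(0)
rna = rng.normal(size=(8, 16))
adt_aligned = rna + 0.01 * rng.normal(size=(8, 16))  # same cells, small noise
adt_shuffled = adt_aligned[::-1]                     # cell pairing broken

loss_aligned = cross_modal_info_nce(rna, adt_aligned)
loss_shuffled = cross_modal_info_nce(rna, adt_shuffled)
print(f"aligned: {loss_aligned:.3f}, shuffled: {loss_shuffled:.3f}")
```

In this sketch the loss is lower when the two modalities agree cell by cell than when the pairing is shuffled, which is the consistency signal a contrastive objective of this kind rewards.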

The scMDCF package requires only a standard computer with enough RAM to support in-memory operations.
This package is supported on Linux and has been tested on the following systems:
- Linux: Ubuntu 18.04
scMDCF mainly depends on the Python scientific stack:
- numpy
- pytorch
- scanpy
- pandas
- scikit-learn
For specific version settings, please see requirements.
conda create -n scMDCF_env python=3.9.16
conda activate scMDCF_env
pip install scMDCF==1.1.3
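After installation, a quick check along the following lines can confirm that the listed dependencies are visible in the active environment. This is a hedged sketch, not part of scMDCF: the helper name `find_missing` is hypothetical, and the import names assume the usual conventions (pytorch is imported as `torch`, scikit-learn as `sklearn`).

```python
import importlib.util

# Usual import names for the dependencies listed above (assumed conventions:
# pytorch -> "torch", scikit-learn -> "sklearn").
REQUIRED_MODULES = ["numpy", "torch", "scanpy", "pandas", "sklearn"]

def find_missing(modules):
    """Return the names in `modules` that cannot be located in this environment."""
    # find_spec locates a top-level module without importing it.
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```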
scMDCF is a deep embedding learning method for single-cell multi-omics data clustering. It can be used for:
- CITE-seq dataset clustering. An example is provided in main_CITE.py.
- SNARE-seq (paired RNA-seq and ATAC-seq) dataset clustering. An example is provided in main_SNARE.py.
The datasets we used can be downloaded from dataset.
This project is covered under the MIT License.