Recent advances in single-cell sequencing technology have made it possible to measure multiple paired omics simultaneously in a single cell, using methods such as CITE-seq and SNARE-seq. Yet the widespread application of these single-cell multiomics profiling technologies has been limited by their experimental complexity, inherent noise, and high cost. In addition, single-omics sequencing technologies have generated a tremendous amount of high-quality single-cell data that has yet to be fully utilized. Here, we develop scMOG, a deep learning-based framework that generates single-cell ATAC data in silico from experimentally available single-cell RNA-seq measurements, and vice versa.
For more information, please see our paper: Efficient generation of paired single-cell multiomics profiles by deep learning
To install scMOG, make sure you have PyTorch and scanpy installed. For more details on the dependencies, see the environment.yml file. Set up the conda environment for scMOG:
conda env create -f environment.yml
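As a quick sanity check after creating the environment, the following minimal sketch reports which required packages cannot be found. It assumes PyTorch and scanpy are importable under their usual module names torch and scanpy; the helper missing_packages is a hypothetical name used for illustration and is not part of scMOG.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for PyTorch and scanpy.
    missing = missing_packages(["torch", "scanpy"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("PyTorch and scanpy are available.")
```

Run it inside the activated conda environment so that the check reflects the packages installed there.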
scMOG is trained using paired scRNA-seq/scATAC-seq measurements. We provide the data pre-processing code. The user only needs to supply the 'h5' files of the data to run the corresponding pre-processing (each h5 file must contain both the RNA and ATAC paired modalities):
python bin/Preprocessing.py --data FILE1.h5 FILE2.h5 --outdir Preprocessed_datasets
In addition, we also support other multi-omics dataset inputs, such as SHARE-seq and SNARE-seq. For example:
python bin/Preprocessing.py --snareseq --outdir snareseq_datasets
With the datasets obtained after pre-processing, we can then train the model:
python bin/train.py --outdir output
This training script will create a new directory output that contains:
- *.pth files, which contain the trained model parameters.
- loss.pdf files, which serve as an evaluation indicator of the model, reflecting the degree of fit between the model's predicted values and the true values.
- *.pdf files that contain summary test set metrics such as correlation, AUPRC, and AUROC.
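To inspect what a training run produced, the files in the output directory can be grouped by the kinds listed above. This is a minimal sketch using only the standard library. It assumes that loss plots can be recognized by "loss" in their file name and that every other PDF is a metrics plot; the function name summarize_training_outputs is hypothetical and not part of scMOG.

```python
from pathlib import Path


def summarize_training_outputs(outdir):
    """Group the files in a training output directory by kind."""
    outdir = Path(outdir)
    pdfs = sorted(p.name for p in outdir.glob("*.pdf"))
    return {
        # Trained model parameters.
        "checkpoints": sorted(p.name for p in outdir.glob("*.pth")),
        # Assumption: loss plots have "loss" in their file name.
        "loss_plots": [n for n in pdfs if "loss" in n.lower()],
        # Assumption: all remaining PDFs are test set metric plots.
        "metric_plots": [n for n in pdfs if "loss" not in n.lower()],
    }


if __name__ == "__main__":
    # "output" is the --outdir value used in the training command.
    for kind, names in summarize_training_outputs("output").items():
        print(kind, names)
```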
Once trained, scMOG can generate paired datasets from other datasets using the following example commands.
python bin/predict-rna.py --outdir Otherdataset_generation
python bin/predict-atac.py --outdir Otherdataset_generation
scMOG will create its outputs in the folder Otherdataset_generation accordingly:
- Various *.h5ad files containing the predictions. These are named with the convention inputMode_outputMode_adata.h5ad. For example, the file atac_rna_adata.h5ad contains the RNA predictions from ATAC input.
- If given paired data, this script will also generate concordance metrics in *.pdf files with a similar naming convention. For example, atac_rna_log.pdf will contain a log-scaled scatterplot comparing measured and imputed expression values per gene per cell.
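The naming convention described above can be unpacked programmatically, which helps when sorting many output files by direction of prediction. The sketch below assumes that neither mode name contains an underscore; parse_output_name is a hypothetical helper, not a function provided by scMOG.

```python
from pathlib import Path


def parse_output_name(filename):
    """Split a name like 'atac_rna_adata.h5ad' into (input_mode, output_mode, kind)."""
    stem = Path(filename).stem  # e.g. 'atac_rna_adata'
    parts = stem.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected file name: {filename!r}")
    input_mode, output_mode, kind = parts
    return input_mode, output_mode, kind


if __name__ == "__main__":
    print(parse_output_name("atac_rna_adata.h5ad"))  # ('atac', 'rna', 'adata')
    print(parse_output_name("atac_rna_log.pdf"))     # ('atac', 'rna', 'log')
```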
All of the above scripts have additional options intended for advanced users, exposing settings such as clustering methods and learning rates. Users can adjust these as needed.
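The commands shown in this README can also be chained from Python. The following is a minimal sketch that uses only the flags that appear above (--data and --outdir). Any further arguments the scripts may need, including the advanced options, are not covered here, and the function names build_pipeline and run_pipeline are hypothetical. It uses sys.executable in place of a bare python so that the scripts run with the interpreter of the active environment.

```python
import subprocess
import sys


def build_pipeline(h5_files, preprocessed_dir, train_dir, predict_dir):
    """Return the README's commands as argument lists, in execution order."""
    py = sys.executable
    return [
        [py, "bin/Preprocessing.py", "--data", *h5_files, "--outdir", preprocessed_dir],
        [py, "bin/train.py", "--outdir", train_dir],
        [py, "bin/predict-rna.py", "--outdir", predict_dir],
        [py, "bin/predict-atac.py", "--outdir", predict_dir],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = build_pipeline(
        ["FILE1.h5", "FILE2.h5"],
        "Preprocessed_datasets",
        "output",
        "Otherdataset_generation",
    )
    run_pipeline(cmds, dry_run=True)
```

With dry_run=True the sketch only prints the commands, which makes it safe to review them before running anything from the repository root.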
