Incorporating network diffusion and peak location information for better single-cell ATAC-seq data analysis

We recommend creating a new environment with Python 3.10:

    conda create -n py310 python=3.10
    conda activate py310

The scarp package can be installed via pip:

    pip install scarp

Required packages:

- scanpy (>=1.9.5)
- numpy (>=1.25.2)
- scipy (>=1.11.3)
- pandas (>=2.1.1)

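
To confirm that the new environment meets the minimum versions listed above, a small check along the following lines may help. It is a minimal sketch that uses only the Python standard library; the helper names `parse_version` and `check_requirements` are illustrative and not part of scarp, and the version comparison is deliberately simplified (it only looks at the first three numeric components).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirement list above.
MIN_VERSIONS = {
    "scanpy": "1.9.5",
    "numpy": "1.25.2",
    "scipy": "1.11.3",
    "pandas": "2.1.1",
}


def parse_version(text):
    """Turn '1.25.2' into (1, 25, 2); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums):
    """Return a {package: status} report for the given minimum versions."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


for name, status in check_requirements(MIN_VERSIONS).items():
    print(f"{name}: {status}")
```

Any package reported as `missing` or `too old` can then be installed or upgraded with pip inside the same environment.
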
Please check out the tutorials here.
You can download an example dataset from here.
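Load the example data and run SCARP:

    import scanpy as sc
    from scarp import model

    # Name of the dataset and path to the downloaded example file
    data_name = 'Leukemia'
    data = sc.read_h5ad('./Example_data/Leukemia.h5ad')

    # Run the full SCARP pipeline with default settings
    Cells_df = model.SCARP(adata=data,
                           data_name=data_name,
                           plot_SD=True,
                           verbose=True)

The parameters of `model.SCARP` are:

| parameter name | description | type | default |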
|---|---|---|---|
| adata | input scATAC-seq data | AnnData (e.g. loaded from an h5ad file) | None |
| data_name | name of this dataset | str | None |
| m | parameter to control NR diffusion intensity | float | 1.5 |
| gamma | parameter to control the threshold for merging adjacent chromosomes | int | 3000 |
| beta | parameter to control the extent to which prior edge weight decays | int | 5000 |
| return_shape | shape of the returned matrix | str | 'CN' |
| peak_loc | use peak location prior information or not | bool | True |
| parallel | parallel computing or not. 0 means automatically determined | int | 0 |
| plot_SD | plot the SDs of PCs or not | bool | True |
| fig_size | figure size of the SD plot | tuple | (4,3) |
| save_file | file path for saving the SD plot (used when plot_SD is True) | str | None |
| verbose | print the process or not | bool | True |
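
When several datasets are processed with the same non-default settings, it can be convenient to keep the defaults from the table in one place and override only what changes. The sketch below does this with a plain dictionary; `SCARP_DEFAULTS` and `build_scarp_kwargs` are hypothetical helper names introduced here for illustration, not part of scarp, and the values are simply copied from the table above.

```python
# Defaults copied from the parameter table (adata and data_name are
# supplied per dataset and are therefore left out).
SCARP_DEFAULTS = {
    "m": 1.5,
    "gamma": 3000,
    "beta": 5000,
    "return_shape": "CN",
    "peak_loc": True,
    "parallel": 0,
    "plot_SD": True,
    "fig_size": (4, 3),
    "save_file": None,
    "verbose": True,
}


def build_scarp_kwargs(**overrides):
    """Merge user overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(SCARP_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown SCARP parameter(s): {sorted(unknown)}")
    return {**SCARP_DEFAULTS, **overrides}


# Example: stronger diffusion and no SD plot, everything else at its default.
kwargs = build_scarp_kwargs(m=2.0, plot_SD=False)
print(kwargs)

# With scarp installed and `data` loaded, the call would then look like:
# Cells_df = model.SCARP(adata=data, data_name="Leukemia", **kwargs)
```

Rejecting unknown names early catches typos such as `plot_std` before a long run starts.
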
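
The individual steps can also be run separately:

    # Number of peaks; assumes peaks are the variables (columns) of the AnnData object
    Peaks_num = data.n_vars

    # Step 1: compute the diffusion matrix
    t, diffusion_mat = model.SCARP_diffusion_mat(adata=data)

    # Step 2: plot the SDs of the PCs and get the number of components to keep
    k = model.SCARP_SD_plot(data=diffusion_mat,
                            peaks_num=Peaks_num,
                            title=data_name,
                            plot_SD=True)

    # Step 3: compute the cell embedding from the kept components
    Cell_embedding = model.SCARP_cell_embedding(diffusion_mat=diffusion_mat,
                                                kept_comp=k)

For reproducibility, we provide all the necessary scripts and data here.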
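
For readers unfamiliar with the "SDs of PCs" used when choosing how many components to keep, the sketch below shows one common way to compute such standard deviations with NumPy on a synthetic matrix. It is an illustration under stated assumptions, not SCARP's own implementation: the function name `pc_standard_deviations` and the toy data are made up here, and how SCARP derives `k` from the plot is not reproduced.

```python
import numpy as np


def pc_standard_deviations(mat, n_comp=10):
    """Return the standard deviations of the leading principal components."""
    # Center each column so that the SVD corresponds to a PCA.
    centered = mat - mat.mean(axis=0, keepdims=True)
    # Singular values s_i relate to PC variances by var_i = s_i**2 / (n_samples - 1).
    singular_values = np.linalg.svd(centered, compute_uv=False)
    sds = singular_values / np.sqrt(mat.shape[0] - 1)
    return sds[:n_comp]


rng = np.random.default_rng(0)
# Synthetic stand-in for a cells-by-features matrix:
# three strong latent signals plus unit-variance noise.
signal = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50)) * 5.0
toy = signal + rng.normal(size=(200, 50))

sds = pc_standard_deviations(toy, n_comp=10)
print(np.round(sds, 2))
```

In this toy example the first three SDs stand well above the rest, which is the kind of drop an SD plot is meant to reveal.
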