Run the script by executing the following command:
python s1_qc.py -p <prefix> [-o <output_directory>] [-m <mitochondrial_genes_percent>] [-s]
- <prefix>: Prefix of the input file (required).
- <output_directory>: Output directory for the results (default: "qc").
- <mitochondrial_genes_percent>: Percent of mitochondrial genes to consider as outliers (default: 15).
- -s: Flag indicating whether to scale the data (default: False).

adata.raw is created in s1*.py after filtering outliers and normalization.
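The options above could be parsed roughly as follows. This is a minimal argparse sketch, not the script's actual code: the long option names (`--prefix`, `--outdir`, `--mito-percent`, `--scale`) and the float type for `-m` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented s1_qc.py options (sketch)."""
    parser = argparse.ArgumentParser(prog="s1_qc.py")
    # Long option names are hypothetical; only the short flags are documented.
    parser.add_argument("-p", "--prefix", required=True,
                        help="Prefix of the input file")
    parser.add_argument("-o", "--outdir", default="qc",
                        help="Output directory for the results")
    parser.add_argument("-m", "--mito-percent", type=float, default=15,
                        help="Mitochondrial gene percentage threshold")
    parser.add_argument("-s", "--scale", action="store_true",
                        help="Scale the data")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["-p", "sample1", "-m", "20"])
print(args.prefix, args.outdir, args.mito_percent, args.scale)
```

Only `-p` is required here, matching the usage line; the other options fall back to the documented defaults.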
The script produces the following outputs:
- QC Result:
  - File: d1_<prefix>_qced.h5ad
  - Description: Filtered and quality-controlled dataset after outlier removal.
- Doublet detection
  - powered by package DoubletDetection
- Normalized Result:
  - File: d2_<prefix>_normlog.h5ad
  - Description: Normalized and log-transformed dataset after QC.
  - Notice: contains adata.raw (includes all genes after normalization)
  - Doublet prediction result and doublet score columns are added to adata.obs
- Figures:
  - Location: <output_directory>
  - Description:
    - QC related: total counts histogram, mitochondrial gene percentage violin plot, scatter plot of total counts vs. gene counts, and violin plot of QC metrics.
    - Doublet detection: Doublet heatmap
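The outlier removal behind the QC result can be pictured with plain Python. The sketch below assumes a cell counts as an outlier when its mitochondrial count percentage exceeds the `-m` threshold. The function name, the list-based input and the exact comparison are assumptions; the script itself works on an AnnData object.

```python
def keep_by_mito_percent(total_counts, mito_counts, max_percent=15.0):
    """Return indices of cells whose mitochondrial percentage is <= max_percent.

    Hypothetical helper: total_counts[i] and mito_counts[i] are the total and
    mitochondrial counts of cell i.
    """
    kept = []
    for i, (total, mito) in enumerate(zip(total_counts, mito_counts)):
        if total == 0:
            continue  # a cell with no counts has no defined percentage; drop it
        percent = 100.0 * mito / total
        if percent <= max_percent:
            kept.append(i)
    return kept


# The three cells have 5%, 20% and 15% mitochondrial counts.
print(keep_by_mito_percent([1000, 500, 800], [50, 100, 120]))
```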
python s3_bbknn.py -n <number> -p <prefix> [-b <batch_key>]
- <number>: The number of highly variable genes to be selected (default: 2000).
- <prefix>: Prefix of the input file (required).
- <batch_key>: Batch key of the file (default: 'patient').

Functions used: sc.pp.pca, sc.pp.neighbors, sc.tl.umap, sc.tl.leiden
The script produces the following outputs:
- Highly Variable Genes (HVGs) Result:
  - File: d3_<prefix>.h5ad
  - Description: Dataset containing the selected highly variable genes.
- PCA Result:
  - Description: Principal Component Analysis (PCA) performed on the selected HVGs.
- UMAP Result:
  - Description: Uniform Manifold Approximation and Projection (UMAP) performed on the PCA results.
  - Figure: UMAP plot showing the clustering results colored by 'leiden' and 'batch_key'.
    - File: before_bbknn.png
    - Location: cluster/
- UMAP Result (Saved Dataset):
  - File: d4_<suffix>_umap.h5ad
  - Description: Dataset with UMAP coordinates and clustering information.
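The HVG selection, PCA, UMAP and Leiden steps described for this script can be sketched with Scanpy as below. This is an assumed reconstruction, not the script's code: the function name `cluster_before_bbknn` is hypothetical, every parameter other than `n_top_genes` is left at Scanpy's default, and the source does not say whether the batch key is also used during HVG selection.

```python
def cluster_before_bbknn(adata, n_top_genes=2000, batch_key="patient"):
    """Select HVGs, then run PCA, neighbors, UMAP and Leiden (sketch)."""
    # Imported lazily so the function can be defined without Scanpy installed.
    import scanpy as sc

    # Keep only the top highly variable genes (expects log-normalized data).
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes, subset=True)
    sc.pp.pca(adata)
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    sc.tl.leiden(adata)  # cluster labels are stored in adata.obs["leiden"]
    # Color the embedding by cluster and by batch, as in before_bbknn.png.
    sc.pl.umap(adata, color=["leiden", batch_key], show=False)
    return adata
```

A call would look like `cluster_before_bbknn(sc.read_h5ad("d2_<prefix>_normlog.h5ad"))`, with the placeholder replaced by a real prefix.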
python s3_bbknn.py -p <prefix> -b <batch_key>
- <prefix>: Prefix of the input file (required).
- <batch_key>: Batch key of the file (required).
The script produces the following outputs:
- BBKNN Result:
  - File: d5_<prefix>.h5ad
  - Description: Dataset after performing the BBKNN integration.
- UMAP Result:
  - Figure: UMAP plot showing the clustering results after BBKNN integration, colored by 'leiden_r2' and 'batch_key'.
    - File: after_bbknn.png
    - Location: cluster/
Note: Leiden clustering is run with resolution=2, and the result after BBKNN is stored in key leiden_r2.
adata.layers["counts"] is also created in s1.py. The data in the counts layer is un-normalized and not log-transformed.
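The BBKNN step and the `leiden_r2` key described above can be sketched as follows. This is a hedged outline rather than the script's code: the function name `integrate_with_bbknn` is hypothetical, and running UMAP before Leiden is an assumption about ordering.

```python
def integrate_with_bbknn(adata, batch_key, resolution=2, key="leiden_r2"):
    """Run BBKNN, then UMAP and Leiden on the batch-balanced graph (sketch)."""
    # Imported lazily so the function can be defined without the packages installed.
    import bbknn
    import scanpy as sc

    # BBKNN builds a batch-balanced neighbor graph in place of sc.pp.neighbors;
    # it expects PCA coordinates in adata.obsm["X_pca"].
    bbknn.bbknn(adata, batch_key=batch_key)
    sc.tl.umap(adata)
    # Store the clustering under its own key so earlier results are kept.
    sc.tl.leiden(adata, resolution=resolution, key_added=key)
    # Color the embedding by cluster and by batch, as in after_bbknn.png.
    sc.pl.umap(adata, color=[key, batch_key], show=False)
    return adata
```

Writing the labels to `leiden_r2` rather than the default `leiden` column lets the pre-integration clustering stay available for comparison.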
Marker file:
Must contain at least two columns: cell_name and Symbol.
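A quick way to check a marker file before running the analysis is sketched below. It assumes the marker file is comma-separated with a header row, which the text above does not state; the function name is hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = {"cell_name", "Symbol"}


def missing_marker_columns(handle):
    """Return the required columns that are absent from the file's header."""
    reader = csv.DictReader(handle)
    present = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - present)


# In-memory stand-in for a marker file; an empty list means nothing is missing.
example = io.StringIO("cell_name,Symbol\nT cell,CD3E\nB cell,MS4A1\n")
print(missing_marker_columns(example))
```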
Produces boxplots and heatmaps and conducts a statistical test.
--heatmap:
- all: draw one heatmap with all conditions (levels)
- sep: draw a separate heatmap for each condition (level)
- no: do not draw heatmap
- heatmap.csv: provide a custom file for grouping
Format of heatmap.csv (| represents , in csv file):
| Tumor | I II III |
|---|---|
| [condition] | [states (separated by spaces)] |
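The row format above can be read with the standard csv module. The sketch assumes each row holds one condition followed by its space-separated states and that the file has no header row; the function name is hypothetical.

```python
import csv
import io


def read_heatmap_groups(handle):
    """Map each condition to its list of states from heatmap.csv-style rows."""
    groups = {}
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        condition = row[0].strip()
        groups[condition] = row[1].split()  # states are separated by spaces
    return groups


# In-memory stand-in for heatmap.csv with the example row from the table.
print(read_heatmap_groups(io.StringIO("Tumor,I II III\n")))
```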
--filter_sample:
If a number is specified, samples whose total number of cells (of the same cell type) is below this threshold are filtered out.
We used 15 as the threshold.
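The filtering rule can be illustrated with a small sketch. It assumes that "below this threshold" means strictly fewer cells than the threshold and that the input is one sample label per cell of a single cell type; the function name and the input layout are hypothetical.

```python
from collections import Counter


def samples_to_keep(cell_samples, threshold=15):
    """Return the samples that have at least `threshold` cells.

    cell_samples: one sample label per cell, already restricted to a single
    cell type (hypothetical input layout).
    """
    counts = Counter(cell_samples)
    return sorted(s for s, n in counts.items() if n >= threshold)


# P2 has only 14 cells of this cell type, so it is filtered out.
cells = ["P1"] * 20 + ["P2"] * 14 + ["P3"] * 15
print(samples_to_keep(cells))
```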
--test_type:
Choices are 1 (one-sided test) or 2 (two-sided test). The default is 2.
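The difference between the two choices can be seen in a small worked example. The text above does not say which statistical test the script runs, so the exact permutation test on the difference in means used here is only a stand-in chosen for illustration. The function name and the direction of the one-sided alternative (group b larger than group a) are assumptions as well.

```python
from itertools import combinations


def permutation_p_value(a, b, test_type=2):
    """Exact permutation p-value for the difference in means, mean(b) - mean(a).

    test_type=1: one-sided (is b larger than a?); test_type=2: two-sided.
    Both groups must be non-empty.
    """
    pooled = list(a) + list(b)
    n_a = len(a)

    def mean_diff(idx_a):
        chosen = set(idx_a)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        return sum(group_b) / len(group_b) - sum(group_a) / len(group_a)

    observed = mean_diff(range(n_a))
    # Enumerate every way of relabeling n_a of the pooled values as group a.
    diffs = [mean_diff(idx) for idx in combinations(range(len(pooled)), n_a)]
    eps = 1e-12  # tolerance for floating-point ties
    if test_type == 1:
        extreme = sum(d >= observed - eps for d in diffs)
    else:
        extreme = sum(abs(d) >= abs(observed) - eps for d in diffs)
    return extreme / len(diffs)


print(permutation_p_value([1, 2, 3], [4, 5, 6], test_type=1))
print(permutation_p_value([1, 2, 3], [4, 5, 6], test_type=2))
```

With the same data, the two-sided p-value is never smaller than the one-sided one, because it also counts relabelings that are equally extreme in the opposite direction.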