Single-Cell Analysis Script

1. Quality control

Running Instruction

Run the script by executing the following command:

python s1_qc.py -p <prefix> [-o <output_directory>] [-m <mitochondrial_genes_percent>] [-s]

<prefix>: Prefix of the input file (required).
<output_directory>: Output directory for the results (default: "qc").
<mitochondrial_genes_percent>: Mitochondrial gene percentage above which a cell is treated as an outlier (default: 15).
-s: Flag that turns on scaling of the data (default: False).

Note: adata.raw is created in s1_qc.py after outlier filtering and normalization.
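The options above map naturally onto argparse. The following is a minimal sketch, not the parser used in s1_qc.py: only the short flags -p, -o, -m and -s come from this README, and the long option names are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented s1_qc.py options."""
    parser = argparse.ArgumentParser(prog="s1_qc.py")
    # Long option names are illustrative; the README documents only -p/-o/-m/-s.
    parser.add_argument("-p", "--prefix", required=True,
                        help="prefix of the input file")
    parser.add_argument("-o", "--outdir", default="qc",
                        help="output directory for the results")
    parser.add_argument("-m", "--mito-percent", type=float, default=15.0,
                        help="mitochondrial percentage used as the outlier threshold")
    parser.add_argument("-s", "--scale", action="store_true",
                        help="scale the data (off unless the flag is given)")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["-p", "sample1", "-m", "20"])
print(args.prefix, args.outdir, args.mito_percent, args.scale)
```

Because -s uses action="store_true", it stays False unless the flag is present, which matches the documented default.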

Output

The script produces the following outputs:

QC Result:
- File: d1_<prefix>_qced.h5ad
- Description: Filtered and quality-controlled dataset after outlier removal.
Doublet Detection:
- Performed with the DoubletDetection package.
Normalized Result:
- File: d2_<prefix>_normlog.h5ad
- Description: Normalized and log-transformed dataset after QC.
- Notice: Contains adata.raw, which includes all genes after normalization.
- Doublet prediction and doublet score columns are added to adata.obs.
Figures:
- Location: <output_directory>
- Description:
  - QC related: total counts histogram, mitochondrial gene percentage violin plot, scatter plot of total counts vs. gene counts, and violin plot of QC metrics.
  - Doublet detection: Doublet heatmap
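The outlier removal behind the QC result can be pictured with a small pure-Python sketch. It assumes a cell is dropped when its mitochondrial percentage exceeds the -m threshold; the actual rule in s1_qc.py may differ (for example, it may combine several QC metrics), and the helper names and toy data are hypothetical.

```python
def mito_percent(total_counts: float, mito_counts: float) -> float:
    """Percentage of a cell's counts that come from mitochondrial genes."""
    return 100.0 * mito_counts / total_counts if total_counts else 0.0


def keep_cells(cells: dict, max_mito_percent: float = 15.0) -> list:
    """Return IDs of cells whose mitochondrial percentage is within the threshold."""
    return [
        cell_id
        for cell_id, (total, mito) in cells.items()
        if mito_percent(total, mito) <= max_mito_percent
    ]


# Toy data: cell ID -> (total counts, mitochondrial counts).
cells = {"cell_a": (1000, 50), "cell_b": (800, 200), "cell_c": (1200, 180)}

# cell_b (25%) is dropped; cell_a (5%) and cell_c (15%) are kept.
print(keep_cells(cells))
```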

2. Select highly variable genes (HVGs)

Running Instruction

python s2_hvg.py [-n <number>] -p <prefix> [-b <batch_key>]

<number>: The number of highly variable genes to be selected (default: 2000).
<prefix>: Prefix of the input file (required).
<batch_key>: Batch key of the file (default: 'patient').

Transformation

sc.pp.pca
sc.pp.neighbors
sc.tl.umap
sc.tl.leiden
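The four functions above depend on each other: UMAP and Leiden both need the neighbor graph, which is usually built on the PCA space. A minimal sketch of the sequence, assuming HVG selection is done with sc.pp.highly_variable_genes (the README does not name the call) and using a hypothetical wrapper function:

```python
def embed_and_cluster(adata, n_top_genes=2000, batch_key="patient"):
    """Select HVGs, then run the four transformation steps in order."""
    import scanpy as sc  # imported lazily so this module loads without scanpy

    # Assumed HVG call; the README lists only the four steps below.
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes, batch_key=batch_key)
    sc.pp.pca(adata)        # PCA; uses the flagged HVGs when present
    sc.pp.neighbors(adata)  # k-nearest-neighbor graph on the PCA space
    sc.tl.umap(adata)       # 2-D embedding computed from the neighbor graph
    sc.tl.leiden(adata)     # cluster labels stored in adata.obs["leiden"]
    return adata
```

All steps modify adata in place; the function returns it only for convenience.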

Output

The script produces the following outputs:

Highly Variable Genes (HVGs) Result:
- File: d3_<prefix>.h5ad
- Description: Dataset containing the selected highly variable genes.
PCA Result:
- Description: Principal Component Analysis (PCA) performed on the selected HVGs.
UMAP Result:
- Description: Uniform Manifold Approximation and Projection (UMAP) performed on the PCA results.
- Figure: UMAP plot showing the clustering results colored by 'leiden' and 'batch_key'.
  - File: before_bbknn.png
  - Location: cluster/
UMAP Result (Saved Dataset):
- File: d4_<prefix>_umap.h5ad
- Description: Dataset with UMAP coordinates and clustering information.

3. Deal with batch effect (BBKNN)

Running Instruction

python s3_bbknn.py -p <prefix> -b <batch_key>

<prefix>: Prefix of the input file (required).
<batch_key>: Batch key of the file (required).

Output

The script produces the following outputs:

BBKNN Result:
- File: d5_<prefix>.h5ad
- Description: Dataset after performing the BBKNN integration.
UMAP Result:
- Figure: UMAP plot showing the clustering results after BBKNN integration, colored by 'leiden_r2' and 'batch_key'.
  - File: after_bbknn.png
  - Location: cluster/

Note: Leiden clustering is run with resolution=2; the result after BBKNN is stored under the key leiden_r2.
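The integration step can be sketched as follows. This is an illustration under assumptions, not the code of s3_bbknn.py: it assumes PCA has already been computed on adata, uses the bbknn package's bbknn.bbknn function, and wraps the calls in a hypothetical helper.

```python
def integrate_with_bbknn(adata, batch_key):
    """Batch-correct the neighbor graph with BBKNN, then re-embed and re-cluster."""
    import bbknn         # lazy imports: both packages are third-party
    import scanpy as sc

    # Replace the neighbor graph with one balanced across batches.
    bbknn.bbknn(adata, batch_key=batch_key)
    sc.tl.umap(adata)
    # resolution=2 and the key name follow the note above.
    sc.tl.leiden(adata, resolution=2, key_added="leiden_r2")
    return adata
```

Writing the clusters to a separate key keeps the pre-integration 'leiden' labels available for comparison with before_bbknn.png.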

adata.layers["counts"] is also created in s1_qc.py. The counts layer holds the raw counts: it is neither normalized nor log-transformed.

Marker file: should contain at least two columns, cell_name and Symbol.
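A quick header check can catch a malformed marker file before it reaches the annotation scripts. A minimal sketch, assuming the marker file is comma-separated with a header row (the README does not state the file format); missing_marker_columns is a hypothetical helper:

```python
import csv
import io

REQUIRED_COLUMNS = {"cell_name", "Symbol"}


def missing_marker_columns(handle) -> set:
    """Return the required columns absent from the header of a marker file."""
    reader = csv.DictReader(handle)
    return REQUIRED_COLUMNS - set(reader.fieldnames or [])


# In-memory stand-in for a marker file; the row is made up.
example = io.StringIO("cell_name,Symbol,source\nT cell,CD3E,demo\n")
print(missing_marker_columns(example))  # empty set: both columns are present
```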

9. Correlation

Produces a boxplot and a heatmap, and conducts a statistical test.

--heatmap:

all: draw one heatmap with all conditions (levels)
sep: draw a separate heatmap for each condition (level)
no: do not draw heatmap
heatmap.csv: provide a custom file for grouping

Format of heatmap.csv, one row per condition (| stands for the , separator in the CSV file):

[condition] | [states (separated by spaces)]
Tumor | I II III
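A grouping file in this format can be read with the standard csv module. A minimal sketch, assuming one condition per row with the states in the second comma-separated field; read_grouping is a hypothetical helper, not a function from s9_correlation.py:

```python
import csv
import io


def read_grouping(handle) -> dict:
    """Map each condition to its list of states."""
    grouping = {}
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        condition, states = row[0].strip(), row[1].split()
        grouping[condition] = states
    return grouping


# In-memory stand-in for heatmap.csv, using the example row above.
example = io.StringIO("Tumor,I II III\n")
print(read_grouping(example))  # {'Tumor': ['I', 'II', 'III']}
```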

--filter_sample:

If a number is given, samples whose cell count (for a given cell type) falls below this threshold are filtered out. We used 15 as the threshold.
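The filtering rule amounts to counting cells per sample within one cell type and keeping the samples that reach the threshold. A minimal sketch with toy data; samples_to_keep is a hypothetical helper, and the real script may organize this differently:

```python
from collections import Counter


def samples_to_keep(cell_samples, min_cells=15):
    """Return samples contributing at least `min_cells` cells of one cell type."""
    counts = Counter(cell_samples)
    return sorted(sample for sample, n in counts.items() if n >= min_cells)


# Toy input: the sample of origin for every cell of a single cell type.
cell_samples = ["P1"] * 20 + ["P2"] * 3 + ["P3"] * 15
print(samples_to_keep(cell_samples))  # ['P1', 'P3']
```

P2 is removed because it has only 3 cells; P3 sits exactly at the threshold and is kept, since only samples below it are filtered.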

--test_type:

Choices are 1 (one-sided test) or 2 (two-sided test). The default is 2.
