This skill covers quality control, filtering, and normalization for single-cell RNA-seq data using both Seurat (R) and Scanpy (Python). These are essential steps before clustering and downstream analysis.
Python (Scanpy):
```bash
pip install scanpy matplotlib
```
R (Seurat):
```r
install.packages('Seurat')
```
Ask your AI agent:
"Run QC on my single-cell data and filter low-quality cells"
"Normalize my scRNA-seq data and find highly variable genes"
"Preprocess this 10X data for clustering"
"Calculate QC metrics including mitochondrial percentage"
"Show violin plots of QC metrics"
"What are good filtering thresholds for this dataset?"
"Filter cells with fewer than 200 genes or more than 20% mitochondrial reads"
"Remove low-quality cells and rarely detected genes"
"Normalize using log normalization"
"Run SCTransform on this Seurat object"
"Normalize to 10,000 counts per cell"
"Find the top 2000 highly variable genes"
"Show a plot of variable features"
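For reference, the QC and filtering requests above correspond in Scanpy to flagging mitochondrial genes (`adata.var["mt"] = adata.var_names.str.startswith("MT-")`), computing metrics with `sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)`, and filtering with `sc.pp.filter_cells(adata, min_genes=200)`, `sc.pp.filter_genes(adata, min_cells=3)` and a cut on `adata.obs["pct_counts_mt"]`. The sketch below reproduces that arithmetic in plain NumPy on a toy matrix so it runs without Scanpy installed. The function names, the case-insensitive `MT-` prefix match and the `min_cells=3` default are illustrative assumptions, not part of this skill.

```python
import numpy as np


def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC metrics for a cells x genes UMI count matrix (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    # Assumed convention: case-insensitive prefix, so human "MT-" and mouse "mt-" both match.
    is_mito = np.array([name.upper().startswith(mito_prefix.upper()) for name in gene_names])
    n_genes = (counts > 0).sum(axis=1)        # genes detected per cell
    total = counts.sum(axis=1)                # UMIs per cell
    mito = counts[:, is_mito].sum(axis=1)     # UMIs from mitochondrial genes
    pct_mito = np.divide(100.0 * mito, total, out=np.zeros_like(total), where=total > 0)
    return n_genes, total, pct_mito


def filter_cells_and_genes(counts, gene_names, min_genes=200, max_pct_mito=20.0, min_cells=3):
    """Drop low-quality cells, then rarely detected genes (hypothetical helper)."""
    counts = np.asarray(counts)
    n_genes, _, pct_mito = qc_metrics(counts, gene_names)
    # Keep cells with enough detected genes and a low enough mitochondrial fraction.
    keep_cells = (n_genes >= min_genes) & (pct_mito <= max_pct_mito)
    kept = counts[keep_cells]
    # Keep genes detected in at least min_cells of the remaining cells.
    keep_genes = (kept > 0).sum(axis=0) >= min_cells
    return kept[:, keep_genes], keep_cells, keep_genes


genes = ["CD3E", "LYZ", "MT-CO1", "MT-ND1"]
counts = np.array([
    [5, 3, 1, 1],   # 4 genes, 20% mito
    [0, 2, 6, 2],   # 3 genes, 80% mito: fails the mito cutoff
    [4, 0, 0, 0],   # 1 gene: fails the gene-count cutoff
    [2, 2, 0, 1],   # 3 genes, 20% mito
])
# Toy thresholds for a 4-gene matrix; real data would start from values like min_genes=200.
filtered, keep_cells, keep_genes = filter_cells_and_genes(
    counts, genes, min_genes=2, max_pct_mito=20.0, min_cells=2
)
print(filtered)
```

Computing the metrics separately from the filter makes it easy to plot `n_genes`, `total` and `pct_mito` first and only then pick cutoffs.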
- Calculate QC metrics (genes detected per cell, UMIs per cell, mitochondrial %)
- Visualize distributions to inform filtering
- Apply filtering thresholds
- Normalize and log-transform counts
- Identify highly variable genes
- Scale data for PCA
- Store a copy of the raw counts before normalizing, since methods such as SCTransform and many differential-expression tests expect unnormalized counts
- In Seurat, SCTransform is a recommended alternative that replaces NormalizeData, FindVariableFeatures and ScaleData in a single step
- The mitochondrial cutoff depends on the tissue: about 5% for PBMCs, up to 20% for some tissues
- Remove doublets early: cells with unusually high gene or UMI counts are often doublets
- Inspect the QC plots before choosing thresholds, because suitable cutoffs are dataset-specific
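The normalization steps listed above can likewise be sketched in NumPy. In Scanpy they correspond roughly to `adata.layers["counts"] = adata.X.copy()`, `sc.pp.normalize_total(adata, target_sum=1e4)`, `sc.pp.log1p(adata)`, `sc.pp.highly_variable_genes(adata, n_top_genes=2000)` and `sc.pp.scale(adata, max_value=10)`; in Seurat, to `NormalizeData`, `FindVariableFeatures` and `ScaleData`, or `SCTransform` alone. The gene-selection function below is a simplified dispersion ranking (variance / mean of the normalized counts) that skips the mean-binned standardization Scanpy's default `flavor="seurat"` applies, so its picks will not match Scanpy's exactly. All function names are hypothetical.

```python
import numpy as np


def normalize_total(counts, target_sum=1e4):
    """Scale each cell (row) so its counts sum to target_sum."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts * target_sum, totals, out=np.zeros_like(counts), where=totals > 0)


def top_dispersion_genes(norm, n_top=2000):
    """Indices of the n_top genes with the largest variance-to-mean ratio.

    Simplified: dispersions are ranked directly, without binning genes by mean.
    """
    mean = norm.mean(axis=0)
    var = norm.var(axis=0, ddof=1)
    disp = np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)
    n_top = min(n_top, disp.size)
    return np.argsort(disp)[::-1][:n_top]


def scale_genes(x, max_value=10.0):
    """Z-score each gene (column) and cap values above max_value."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant genes become all zeros instead of NaN
    return np.minimum((x - mean) / std, max_value)


counts = np.array([
    [10, 0, 5, 5],
    [10, 10, 5, 5],
    [10, 0, 5, 5],
    [10, 10, 5, 5],
])
raw = counts.copy()                         # keep raw UMIs, like adata.layers["counts"]
norm = normalize_total(counts, 1e4)         # every cell now sums to 10,000
logged = np.log1p(norm)                     # log(1 + x)
hvg = top_dispersion_genes(norm, n_top=2)   # 2 for this toy matrix; 2000 is typical
scaled = scale_genes(logged[:, hvg])        # input for PCA
print(hvg, scaled.round(2))
```

Gene 1, which switches between 0 and 10 UMIs, ranks first; gene 0 varies only through differences in library size and ranks second. On a real dataset, `n_top=2000` matches the value used in the prompts above.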