This project analyzes single-cell RNA sequencing (scRNA-seq) data from human bone marrow cells (BMMC) and CD34+ enriched bone marrow cells (CD34). The analysis was performed with the Seurat package in R and covered the following steps:
- Loading the Data: Expression matrices from two datasets (BMMC and CD34) were loaded and a Seurat object was constructed.
- Metadata Addition: Relevant metadata was added to each sample, including the number of cells and genes in the expression matrices.
- Preprocessing: Data preprocessing included filtering, doublet removal using DoubletFinder, normalization, and feature selection.
- Batch Correction: The samples were combined in two ways, by merging without batch correction and by integrating with Seurat’s batch correction method. The two results were compared to evaluate whether batch correction was necessary.
- Dimensionality Reduction: Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) were used to reduce dimensions and visualize the data in 2D.
- Clustering: Clustering was performed on the reduced data, resulting in 7 to 15 clusters, which were visualized in 2D.
- Cell Type Annotation: Both automatic and manual cell type annotations were performed. Differential expression analysis was conducted to identify cell-type-specific markers.
- Differential Expression Analysis: Differential expression analysis was performed between cell types (B cells vs. T cells, T cells vs. monocytes), and the top differentially expressed genes were plotted.
- Pathway Analysis: Gene ontology (GO) pathway analysis was performed on the genes differentially expressed between the BMMC and CD34 datasets, focusing on the top pathways and their biological implications.
- Trajectory Analysis: A subset of cells was selected for trajectory analysis using Monocle 3 to study the progression of cell states.
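
The core of the steps above (loading, metadata, filtering, merging without batch correction, normalization, feature selection, PCA, clustering, and UMAP) can be sketched roughly as follows. This is a minimal illustration, not the code in the project script: the count matrices are synthetic stand-ins for the BMMC and CD34 data, `make_counts` is a hypothetical helper, and every threshold and parameter value (`nFeature_RNA > 50`, `nfeatures = 200`, `npcs = 20`, `resolution = 0.5`) is an assumed placeholder. Doublet removal, integration, annotation, differential expression, GO analysis, and trajectory analysis are left out.

```r
library(Seurat)

set.seed(42)

# Hypothetical helper: synthetic genes-x-cells count matrix standing in
# for a real expression matrix.
make_counts <- function(n_genes, n_cells) {
  lambda <- runif(n_genes, min = 0.1, max = 3)  # per-gene mean expression
  # rpois() recycles lambda down each column, so row i always uses lambda[i]
  counts <- matrix(rpois(n_genes * n_cells, lambda),
                   nrow = n_genes, ncol = n_cells)
  rownames(counts) <- paste0("Gene", seq_len(n_genes))
  colnames(counts) <- paste0("cell", seq_len(n_cells))
  Matrix::Matrix(counts, sparse = TRUE)
}

# Loading the data: one Seurat object per dataset
bmmc <- CreateSeuratObject(counts = make_counts(500, 200), project = "BMMC")
cd34 <- CreateSeuratObject(counts = make_counts(500, 200), project = "CD34")

# Metadata addition: record which sample each cell came from
bmmc$sample <- "BMMC"
cd34$sample <- "CD34"

# Filtering: drop cells with very few detected genes (placeholder cutoff)
bmmc <- subset(bmmc, subset = nFeature_RNA > 50)
cd34 <- subset(cd34, subset = nFeature_RNA > 50)

# Merging without batch correction
merged <- merge(bmmc, y = cd34, add.cell.ids = c("BMMC", "CD34"))

# Normalization, feature selection, and scaling
merged <- NormalizeData(merged, verbose = FALSE)
merged <- FindVariableFeatures(merged, nfeatures = 200, verbose = FALSE)
merged <- ScaleData(merged, verbose = FALSE)

# Dimensionality reduction and clustering
merged <- RunPCA(merged, npcs = 20, verbose = FALSE)
merged <- FindNeighbors(merged, dims = 1:20, verbose = FALSE)
merged <- FindClusters(merged, resolution = 0.5, verbose = FALSE)
merged <- RunUMAP(merged, dims = 1:20, verbose = FALSE)

# 2D view colored by sample: a quick visual check for batch effects
p <- DimPlot(merged, reduction = "umap", group.by = "sample")
```

Coloring the UMAP by `sample` is one common way to judge whether batch correction is needed: if cells separate by dataset rather than by cell type, an integrated object (for example, one built with Seurat's `FindIntegrationAnchors` and `IntegrateData`) can be run through the same downstream steps and compared.
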
The R script used for analysis is named scRNA_seq_script.
The resulting plots and images can be found in the results/img directory of this repository.