GAND project repository for reproducible analysis.
The analysis is wrapped in a Nextflow pipeline that handles every step from data download to report building. A more detailed breakdown of the pipeline is given below. All file paths are relative to the root of this directory.
IMPORTANT NOTICE: While we provide containers for this analysis, the only target architecture is Linux (amd64). We do not provide macOS images; if required, they can be built from the Dockerfile or the Nix flake.
Publicly available data sets are downloaded directly by the workflows/dwl_data.nf workflow.
To modify data download parameters, open the nextflow.config file and:
- Toggle data download
run_download = true
- Modify data URLs and final locations (NOTE: Nextflow expects data to be placed in these directories; modify at your own risk)
dwl = [
scrna_url : [
"https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE244477&format=file"
],
ref_url : [
"https://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/GSE123335/suppl/GSE123335%5FE14%5Fcombined%5Fmatrix.txt.gz",
"https://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/GSE123335/suppl/GSE123335%5FE14%5Fcombined%5Fmatrix%5FClusterAnnotations.txt.gz"
],
output_scrna : "${baseDir}/data/scRNA/",
output_ref : "${baseDir}/data/ref/"
]
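The download logic itself lives in workflows/dwl_data.nf and is not reproduced here. As a hedged illustration of how the configured URLs could map onto local files, the Python sketch below derives a file name from each URL: GEO FTP links are URL-decoded (so %5F becomes _), while the GEO download endpoint, which has no file name in its path, is named after its acc query parameter. The helpers local_name and plan_downloads, and the assumption that the acc endpoint serves a <accession>_RAW.tar archive, are hypothetical and not the workflow's actual naming logic.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import parse_qs, unquote, urlparse

# URLs copied from the dwl block of nextflow.config
SCRNA_URLS = [
    "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE244477&format=file",
]
REF_URLS = [
    "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/GSE123335/suppl/GSE123335%5FE14%5Fcombined%5Fmatrix.txt.gz",
    "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE123nnn/GSE123335/suppl/GSE123335%5FE14%5Fcombined%5Fmatrix%5FClusterAnnotations.txt.gz",
]


def local_name(url: str) -> str:
    """Hypothetical helper: derive a local file name from a download URL."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if "acc" in query:
        # GEO's download endpoint carries no file name in its path;
        # assumption: it serves the series' RAW tar archive.
        return f"{query['acc'][0]}_RAW.tar"
    # FTP links: keep the last path component and decode %5F back to "_".
    return unquote(PurePosixPath(parsed.path).name)


def plan_downloads(urls, out_dir):
    """Hypothetical helper: map each URL to the path it would be saved under."""
    return {url: Path(out_dir) / local_name(url) for url in urls}


if __name__ == "__main__":
    plan = {**plan_downloads(SCRNA_URLS, "data/scRNA"),
            **plan_downloads(REF_URLS, "data/ref")}
    for url, dest in plan.items():
        print(f"{dest}  <-  {url}")
```

The output directories data/scRNA and data/ref mirror output_scrna and output_ref above, which is where Nextflow expects the files to land.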
NOTE: For clarity, we provide a manifest.txt in data/scRNA to facilitate the automatic naming of files.
Single-nucleus RNA-seq data sets are integrated using Seurat. Each data set is pre-processed and quality controlled (QC) prior to integration.
Integration parameters can be found in the nextflow.config file under the following section:
// Integration workflow params
integration = [
input : "${baseDir}/data/scRNA/",
ref : "${baseDir}/data/ref",
tmp : "${baseDir}/data/tmp_scrna/",
manifest : "${baseDir}/data/scRNA/manifest.txt",
npcs : 30, // Number of Principal Components
min_features : 100, // Minimum number of features per cell
max_features : 10000, // Maximum number of features per cell
percent_mt : 10, // Maximum percentage of mitochondrial RNA allowed per cell
n_var_features : 2000, // Number of Variable features used for PCA
cluster_resolution : 0.4, // Louvain clustering resolution
integration_tag : "integrated", // Tag to name the integration objects and meta data
integration_method : "RPCAIntegration", // Seurat method used to integrate data
]
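The three QC thresholds above act per cell before integration. The Python sketch below shows the kind of filter they describe: a cell is kept if its number of detected features lies between min_features and max_features and its percentage of mitochondrial counts is below percent_mt. It is a sketch under assumptions, not the Seurat code run by the pipeline: the Cell type, the use of strict inequalities, and the mouse mitochondrial gene prefix "mt-" are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Thresholds mirroring the integration block of nextflow.config
MIN_FEATURES = 100
MAX_FEATURES = 10000
PERCENT_MT = 10


@dataclass
class Cell:
    """Hypothetical container for one cell's counts."""
    barcode: str
    counts: Dict[str, int]  # gene name -> UMI count


def qc_metrics(cell: Cell, mt_prefix: str = "mt-") -> Tuple[int, float]:
    """Return (number of detected features, percent mitochondrial counts)."""
    n_features = sum(1 for c in cell.counts.values() if c > 0)
    total = sum(cell.counts.values())
    # Assumption: mouse mitochondrial genes are named with an "mt-" prefix.
    mt = sum(c for g, c in cell.counts.items() if g.startswith(mt_prefix))
    pct_mt = 100.0 * mt / total if total else 0.0
    return n_features, pct_mt


def passes_qc(cell: Cell) -> bool:
    n_features, pct_mt = qc_metrics(cell)
    # Strict inequalities are an assumption; check the pipeline's R code.
    return MIN_FEATURES < n_features < MAX_FEATURES and pct_mt < PERCENT_MT


if __name__ == "__main__":
    toy = Cell("AAAC", {**{f"Gene{i}": 1 for i in range(200)}, "mt-Co1": 5})
    print(toy.barcode, qc_metrics(toy), passes_qc(toy))
```

Raising min_features or lowering percent_mt makes the filter stricter; both act before PCA and integration, so they also change which cells contribute to the n_var_features selection.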
To model the expression of selected gene sets by cell type and by condition while accounting for sample-level variation, we used a linear mixed-effects model. We also checked whether selected genes had mutually exclusive expression patterns.
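The exact model formula is not spelled out here. One plausible formulation, given as an assumption rather than the model the pipeline actually fits, describes the score y of a gene set in cell i from sample j (a module score or log count, depending on score_type) within a given cell type as a fixed effect of the sample's condition plus a random intercept per sample:

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{condition}_j + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here condition_j is a 0/1 indicator of the sample's condition and \beta_1 is the condition effect of interest. The random intercept u_j absorbs sample-to-sample variation, so cells from the same sample are not treated as independent replicates. In R's lme4 notation this would correspond to score ~ condition + (1 | sample).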
To modify the gene sets, update the parameters in the following section:
// Modelling Params
modelling = [
tmp : "${baseDir}/data/tmp_scrna/",
annotated : "${params.integration.tmp}/GAND_seurat_annotated.rds",
gene_sets : [["Chd3", "Foxp1","Foxp2","Satb2"],
["Chd3", "Foxp1","Satb2"],
["Chd3","Foxp2","Satb2"],
["Arx"]], // Gene sets to check for enrichment by cell type
min_cells : 1, // Minimum number of cells per cell type required for modelling
mut_genes : ["Foxp1","Foxp2"], // Check if two genes are mutually exclusively expressed in cells
score_type: ["module","counts"], // Score used for modelling: module = Seurat module score, counts = log counts
]
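The statistical test behind the mut_genes check is not described here. As a hedged sketch of one simple approach, the Python code below binarises expression of the two genes (assumption: a cell "expresses" a gene if its count is above zero), builds the 2×2 contingency table, and compares the observed number of co-expressing cells with the number expected if the two genes were expressed independently. The functions contingency and coexpression_ratio are hypothetical.

```python
from typing import Dict, List, Tuple

MUT_GENES = ("Foxp1", "Foxp2")  # mirrors modelling.mut_genes


def contingency(cells: List[Dict[str, int]], genes: Tuple[str, str]) -> Dict[str, int]:
    """Count cells expressing both genes, only the first, only the second, or neither."""
    a, b = genes
    table = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for counts in cells:
        # Assumption: any count above zero means the gene is expressed.
        has_a = counts.get(a, 0) > 0
        has_b = counts.get(b, 0) > 0
        if has_a and has_b:
            table["both"] += 1
        elif has_a:
            table["only_a"] += 1
        elif has_b:
            table["only_b"] += 1
        else:
            table["neither"] += 1
    return table


def coexpression_ratio(table: Dict[str, int]) -> float:
    """Observed / expected co-expressing cells under independence."""
    n = sum(table.values())
    if n == 0:
        return float("nan")
    n_a = table["both"] + table["only_a"]
    n_b = table["both"] + table["only_b"]
    expected = n_a * n_b / n
    return table["both"] / expected if expected else float("nan")


if __name__ == "__main__":
    toy = [{"Foxp1": 3}] * 40 + [{"Foxp2": 2}] * 40 + [{"Foxp1": 1, "Foxp2": 1}] * 2 + [{}] * 18
    t = contingency(toy, MUT_GENES)
    print(t, round(coexpression_ratio(t), 3))
```

A ratio well below 1 hints at mutual exclusivity, while a ratio near 1 is what independent expression would give. A formal test on the same table, such as Fisher's exact test, would be needed to judge significance.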
A final PDF report is built from the intermediate CSV files. These files mimic the Source Data format that journals often require for publication. Although we provide a report-building step, we encourage you to make your own plots from the source data if you wish to change their aesthetics.
The source data locations are defined in the nextflow.config file:
report = [
annotated : "${baseDir}/data/tmp_scrna/GAND_seurat_annotated.csv",
mut_genes : "${baseDir}/data/tmp_scrna/mutually_exclusive_genes.csv",
gene_sets : "${baseDir}/data/tmp_scrna/*_geneset_list.csv",
template : "${baseDir}/bin/scRNA_report_template.Rmd",
output : "${baseDir}/results/scRNA",
]
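Because gene_sets is a glob pattern, the report picks up every file in data/tmp_scrna whose name ends in _geneset_list.csv. For building your own plots, the illustrative Python sketch below (not part of the pipeline; find_source_data and missing_files are hypothetical helpers) resolves the same paths as the report block and lists any single-file entries that have not been produced yet.

```python
from pathlib import Path
from typing import Dict, List, Union


def find_source_data(base_dir: Union[str, Path]) -> Dict[str, Union[Path, List[Path]]]:
    """Locate the CSV files referenced by the report block of nextflow.config."""
    tmp = Path(base_dir) / "data" / "tmp_scrna"
    return {
        "annotated": tmp / "GAND_seurat_annotated.csv",
        "mut_genes": tmp / "mutually_exclusive_genes.csv",
        # Same wildcard as report.gene_sets: every *_geneset_list.csv file
        "gene_sets": sorted(tmp.glob("*_geneset_list.csv")),
    }


def missing_files(found: Dict[str, Union[Path, List[Path]]]) -> List[Path]:
    """Return single-file entries that do not exist (e.g. pipeline not run yet)."""
    return [p for key, p in found.items() if key != "gene_sets" and not p.exists()]


if __name__ == "__main__":
    found = find_source_data(".")  # run from the repository root
    for key, value in found.items():
        print(key, value)
    for path in missing_files(found):
        print("missing:", path)
```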
A Docker image was built from the definition file in the containers directory. Currently, only a Linux (amd64) image is provided, since the analysis was run exclusively on a Linux HPC. Specifically, the image was cross-built on macOS for a Linux target:
cd containers
docker buildx build --platform linux/amd64 \
-t gand:v0.0.1 \
--load .
docker save gand:v0.0.1 -o gand_image.tar
Conversion to an HPC-safe Apptainer .sif file was achieved with the following commands:
# On an HPC cluster that uses environment modules (e.g. under SLURM), load Apptainer first:
# module load apptainer
# Build the SIF from the docker-archive
apptainer build gand_image.sif docker-archive://gand_image.tar