scxa-tertiary-workflow

Introduction

Tertiary component for Single-Cell Expression Atlas workflows, focused on post-processing and advanced analyses like normalization, PCA, clustering, t-SNE, and UMAP visualizations.

Overview

This Nextflow workflow performs analysis downstream of the quantification of expression counts from raw single-cell RNA sequencing (scRNA-seq) data. It takes processed data (expression matrix and metadata) as input, normalizes and scales the data, identifies variable genes, runs principal component analysis (PCA), corrects batch effects using Harmony, computes cell neighborhoods, finds clusters, and produces UMAP and t-SNE visualizations. It also finds marker genes for cell groupings.

The workflow runs these analyses using Scanpy, leveraging the scanpy-scripts package to run individual steps of the Scanpy workflow.

How to run the workflow

Prepare the data

To run a tertiary analysis, store all required input files in a single directory and pass that directory with the --dir_path parameter.

Ensure the following files are present in the directory:

genes_metadata.tsv – Metadata information for genes
genes.tsv – List of gene identifiers
barcodes.tsv – Cell barcode identifiers
matrix.mtx – Expression matrix in Matrix Market format
cell_metadata.tsv – Metadata information for individual cells
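
Before launching a run, it can help to confirm that the input directory is complete. The following is a minimal sketch and not part of the workflow itself: the function name check_inputs is hypothetical, and the only thing taken from this README is the list of five file names.

```shell
# Hypothetical helper (not shipped with the workflow): verify that a
# directory contains the five input files listed above.
check_inputs() {
    local dir="$1"
    local missing=0
    local f
    for f in genes_metadata.tsv genes.tsv barcodes.tsv matrix.mtx cell_metadata.tsv; do
        if [ ! -f "$dir/$f" ]; then
            # Report each missing file on stderr and remember the failure
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Exit status 0 if everything is present, 1 otherwise
    return "$missing"
}
```

Calling check_inputs with the directory intended for --dir_path returns a non-zero status and names each absent file, so it can gate the nextflow command in a wrapper script.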

Requirements

Nextflow
Singularity or Docker

High-performance computing

This workflow can be run on high-performance computing (HPC) clusters.

SLURM: to schedule jobs through SLURM, use the --slurm option.

Running the workflow

The workflow can be executed for two types of scRNA-seq technologies: plate-based and droplet-based.

For plate-based data:

nextflow run main.nf --slurm -resume --dir_path <EXP-ID with path> [--output_path <PATH>] [--scanpy_scripts_container <container_id>] [--celltype_field <celltype_field>]

For droplet-based data:

nextflow run main.nf --slurm -resume --dir_path <EXP-ID with path> --technology droplet [--output_path <PATH>] [--scanpy_scripts_container <container_id>] [--celltype_field <celltype_field>]

--technology droplet: Specifies that the data is droplet-based. This enables additional steps for multiplet detection (using Scrublet) and doublet removal.
The remaining parameters are the same as for the plate-based run.

The workflow uses Singularity by default; add -profile docker to the command to run with Docker instead.
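
The options above can be combined in a small wrapper. The sketch below only prints a command line and does not run it. The function name build_run_cmd and the label "plate" for the default case are hypothetical conveniences of this example, not workflow parameters; the flags themselves (-resume, --dir_path, --technology droplet, -profile) are the ones shown in this README. Add --slurm as in the commands above when submitting through SLURM.

```shell
# Hypothetical helper: assemble (but do not execute) a nextflow command line.
#   $1  input directory passed to --dir_path
#   $2  "droplet" to add --technology droplet; anything else means plate-based
#   $3  optional Nextflow profile name, e.g. docker
build_run_cmd() {
    local dir_path="$1"
    local technology="${2:-plate}"
    local profile="${3:-}"
    local cmd="nextflow run main.nf -resume --dir_path $dir_path"
    if [ "$technology" = "droplet" ]; then
        cmd="$cmd --technology droplet"
    fi
    if [ -n "$profile" ]; then
        cmd="$cmd -profile $profile"
    fi
    echo "$cmd"
}
```

Printing the command first makes it easy to review before executing it, which is useful when the same script serves both plate-based and droplet-based experiments.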

Running the workflow for SCEA

If running for Single-Cell Expression Atlas (SCEA), select the Atlas-specific profile by adding -profile atlas to the Nextflow command.

Output

If --output_path <PATH> is not specified, results are written to the <EXP-ID with path>/results directory.
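
For scripts that need to locate the results after a run, the default described above can be mirrored as follows. This is a sketch under that stated rule only; resolve_output_dir is a hypothetical name, and the workflow's own path handling may differ in details such as trailing slashes.

```shell
# Hypothetical helper: print the directory where results are expected.
#   $1  the directory given to --dir_path
#   $2  the value given to --output_path, or empty/omitted if not used
resolve_output_dir() {
    local dir_path="${1%/}"     # drop one trailing slash, if any
    local output_path="${2:-}"
    if [ -n "$output_path" ]; then
        echo "$output_path"
    else
        echo "$dir_path/results"
    fi
}
```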

Credits

ebi-gene-expression-group/scxa-tertiary-workflow was originally written by Anil Thanki, Iris Yu and Pedro Madrigal, based on a previous Galaxy workflow developed by Pablo Moreno and Jonathan Manning.

We thank the following people and teams for their extensive assistance in the development of this pipeline:

Citations

For now, if you use the workflow in your analysis, please cite it using the following DOI: 10.1093/nar/gkad1021

Acknowledging nf-core

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references of tools and data used in this pipeline are as follows:
