Description
Pipeline title/name
eisca
Keywords
single-cell, scRNA-seq, 10x, Smart-seq2, scvi-tools
What is it about?
EISCA is a bioinformatics pipeline for the analysis of single-cell RNA-seq data. It was developed as a generalized, flexible, and scalable workflow for scRNA-seq analysis and is primarily designed for droplet-based (10x) and plate-based (Smart-seq2) data. The pipeline consists of three analysis phases. The primary phase covers raw data QC, mapping and quantification, and conversion of the count matrix to AnnData. The secondary phase covers single-cell QC, cell filtering, and clustering analysis with integration. The tertiary phase covers downstream analyses such as cell type annotation, differential expression analysis, and cellular interaction analysis. Users can run the pipeline end-to-end or execute each analysis phase independently.
Please provide a schematic diagram of the proposed pipeline
I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:
- be built with Nextflow.
- pass nf-core lint tests and use standardized parameters.
- be community-owned and developed within the nf-core organization.
- be open source under the MIT license with proper credits and acknowledgments.
- have a descriptive, all-lowercase name without punctuation.
- use the nf-core pipeline template and predominantly use official nf-core modules.
- focus on a specific data/analysis type with appropriate scope.
- have properly maintained documentation.
- be bundled using versioned Docker/Singularity containers.
Why do we need a new pipeline?
This is a full-spectrum analysis pipeline for single-cell data, from preprocessing of raw data through filtering and clustering to downstream analyses. The pipeline supports multiple starting points: users can begin from raw FASTQ files or from processed single-cell AnnData, depending on the analysis phase. Users can tweak parameters and re-run specific analyses to gain better insights. This flexibility lets users quickly perform preliminary analyses and obtain an out-of-the-box analysis report showing QC metrics and plots, which facilitates rapid assessment of data quality and supports a smooth transition to more complex, objective-specific downstream analyses. Additionally, the pipeline integrates scvi-tools for data integration, annotation, and differential expression analysis, either by applying pretrained models or by training custom models.
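To make the idea of multiple starting points concrete, here is a minimal Python sketch of how an input type could determine which phases run. It is only an illustration of the logic described above, not EISCA's actual implementation (the pipeline itself is built with Nextflow). The names `Phase`, `ALLOWED_STARTS`, and `phases_to_run` are hypothetical, and the rule that FASTQ input must enter at the primary phase while AnnData input may enter at the secondary or tertiary phase is an assumption drawn from the description.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The three analysis phases named in the proposal (hypothetical enum)."""
    PRIMARY = 1    # raw data QC, mapping, quantification, AnnData conversion
    SECONDARY = 2  # single-cell QC, cell filtering, clustering with integration
    TERTIARY = 3   # annotation, differential expression, cellular interaction


# Assumption: which phases each input type is allowed to enter at.
ALLOWED_STARTS = {
    "fastq": (Phase.PRIMARY,),
    "anndata": (Phase.SECONDARY, Phase.TERTIARY),
}


def phases_to_run(input_type, start=None, stop=Phase.TERTIARY):
    """Return the ordered list of phases to run for the given input type.

    start defaults to the earliest phase that accepts the input type;
    stop defaults to the tertiary phase (an end-to-end run).
    """
    if input_type not in ALLOWED_STARTS:
        raise ValueError(f"unsupported input type: {input_type!r}")
    allowed = ALLOWED_STARTS[input_type]
    start = allowed[0] if start is None else Phase(start)
    if start not in allowed:
        raise ValueError(f"{input_type!r} input cannot start at {start.name}")
    stop = Phase(stop)
    if stop < start:
        raise ValueError("stop phase precedes start phase")
    return [phase for phase in Phase if start <= phase <= stop]


if __name__ == "__main__":
    # End-to-end run from raw reads.
    print([p.name for p in phases_to_run("fastq")])
    # Re-run only the downstream analyses on an existing AnnData object.
    print([p.name for p in phases_to_run("anndata", start=Phase.TERTIARY)])
```

In this sketch, a user who tweaks clustering parameters would call `phases_to_run("anndata", stop=Phase.SECONDARY)` and skip the primary phase entirely, which mirrors the re-run behaviour described above.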
Who would be interested?
Researchers and bioinformaticians working with single-cell RNA-seq data, especially those needing a flexible, scalable, end-to-end analysis workflow, would be most interested in EISCA. This includes labs analyzing 10x or Smart-seq2 data, computational biologists performing QC, clustering, or integration, and teams looking for reproducible pipelines with scvi-tools support.
What has been done so far
The pipeline has been under active development for about one year and has reached version 2.5. It has been tested extensively within our institute, where we have validated its performance across multiple datasets and refined its features based on real analysis needs.
URL to existing work (if applicable)
https://github.com/EarlhamInst/eisca
Are there any similar existing nf-core pipelines?
scrnaseq, scdownstream
