nf-core/scnanoseq validation

This repository contains the analytical code implemented for the validation of the nf-core/scnanoseq pipeline, developed by Austyn Trull and Dr. Lara Ianov.

Scope of analysis

The validation analysis of nf-core/scnanoseq (v1.1.0) was performed across the datasets outlined below.

The scope of the validation focused on performing a subset of tertiary analyses (QC, filtering, normalization, integration when applicable, clustering, barcode comparison, cell type identification, and marker evaluation at both the gene and transcript levels) on the outputs of nf-core/scnanoseq to assess its performance against ground-truth data. This analysis does not include other routine scRNA-seq analyses that extend beyond the scope of validation, such as pseudobulk differential gene expression analysis. Additionally, other analytical approaches such as doublet identification were excluded to minimize downstream transformations of the raw data generated by nf-core/scnanoseq. However, we recommend that users incorporate these common analytical approaches in their own downstream workflows.
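As a rough illustration of the QC-then-normalize step, here is a minimal Python sketch of a feature-count filter followed by log-normalization (each cell scaled to a fixed total, then log1p), a common default in scRNA-seq toolkits such as Seurat's LogNormalize. The validation itself was run in R, so this is not the repository's code; the function names and the `min_features` default are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are cells, columns are features (genes or transcripts)


def filter_cells(counts: Matrix, min_features: int = 200) -> Matrix:
    """Keep cells that detect at least `min_features` features (non-zero counts).

    The default of 200 is a hypothetical illustration, not a threshold used in this validation.
    """
    return [cell for cell in counts if sum(1 for x in cell if x > 0) >= min_features]


def log_normalize(counts: Matrix, scale_factor: float = 10_000.0) -> Matrix:
    """Scale each cell to `scale_factor` total counts, then apply natural log(1 + x)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # Nothing to scale in an empty cell; return zeros instead of dividing by zero.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(x / total * scale_factor) for x in cell])
    return normalized
```

Calling `log_normalize(filter_cells(counts))` applies the two steps in order; the same functions work on gene-level or transcript-level matrices.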

The benchmarking analysis (conducted on the 3' PBMC and Shiau et al. datasets) aimed to assess the performance of nf-core/scnanoseq across these datasets and evaluate the impact of parallelization strategies on computational efficiency.
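One common way to summarize the effect of parallelization on computational efficiency is speedup (baseline wall time divided by parallel wall time) and parallel efficiency (speedup divided by the relative increase in workers). The sketch below assumes a generic "worker count" as a stand-in for whichever parallelization strategies were benchmarked; the function name and any timings fed into it are hypothetical, not results from this validation.

```python
from typing import Dict, Tuple


def parallel_scaling(wall_times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map each worker count to (speedup, parallel efficiency).

    Both values are relative to the run with the fewest workers:
    speedup = T_base / T_p, efficiency = speedup / (p / p_base).
    """
    base_workers = min(wall_times)
    base_time = wall_times[base_workers]
    scaling = {}
    for workers, seconds in sorted(wall_times.items()):
        speedup = base_time / seconds
        # 1.0 means perfectly linear scaling; lower values mean added workers helped less.
        efficiency = speedup / (workers / base_workers)
        scaling[workers] = (speedup, efficiency)
    return scaling
```

For example, hypothetical wall times of 100 s, 50 s and 40 s on 1, 2 and 4 workers give efficiencies of 1.0, 1.0 and 0.625, showing diminishing returns at 4 workers.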

Datasets

The validation and benchmarking data is derived from the following datasets:

10X Genomics and Oxford Nanopore datasets: 3' (PBMC) and 5' (lung) datasets released with both PromethION and Illumina data
- 3' PBMC dataset (read depth: 129,264,682): https://www.10xgenomics.com/datasets/5k-human-pbmcs-3-v3-1-chromium-controller-3-1-standard
- 5' lung dataset (read depth: 106,105,266): https://www.10xgenomics.com/datasets/3k-human-squamous-cell-lung-carcinoma-dtcs-chromium-x-2-standard
- Application note: https://www.10xgenomics.com/library/dea066
In our analysis, raw FASTQ files from the PromethION runs were downloaded and used as input data for `scnanoseq` (v1.1.0). The outputs of `scnanoseq` were compared against the Illumina data processed with the Cell Ranger pipeline, which served as ground truth.
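Barcode comparison against a ground truth reduces to set overlap between the cell barcodes each method reports. This minimal sketch assumes the ground-truth barcodes carry Cell Ranger's GEM-well suffix (such as "-1") and that the pipeline's barcodes do not; the function names and example barcodes are hypothetical, and the repository's own barcode analysis may use different metrics.

```python
from typing import Dict, Iterable


def strip_gem_suffix(barcode: str) -> str:
    # Cell Ranger appends a GEM-well suffix (e.g. "-1") to barcodes; dropping it lets
    # barcodes from tools that omit the suffix be compared directly (assumption).
    return barcode.strip().split("-")[0]


def compare_barcodes(pipeline: Iterable[str], ground_truth: Iterable[str]) -> Dict[str, float]:
    """Summarize the overlap between two sets of cell barcodes."""
    found = {strip_gem_suffix(bc) for bc in pipeline}
    truth = {strip_gem_suffix(bc) for bc in ground_truth}
    shared = found & truth
    union = found | truth
    return {
        "shared": len(shared),
        "pipeline_only": len(found - truth),
        "ground_truth_only": len(truth - found),
        # Jaccard index: shared barcodes over all distinct barcodes seen by either method.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Fraction of ground-truth cells that the pipeline also recovered.
        "recall": len(shared) / len(truth) if truth else 0.0,
    }
```

Reporting pipeline-only and ground-truth-only counts alongside the Jaccard index makes it clear whether disagreement comes from extra calls or missed cells.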
BLAZE datasets: derived from the BLAZE tool/method paper (cited below), comprising 1 PromethION sample and 2 GridION samples
- ERR9958133 (GridION data generated with the Q20EA kit; read depth: )
- ERR9958134 (GridION data generated with the LSK110 kit; read depth: )
- ERR9958135 (PromethION data generated with the LSK110 kit; read depth: 61,967,455)
You Y, Prawer YDJ, De Paoli-Iseppi R et al. Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE. Genome Biol 2023;24:66. https://doi.org/10.1186/s13059-023-02907-y

Specifically, this work provides three datasets, which were downloaded and used as input data for scnanoseq. The outputs of scnanoseq were compared against the authors' original analysis (the authors' processed data and original downstream analysis information can be found at: https://github.com/youyupei/bc_whitelist_analysis). The pre-print reports the results from the PromethION sample (ERR9958135).
Shiau et al. datasets (cited below): 6 samples with depths ranging from 67.4M to 105.3M reads. A custom whitelist (“737K-arc-v1.txt” from 10X Genomics) was used, in line with the authors' library preparation protocol.

Shiau CK, Lu L, Kieser R et al. High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors. Nat Commun 2023;14:4124. https://doi.org/10.1038/s41467-023-39813-7
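A quick sanity check when choosing a whitelist is the fraction of observed barcodes that appear on it; a mismatched whitelist tends to show a low hit rate. This sketch assumes a plain-text whitelist with one barcode per line and uses exact matching only, which is a simplification and not how the pipeline assigns barcodes; the function names are hypothetical.

```python
from typing import Iterable, Set


def load_whitelist(lines: Iterable[str]) -> Set[str]:
    """Read barcodes from a plain-text whitelist with one barcode per line (blank lines skipped)."""
    return {line.strip() for line in lines if line.strip()}


def whitelist_hit_rate(barcodes: Iterable[str], whitelist: Set[str]) -> float:
    """Fraction of barcodes that match the whitelist exactly (no error correction)."""
    barcodes = list(barcodes)
    if not barcodes:
        return 0.0
    hits = sum(1 for bc in barcodes if bc in whitelist)
    return hits / len(barcodes)
```

With the file on disk, passing an open handle for "737K-arc-v1.txt" to `load_whitelist` builds the set, and `whitelist_hit_rate` can then be compared across candidate whitelists for the same sample.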

Docker Image

The Docker image containing the analytical dependencies for the tertiary analysis conducted in R is available at https://hub.docker.com/r/uabbds/scnanoseq_analysis. All Docker images were converted to Singularity images, with processing performed on the UAB High Performance Computing cluster. Version 0.3.0 was the image implemented at the time of writing the pre-print.
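Converting the Docker Hub image to a Singularity image can be done with a `pull` from a `docker://` URI. The Python sketch below only builds and runs that command; it assumes the Docker Hub tag is named "0.3.0" after the image version, and the output filename and function names are hypothetical rather than the procedure used on the UAB cluster.

```python
import shutil
import subprocess
from typing import List

IMAGE = "docker://uabbds/scnanoseq_analysis"


def pull_command(tag: str, output: str, executable: str = "singularity") -> List[str]:
    """Build a `pull` command that converts the Docker Hub image into a local SIF file."""
    return [executable, "pull", output, f"{IMAGE}:{tag}"]


def pull_image(tag: str = "0.3.0", output: str = "scnanoseq_analysis_0.3.0.sif") -> None:
    # Apptainer (the Linux Foundation continuation of Singularity) accepts the same `pull` syntax.
    executable = shutil.which("singularity") or shutil.which("apptainer")
    if executable is None:
        raise RuntimeError("neither singularity nor apptainer was found on PATH")
    subprocess.run(pull_command(tag, output, executable), check=True)
```

Keeping the command construction separate from execution makes it easy to print the command for a cluster job script instead of running it directly.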

Citation

If you use nf-core/scnanoseq, please cite our work as:

scnanoseq: an nf-core pipeline for Oxford Nanopore single-cell RNA-sequencing

Austyn Trull, Elizabeth A. Worthey, Lara Ianov

Bioinformatics. 2025 Sep 1;41(9):btaf487. doi: 10.1093/bioinformatics/btaf487. PMID: 40905625; PMCID: PMC12449243.

Slack channel

To discuss nf-core/scnanoseq, please send us a message in the pipeline's nf-core Slack channel.

Support

We would also like to thank the following people and groups for their support, including financial support:

Dr. Elizabeth Worthey
University of Alabama at Birmingham Biological Data Science Core (U-BDS), RRID:SCR_021766, https://github.com/U-BDS
Civitan International Research Center
Support from: 3P30CA013148-48S8

Repository contents

- barcode_analysis
- secondary_analysis
- tertiary_analysis
