This repository contains the analytical code implemented for the validation of the nf-core/scnanoseq pipeline, developed by Austyn Trull and Dr. Lara Ianov.
The validation analysis of nf-core/scnanoseq (v1.1.0) was performed across the datasets outlined below.
The validation focused on a subset of tertiary analyses performed on the outputs of nf-core/scnanoseq to assess its performance against ground-truth data: QC, filtering, normalization, integration (when applicable), clustering, barcode comparison, cell type identification, and marker evaluation at both the gene and transcript levels. This analysis does not include other routine scRNA-seq analyses that extend beyond the scope of validation, such as pseudobulk differential gene expression analysis. Additionally, other analytical approaches such as doublet identification were excluded to minimize downstream transformations of the raw data generated by nf-core/scnanoseq. However, we recommend that users incorporate these common analytical approaches in their own downstream workflows.
The benchmarking analysis (conducted on the 3' PBMC and Shiau et al. datasets) aimed to assess the performance of nf-core/scnanoseq across these datasets and evaluate the impact of parallelization strategies on computational efficiency.
The validation and benchmarking data is derived from the following datasets:
- 10X Genomics and Oxford Nanopore released 3' and 5' datasets (PBMC and lung), including PromethION and Illumina data:
  - 3' PBMC dataset (read depth: 129,264,682): https://www.10xgenomics.com/datasets/5k-human-pbmcs-3-v3-1-chromium-controller-3-1-standard
  - 5' lung dataset (read depth: 106,105,266): https://www.10xgenomics.com/datasets/3k-human-squamous-cell-lung-carcinoma-dtcs-chromium-x-2-standard
  - Application note: https://www.10xgenomics.com/library/dea066

  In our analysis, raw FASTQ files from the PromethION runs were downloaded and used as input data for `scnanoseq` (v1.1.0). The outputs of `scnanoseq` were compared against the Illumina processed data (from the Cell Ranger pipeline, used as ground truth).
- BLAZE datasets: derived from the BLAZE tool/method paper (cited below), which contains 1 PromethION sample and 2 GridION samples:
  - ERR9958133 (GridION data generated with the Q20EA kit; read depth: )
  - ERR9958134 (GridION data generated with the LSK110 kit; read depth: )
  - ERR9958135 (PromethION data generated with the LSK110 kit; read depth: 61,967,455)

  You Y, Prawer YDJ, De Paoli-Iseppi R et al. Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE. Genome Biol 2023;24:66. https://doi.org/10.1186/s13059-023-02907-y

  Specifically, this work provides three datasets, which were downloaded and used as input data for `scnanoseq`. The outputs of `scnanoseq` were compared against the authors' original analysis (the authors' processed data and original downstream analysis information can be found at: https://github.com/youyupei/bc_whitelist_analysis). The pre-print reports the results from the PromethION sample (ERR9958135).
- Shiau et al. datasets (cited below): 6 samples with depths ranging from 67.4M to 105.3M reads. A custom whitelist (`737K-arc-v1.txt` from 10X Genomics) was used, in line with the authors' library preparation protocol.

  Shiau CK, Lu L, Kieser R et al. High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors. Nat Commun 2023;14:4124. https://doi.org/10.1038/s41467-023-39813-7
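As a rough illustration of the barcode comparison against ground-truth data mentioned above, the sketch below summarizes the overlap between two sets of cell barcodes. It is not the code used in this repository (the tertiary analysis here was conducted in R); the function names `normalize_barcode` and `compare_barcodes` and the toy barcodes are hypothetical. It assumes that one set of barcodes may carry a Cell Ranger style `-1` suffix while the other does not.

```python
from typing import Dict, Iterable


def normalize_barcode(barcode: str) -> str:
    """Drop whitespace and a trailing suffix such as "-1" so that
    barcodes written by different tools can be compared directly."""
    return barcode.strip().split("-")[0]


def compare_barcodes(query: Iterable[str], truth: Iterable[str]) -> Dict[str, float]:
    """Summarize the overlap between query and ground-truth barcodes."""
    q = {normalize_barcode(b) for b in query if b.strip()}
    t = {normalize_barcode(b) for b in truth if b.strip()}
    shared = q & t
    union = q | t
    return {
        "n_query": len(q),
        "n_truth": len(t),
        "n_shared": len(shared),
        # Fraction of ground-truth barcodes also present in the query set.
        "frac_truth_recovered": len(shared) / len(t) if t else 0.0,
        # Jaccard index: shared barcodes over all distinct barcodes.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy barcodes for illustration only (not taken from any dataset above).
long_read = ["AAACCCAAGAAACACT", "AAACCCAAGAAACCAT", "AAACGCTGTTTGGCGA"]
short_read = ["AAACCCAAGAAACACT-1", "AAACCCAAGAAACCAT-1", "TTTGTTGTCTTTGCGC-1"]

summary = compare_barcodes(long_read, short_read)
print(summary)
```

In practice the two inputs would be read from the barcode lists produced by each pipeline. The Jaccard index is only one possible summary and is used here as an assumption, not as the metric reported in the pre-print.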
The Docker image containing the analytical dependencies for the tertiary analysis conducted in R is available at https://hub.docker.com/r/uabbds/scnanoseq_analysis. All Docker images were converted to Singularity images, with processing performed at the UAB High Performance Computing cluster. Version 0.3.0 was the image used at the time of writing the pre-print.
If you use nf-core/scnanoseq, please cite our work as:
scnanoseq: an nf-core pipeline for Oxford Nanopore single-cell RNA-sequencing
Austyn Trull, Elizabeth A. Worthey, Lara Ianov
Bioinformatics. 2025 Sep 1;41(9):btaf487. doi: 10.1093/bioinformatics/btaf487. PMID: 40905625; PMCID: PMC12449243.
To discuss nf-core/scnanoseq, please send us a message in the pipeline's nf-core Slack channel.
We would also like to thank the following people and groups for their support, including financial support:
- Dr. Elizabeth Worthey
- University of Alabama at Birmingham Biological Data Science Core (U-BDS), RRID:SCR_021766, https://github.com/U-BDS
- Civitan International Research Center
- Support from: 3P30CA013148-48S8