U-BDS/scnanoseq_analysis

nf-core/scnanoseq validation

This repository contains the analytical code implemented for the validation of the nf-core/scnanoseq pipeline, developed by Austyn Trull and Dr. Lara Ianov.

Scope of analysis

The validation analysis of nf-core/scnanoseq (v1.1.0) was performed across the datasets outlined below.

The validation focused on a subset of tertiary analyses performed on the outputs of nf-core/scnanoseq to assess its performance against ground-truth data: QC, filtering, normalization, integration (when applicable), clustering, barcode comparison, cell type identification, and marker evaluation at both the gene and transcript levels. This analysis does not include other routine scRNA-seq analyses that extend beyond the scope of validation, such as pseudobulk differential gene expression analysis. Additionally, other analytical approaches such as doublet identification were excluded to minimize downstream transformations of the raw data generated by nf-core/scnanoseq. However, we recommend that users incorporate these common analytical approaches in their own downstream workflows.

The benchmarking analysis (conducted on the 3' PBMC and Shiau et al. datasets) aimed to assess the performance of nf-core/scnanoseq across these datasets and evaluate the impact of parallelization strategies on computational efficiency.

Datasets

The validation and benchmarking data is derived from the following datasets:

  1. 10X Genomics and Oxford Nanopore released 3' and 5' datasets (PBMC and lung) - PromethION and Illumina datasets


    In our analysis, raw FASTQ files from the PromethION runs were downloaded and used as input data for `scnanoseq` (v1.1.0). The outputs of `scnanoseq` were compared against the processed Illumina data (generated by the Cell Ranger pipeline and used as ground truth).
  2. BLAZE datasets: dataset derived from the BLAZE tool/method paper (cited below) which contains 1 PromethION sample and 2 GridION samples

    • ERR9958133 (GridION data generated with the Q20EA kit; read depth: )
    • ERR9958134 (GridION data generated with the LSK110 kit; read depth: )
    • ERR9958135 (PromethION data generated with the LSK110 kit; read depth: 61,967,455)

    You Y, Prawer YDJ, De Paoli-Iseppi R et al. Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE. Genome Biol 2023;24:66. https://doi.org/10.1186/s13059-023-02907-y

    Specifically, this work provides three datasets, which were downloaded and used as input data for scnanoseq. The outputs of scnanoseq were compared against the authors' original analysis (the authors' processed data and original downstream analysis information can be found at: https://github.com/youyupei/bc_whitelist_analysis). The pre-print reports the results from the PromethION sample (ERR9958135).

  3. Shiau et al. datasets (cited below) contained 6 samples with depths ranging from 67.4M - 105.3M reads. A custom whitelist (“737K-arc-v1.txt” from 10X Genomics) was used, in line with the authors' library preparation protocol.

    Shiau CK, Lu L, Kieser R et al. High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors. Nat Commun 2023;14:4124. https://doi.org/10.1038/s41467-023-39813-7
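The barcode comparison against ground-truth data mentioned above can be illustrated with a minimal sketch. This is not the code in this repository (the tertiary analysis here was conducted in R); it is a small Python illustration under stated assumptions. The function names `strip_suffix` and `compare_barcodes` are hypothetical, and the sketch assumes that barcodes may carry a numeric suffix such as `-1`, as in Cell Ranger output, which is removed before comparison.

```python
def strip_suffix(barcode: str) -> str:
    """Drop a trailing numeric suffix such as '-1', if present (assumed convention)."""
    base, sep, tail = barcode.rpartition("-")
    return base if sep and tail.isdigit() else barcode


def compare_barcodes(query, truth):
    """Summarize the overlap between two collections of cell barcodes.

    `query` stands in for barcodes called from long-read data and `truth`
    for the ground-truth barcodes; both are hypothetical inputs.
    """
    q = {strip_suffix(b) for b in query}
    t = {strip_suffix(b) for b in truth}
    shared = q & t
    union = q | t
    return {
        "shared": len(shared),
        "query_only": len(q - t),
        "truth_only": len(t - q),
        # Jaccard index: shared barcodes over all distinct barcodes.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Toy barcodes for illustration only; not taken from any dataset.
    summary = compare_barcodes(
        ["AAAC-1", "AAAG-1", "AAAT-1"],
        ["AAAC-1", "AAAG-1", "CCCC-1"],
    )
    print(summary)
```

The Jaccard index is only one possible summary; counts of barcodes unique to each side are often more informative when the two call sets differ greatly in size.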

Docker Image

The Docker image containing the analytical dependencies for the tertiary analysis conducted in R is available at https://hub.docker.com/r/uabbds/scnanoseq_analysis. All Docker images were converted to Singularity images, with processing performed at the UAB High Performance Computing cluster. Version 0.3.0 was the image in use at the time of writing the pre-print.

Citation

If you use nf-core/scnanoseq, please cite our work as:

scnanoseq: an nf-core pipeline for Oxford Nanopore single-cell RNA-sequencing

Austyn Trull, Elizabeth A. Worthey, Lara Ianov

Bioinformatics. 2025 Sep 1;41(9):btaf487. doi: 10.1093/bioinformatics/btaf487. PMID: 40905625; PMCID: PMC12449243.

Slack channel

To discuss nf-core/scnanoseq, please send us a message at the pipeline's nf-core Slack channel.

Support

We would also like to thank the following people and groups for their support, including financial support:

  • Dr. Elizabeth Worthey
  • University of Alabama at Birmingham Biological Data Science Core (U-BDS), RRID:SCR_021766, https://github.com/U-BDS
  • Civitan International Research Center
  • Support from: 3P30CA013148-48S8
