
New pipeline: nf-core/cfmultiomics #144

@LynnHenrotte

Pipeline title/name

cfmultiomics

Keywords

cfDNA, fragmentomics, methylomics, genomics, multi-omics, bisulfite-sequencing, taps, illumina-5-base-solution, whole-genome-sequencing, targeted-sequencing

What is it about?

A cell-free DNA (cfDNA) analysis pipeline that streamlines pre-processing, quality control, alignment, and multimodal feature extraction from NGS data. Feature modalities include genomics, methylomics, and fragmentomics. A variety of NGS data types are supported, including whole-genome sequencing (WGS) or targeted sequencing data, bisulfite sequencing (BS-Seq) data, enzymatic methylation sequencing (EM-Seq) data, TET-assisted pyridine borane sequencing (TAPS) data, and Illumina 5-base solution data. Users will be able to run different combinations of omics modalities, and a fragmentomics-only workflow will be included as well.

Please provide a schematic diagram of the proposed pipeline

[Image: schematic diagram of the proposed pipeline]

What would a minimal first release of this pipeline include?

  1. Quality control with FastQC/MultiQC
  2. Alignment and methylation extraction with Bismark (for BS-Seq or EM-Seq data)
  3. Alignment with bwa-mem (for WGS, targeted sequencing, TAPS, or Illumina 5-base solution data)
  4. Methylation and variant calling with rastair (for TAPS or Illumina 5-base solution data)
  5. Somatic variant calling according to GATK4 best practices (for WGS or targeted sequencing data)
  6. Fragmentomic feature extraction (tools under evaluation)
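To make the per-data-type branching in the list above concrete, here is a minimal Python sketch of which steps each input type would pass through in a first release. The `Assay` enum and `plan_steps` function are hypothetical illustrations, not nf-core code or the pipeline's actual parameter values; in the real pipeline this routing would be expressed as Nextflow workflow logic.

```python
from enum import Enum


class Assay(Enum):
    """Hypothetical labels for the supported input data types."""
    WGS = "wgs"
    TARGETED = "targeted"
    BS_SEQ = "bs-seq"
    EM_SEQ = "em-seq"
    TAPS = "taps"
    ILLUMINA_5BASE = "illumina-5-base"


def plan_steps(assay: Assay) -> list:
    """Return the ordered steps a sample of this data type would go through."""
    steps = ["FastQC/MultiQC"]  # quality control applies to every data type
    if assay in (Assay.BS_SEQ, Assay.EM_SEQ):
        # Bismark performs both alignment and methylation extraction
        steps.append("Bismark (alignment + methylation extraction)")
    elif assay in (Assay.TAPS, Assay.ILLUMINA_5BASE):
        steps += ["bwa-mem", "rastair (methylation + variant calling)"]
    else:
        # WGS or targeted sequencing: GATK4 best-practice somatic calling
        steps += ["bwa-mem", "GATK4 somatic variant calling"]
    steps.append("fragmentomic feature extraction")  # shared final step
    return steps


if __name__ == "__main__":
    for assay in Assay:
        print(assay.value, "->", " | ".join(plan_steps(assay)))
```

Keeping QC and fragmentomics outside the branch reflects that, per the list, only the alignment and calling steps depend on the data type.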

Later releases may include more extensive fragmentomic feature extraction, more somatic variant callers (e.g. DeepSomatic), and integration of features across omics modalities.

I confirm my proposed pipeline will follow nf-core guidelines. Most importantly, my pipeline will:

  • be built with Nextflow.
  • pass nf-core lint tests and use standardized parameters.
  • be community-owned and developed within the nf-core organization.
  • be open source under the MIT license with proper credits and acknowledgments.
  • have a descriptive, all-lowercase name without punctuation.
  • use the nf-core pipeline template and predominantly use official nf-core modules.
  • focus on a specific data/analysis type with appropriate scope.
  • have properly maintained documentation.
  • be bundled using versioned Docker/Singularity containers.

Why do we need a new pipeline?

The analysis of cfDNA from blood samples (liquid biopsies) has gained significant interest over the past decade, especially in oncology. In particular, the focus in the liquid biopsy field is shifting from 'single-omics' approaches towards integrating different omics into a 'multi-omics' approach. In addition, fragmentomics, the analysis of cfDNA fragmentation patterns such as fragment lengths and end motifs, is increasingly being used to complement standard methylation or mutation analyses of cfDNA. However, no nf-core pipeline currently exists that can perform a fragmentomics analysis on NGS data from cfDNA. Moreover, no multi-omics pipeline that integrates genomics, methylomics, and fragmentomics has yet been established within nf-core.

Who would be interested?

Researchers working on liquid biopsy applications in oncology and beyond.

What has been done so far

Using the nf-core/methylseq pipeline as a starting point, I integrated a custom module for fragment length computation.
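As an illustration of the two basic fragmentomic features named above (fragment lengths and end motifs), here is a minimal Python sketch. It is not the custom module itself. It assumes paired-end reads have already been parsed into a hypothetical `ReadRecord` type (in practice one would read the BAM with a library such as pysam), takes the fragment length from the SAM `TLEN` field of read 1 so each pair is counted once, and reads end motifs from the read sequence. It ignores soft-clipping and, for BS-Seq/EM-Seq data, base conversion, both of which a real implementation would need to handle.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReadRecord:
    """Hypothetical stand-in for one aligned read of a properly paired fragment."""
    tlen: int         # SAM TLEN field (signed template length; 0 if undefined)
    seq: str          # read sequence as stored in the BAM (reference orientation)
    is_reverse: bool  # True if the read aligned to the reverse strand
    is_read1: bool    # True for the first read of the pair


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(s: str) -> str:
    return s.translate(_COMPLEMENT)[::-1]


def fragment_lengths(reads, max_len=1000):
    """Histogram of fragment lengths, counting each pair once via read 1."""
    hist = Counter()
    for r in reads:
        # skip undefined TLEN (0) and implausibly long "fragments"
        if r.is_read1 and 0 < abs(r.tlen) <= max_len:
            hist[abs(r.tlen)] += 1
    return hist


def end_motifs(reads, k=4):
    """Count k-mers at the 5' end of each read, i.e. at both ends of each fragment."""
    motifs = Counter()
    for r in reads:
        if len(r.seq) < k:
            continue
        # reverse-strand reads are stored reverse-complemented in the BAM,
        # so their 5' end is the last k bases, read back as the reverse complement
        motif = revcomp(r.seq[-k:]) if r.is_reverse else r.seq[:k]
        motifs[motif] += 1
    return motifs


if __name__ == "__main__":
    pair = [
        ReadRecord(tlen=167, seq="CCCAGTT", is_reverse=False, is_read1=True),
        ReadRecord(tlen=-167, seq="AAGGTTTG", is_reverse=True, is_read1=False),
    ]
    print(fragment_lengths(pair))  # Counter({167: 1})
    print(end_motifs(pair))        # Counter({'CCCA': 1, 'CAAA': 1})
```

Many fragmentomics tools take the end motif from the reference genome at the fragment coordinates rather than from the read; reading it from the sequence keeps this sketch self-contained.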

URL to existing work (if applicable)

NA

Are there any similar existing nf-core pipelines?

methylseq, sarek
