README.md (9 additions, 9 deletions)
@@ -24,7 +24,7 @@
24
24
25
25
## Introduction
26
26
27
-
**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, single-cell VDJ sequencing using the 10xGenomics libraries, or assembled reads (bulk or single-cell). It can also extract BCR and TCR sequences from bulk or single-cell untargeted RNAseq data. It makes use of the [Immcantation](https://immcantation.readthedocs.io) toolset as well as other AIRRseq analysis tools.
27
+
**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell receptor (BCR) or T-cell receptor (TCR) repertoire sequencing data. It allows the processing of targeted bulk and single-cell adaptive immune receptor sequencing data (AIRR-seq), as well as the extraction of TCR and BCR sequences from untargeted bulk and single-cell RNA-seq data. The pipeline enables an end-to-end analysis, starting from raw reads or already assembled sequences, and performs sequence assembly, V(D)J assignment, clonal group inference, lineage reconstruction and repertoire analysis using the [Immcantation](https://immcantation.readthedocs.io/en/stable/) framework, as well as other immune repertoire analysis tools.
@@ -34,7 +34,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
34
34
35
35
## Pipeline summary
36
36
37
-
nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single cell targeted sequencing data. Several protocols are supported, please see the [usage documentation](https://nf-co.re/airrflow/usage) for more details on the supported protocols. The pipeline has been certified as [AIRR compliant](https://docs.airr-community.org/en/stable/swtools/airr_swtools_compliant.html) by the AIRR community, which means that it is compatible with downstream analysis tools also supporting this format.
37
+
nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single-cell targeted sequencing data, as well as extracting BCR and TCR sequences from bulk and single-cell RNA-seq datasets. Several protocols are supported; please see the [usage documentation](https://nf-co.re/airrflow/usage) for more details on the supported protocols. The pipeline has been certified as [AIRR compliant](https://docs.airr-community.org/en/stable/swtools/airr_swtools_compliant.html) by the AIRR community, which means that it is compatible with downstream analysis tools also supporting this format.
@@ -51,7 +51,7 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
51
51
- Assemble R1 and R2 read mates (`pRESTO AssemblePairs`).
52
52
- Remove and annotate read duplicates (`pRESTO CollapseSeq`).
53
53
- Filter out sequences that do not have at least 2 duplicates (`pRESTO SplitSeq`).
54
-
-single cell
54
+
- Single cell
55
55
- cellranger vdj
56
56
- Assemble contigs
57
57
- Annotate contigs
@@ -76,8 +76,8 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
76
76
- Single-cell QC filtering (`EnchantR`)
77
77
- Remove cells without heavy chains.
78
78
- Remove cells with multiple heavy chains.
79
-
- Remove sequences in different samples that share the same `cell_id` and nucleotide sequence.
80
-
- Modify `cell_id`s to ensure they are unique in the project.
79
+
- Remove sequences in different samples that share the same `cell_id` and nucleotide sequence, and thus are very likely contaminants.
80
+
- Modify `cell_id`s to ensure they are unique in each run.
81
81
82
82
4. Clonal analysis (bulk and single-cell)
83
83
@@ -93,20 +93,20 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
93
93
## Usage
94
94
95
95
> [!NOTE]
96
-
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
96
+
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. You will also need to install a container engine such as [Docker](https://docs.docker.com/engine/install/) or [Apptainer (formerly Singularity)](https://apptainer.org/docs/admin/main/installation.html) before running the pipeline. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
97
97
98
98
First, ensure that the pipeline tests run on your infrastructure:
99
99
100
100
```bash
101
-
nextflow run nf-core/airrflow -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute> --outdir <OUTDIR>
101
+
nextflow run nf-core/airrflow -profile test,<docker/singularity/apptainer/podman/shifter/charliecloud/conda/institute> --outdir <OUTDIR>
102
102
```
103
103
104
-
To run nf-core/airrflow with your data, prepare a tab-separated samplesheet with your input data. Depending on the input data type (bulk or single-cell, raw reads or assembled reads) the input samplesheet will vary. Please follow the [documentation on samplesheets](https://nf-co.re/airrflow/usage#input-samplesheet) for more details. An example samplesheet for running the pipeline on bulk BCR / TCR sequencing data in fastq format looks as follows:
104
+
To run nf-core/airrflow with your data, first prepare a tab-separated samplesheet with the paths to your input data and the metadata needed to run the analysis. Depending on the input data type (bulk or single-cell, raw reads or assembled reads) the input samplesheet will vary. Please follow the [documentation on samplesheets](https://nf-co.re/airrflow/usage#input-samplesheet) for more details. An example samplesheet for running the pipeline on bulk BCR / TCR sequencing data starting from raw reads looks as follows:
docs/usage.md (36 additions, 12 deletions)
@@ -6,9 +6,9 @@
6
6
7
7
## Introduction
8
8
9
-
**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, single-cell VDJ sequencing using the 10xGenomics libraries, or assembled reads (bulk or single-cell). It can also extract BCR and TCR sequences from bulk or single-cell untargeted RNAseq data. It makes use of the [Immcantation](https://immcantation.readthedocs.io) toolset as well as other AIRRseq analysis tools.
9
+
The nf-core/airrflow pipeline allows processing B-cell receptor (BCR) and T-cell receptor (TCR) sequencing data from bulk and single-cell sequencing protocols. It allows the processing of targeted bulk and single-cell adaptive immune receptor sequencing data (AIRR-seq), as well as the extraction of TCR and BCR sequences from untargeted bulk and single-cell RNA-seq data. The pipeline enables an end-to-end analysis, starting from raw reads or already assembled sequences, and performs sequence assembly, V(D)J assignment, clonal group inference, lineage reconstruction and repertoire analysis using the [Immcantation](https://immcantation.readthedocs.io/en/stable/) framework, as well as other immune repertoire analysis tools.
10
10
11
-
In addition to this usage page, you can find useful usage information in these pages:
11
+
In addition to this page, you can find additional information on how to use the pipeline on the following pages:
12
12
13
13
- [bulk_tutorial](usage/bulk_tutorial.md): a step-by-step tutorial on how to run nf-core/airrflow for bulk data.
14
14
- [single_cell_tutorial](usage/single_cell_tutorial.md): a step-by-step tutorial on how to run nf-core/airrflow for single-cell data.
@@ -17,6 +17,15 @@ In addition to this usage page, you can find useful usage information in these p
17
17
18
18
### Quickstart
19
19
20
+
> [!NOTE]
21
+
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow and a container engine needed to run this pipeline. At the moment, nf-core/airrflow does NOT support using conda virtual environments for dependency management; only containers are supported. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
22
+
23
+
First, ensure that the pipeline tests run on your infrastructure:
24
+
25
+
```bash
26
+
nextflow run nf-core/airrflow -profile test,<docker/singularity/apptainer/podman/shifter/charliecloud/institute> --outdir <OUTDIR>
27
+
```
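
Because the pipeline relies on a container engine, it can help to check which one is installed before launching the test run. The following is a minimal sketch, not part of the pipeline: the helper names `first_available` and `test_command` are hypothetical, and it assumes that the engine's executable has the same name as the corresponding profile (true for `docker`, `apptainer`, `singularity` and `podman`). The command is only printed, not executed.

```bash
#!/usr/bin/env bash
# Print the first command from the argument list that is found on PATH.
# Returns 1 (and prints nothing) if none of them is installed.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Build (but do not run) the pipeline test command for a given profile name.
test_command() {
  printf 'nextflow run nf-core/airrflow -profile test,%s --outdir %s\n' "$1" "$2"
}

# Example:
#   engine=$(first_available docker apptainer singularity podman) \
#     && test_command "$engine" results
```

Printing the command instead of running it keeps the check side-effect free; paste the printed line into your terminal once you are happy with the chosen engine.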
28
+
20
29
A typical command for running the pipeline for **bulk raw fastq files** using available pre-set protocol profiles is shown below. The full list of supported profiles can be found in the section [Supported protocol profiles](#supported-protocol-profiles).
21
30
22
31
```bash
@@ -63,8 +72,22 @@ nextflow run nf-core/airrflow \
63
72
--outdir results
64
73
```
65
74
66
-
Check the section [Input samplesheet](#input-samplesheet) below for instructions on how to create the samplesheet, and the [Supported library generation protocols](#supported-bulk-library-generation-methods-protocols) section below for examples on how to run the pipeline for different bulk and the 10xGenomics single cell sequencing protocol.
67
-
For more information about the parameters, please refer to the [parameters documentation](https://nf-co.re/airrflow/parameters).
75
+
It is also possible to reconstruct BCR and TCR sequences from untargeted bulk and single-cell sequencing data. A typical command to run the pipeline from **single-cell RNA-seq fastq files** is shown below. For more information, check the section on [supported untargeted RNA-seq based methods](#supported-untargeted-rna-seq-based-methods) below.
Check the section [Input samplesheet](#input-samplesheet) below for instructions on how to create the samplesheet, and the [Supported library generation protocols](#supported-bulk-library-generation-methods-protocols) section below for examples on how to run the pipeline for the different bulk and single-cell sequencing protocols.
90
+
For more detailed information about all the available parameters, please refer to the [parameters documentation](https://nf-co.re/airrflow/parameters).
68
91
The command above will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
69
92
70
93
Note that the pipeline will create the following files in your working directory:
@@ -154,7 +177,7 @@ Other optional columns can be added. These columns will be available as metadata
154
177
155
178
It is possible to provide several fastq files per sample (e.g. sequenced over different chips or lanes). In this case the different fastq files per sample will be merged together prior to processing. Provide one fastq pair R1/R2 per row, and the same `sample_id` field for these rows.
The required input file for processing raw BCR or TCR single cell targeted sequencing data is a sample sheet in TSV format (tab separated). The columns `sample_id`, `filename_R1`, `filename_R2`, `subject_id`, `species`, `tissue`, `pcr_target_locus`, `single_cell`, `sex`, `age` and `biomaterial_provider` are required. Any other columns you add will be available in the final repertoire file as extra metadata fields. You can refer to the bulk fastq input section for documentation on the individual columns.
160
183
An example samplesheet is:
@@ -175,7 +198,7 @@ An example samplesheet is:
175
198
176
199
It is possible to provide several fastq files per sample (e.g. sequenced over different chips or lanes). In this case the different fastq files per sample will be provided to the same cellranger process. These rows should then have an identical `sample_id` field.
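
A quick sanity check of the samplesheet before launching the pipeline can catch missing columns early. The sketch below is an illustration only and is not the validation the pipeline itself performs: the function names `missing_columns` and `repeated_samples` are hypothetical, and the column list is copied from the required columns named above for raw single-cell targeted data.

```bash
#!/usr/bin/env bash
# Columns listed as required for the raw single-cell samplesheet.
REQUIRED_COLUMNS="sample_id filename_R1 filename_R2 subject_id species tissue pcr_target_locus single_cell sex age biomaterial_provider"

# Print every required column that is absent from the header line of a TSV.
# Returns 0 when nothing is missing, 1 otherwise.
missing_columns() {
  header=$(head -n 1 "$1" | tr -d '\r')
  tab=$(printf '\t')
  status=0
  for col in $REQUIRED_COLUMNS; do
    # Wrap header and name in tabs so only whole column names match.
    case "${tab}${header}${tab}" in
      *"${tab}${col}${tab}"*) ;;
      *) echo "$col"; status=1 ;;
    esac
  done
  return "$status"
}

# Print "sample_id<TAB>row_count" for each sample_id found on more than one
# row, i.e. samples whose fastq files are split over several rows.
repeated_samples() {
  awk -F '\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "sample_id") col = i; next }
    col && $col != "" { n[$col]++ }
    END { for (s in n) if (n[s] > 1) print s "\t" n[s] }
  ' "$1" | sort
}

# Example:
#   missing_columns samplesheet.tsv && repeated_samples samplesheet.tsv
```

The second function only reports repeated `sample_id` values; whether a repeat is intended (several lanes of one sample) or a copy-paste mistake is left for you to judge.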
177
200
178
-
### Fastq input samplesheet (untargeted bulk or sc RNA sequencing)
201
+
### Fastq input samplesheet (untargeted bulk or single-cell RNAseq)
179
202
180
203
When running the untargeted protocol, BCR or TCR sequences will be extracted from the untargeted bulk or single-cell RNA sequencing with tools such as [TRUST4](https://github.com/liulab-dfci/TRUST4).
181
204
The required input file is the same as for the [Fastq bulk AIRR samplesheet](#fastq-input-samplesheet-bulk-airr-sequencing) or [Fastq single-cell AIRR samplesheet](#fastq-input-samplesheet-single-cell-sequencing) depending on the input data type (bulk RNAseq or single-cell RNAseq).
@@ -303,7 +326,7 @@ nextflow run nf-core/airrflow -r <release> \
For common sequencing protocols such as commercial kits please check the section above if your kit has a preset profile first, as this will greatly simplify running the pipeline. When processing bulk sequencing data departing from raw `fastq` reads, several sequencing protocols are supported which can be provided with the parameter `--library_generation_method`.
309
332
The following table matches the library generation methods as described in the [AIRR metadata annotation guidelines](https://docs.airr-community.org/en/stable/miairr/metadata_guidelines.html#library-generation-method) to the value that can be provided to the `--library_generation_method` parameter.
@@ -456,7 +479,8 @@ When processing single cell sequencing data departing from raw `fastq` reads, cu
456
479
### 10xGenomics
457
480
458
481
This sequencing type requires setting `--library_generation_method sc_10x_genomics`.
459
-
The `cellranger vdj` automatically uses the Chromium cellular barcodes and UMIs to perform sequence assembly, paired clonotype calling and to assemble V(D)J transcripts per cell.
482
+
The `cellranger vdj` tool automatically uses the Chromium cell barcodes and UMIs to perform sequence assembly, paired clonotype calling and to assemble V(D)J transcripts per cell. The pipeline will then perform gene reassignment and clonotyping with the Immcantation framework unless otherwise specified.
483
+
460
484
Examples are provided below to run airrflow to process 10xGenomics raw FASTQ data.
461
485
462
486
```bash
@@ -476,10 +500,10 @@ nextflow run nf-core/airrflow -r dev \
476
500
- The 10xGenomics reference can be downloaded from the [download page](https://www.10xgenomics.com/support/software/cell-ranger/downloads)
477
501
- To generate a V(D)J segment fasta file as reference from IMGT one can follow the [cellranger docs](https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/advanced/references#imgt).
478
502
479
-
## Supported unselected RNA-seq based methods
503
+
## Supported untargeted RNA-seq based methods
480
504
481
-
nf-core/airrflow supports unselected bulk or single-cell RNA-seq fastq files as input. [TRUST4](https://github.com/liulab-dfci/TRUST4) is used to extract TCR/BCR sequences from these files. The resulting AIRR tables are then fed into airrflow's Immcantation based workflow. <br>
482
-
To use unselected RNA-seq based input, specify `--library_generation_method trust4`.
505
+
nf-core/airrflow supports untargeted bulk or single-cell RNA-seq fastq files as input. [TRUST4](https://github.com/liulab-dfci/TRUST4) is used to extract TCR/BCR sequences from these files. The resulting AIRR tables are then fed into airrflow's Immcantation based workflow. <br>
506
+
To use untargeted RNA-seq based input, specify `--library_generation_method trust4`.
483
507
484
508
### Bulk RNA-seq
485
509
@@ -555,7 +579,7 @@ Use this parameter to choose a configuration profile. Profiles can give configur
555
579
Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Apptainer, Conda) - see below.
556
580
557
581
> [!IMPORTANT]
558
-
> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
582
+
> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility. Conda is not supported for this pipeline.
559
583
560
584
The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to check if your system is supported, please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).