Commit c9f3735

Merge branch 'dev' of https://github.com/nf-core/airrflow into dev
2 parents 48bd9d0 + 05a9e23 commit c9f3735

12 files changed

Lines changed: 164 additions & 240 deletions

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
2424
### `Fixed`
2525

2626
- [#378](https://github.com/nf-core/airrflow/pull/378) Updated primer links to Immcantation GitHub.
27-
- [#384](https://github.com/nf-core/airrflow/pull/384) Temporarily downgrade nf-schema to fix compatibility with NXF > 25.03.1-edge
27+
- [#384](https://github.com/nf-core/airrflow/pull/384) Temporarily downgrade nf-schema to fix compatibility with NXF > 25.03.1-edge.
28+
- [#388](https://github.com/nf-core/airrflow/pull/388) Fixed some tutorial links and added info.
2829

2930
### `Dependencies`
3031

README.md

Lines changed: 9 additions & 9 deletions
@@ -24,7 +24,7 @@
2424

2525
## Introduction
2626

27-
**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, single-cell VDJ sequencing using the 10xGenomics libraries, or assembled reads (bulk or single-cell). It can also extract BCR and TCR sequences from bulk or single-cell untargeted RNAseq data. It makes use of the [Immcantation](https://immcantation.readthedocs.io) toolset as well as other AIRRseq analysis tools.
27+
**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell receptor (BCR) or T-cell receptor (TCR) repertoire sequencing data. It allows the processing of targeted bulk and single-cell adaptive immune receptor sequencing data (AIRR-seq), as well as the extraction of TCR and BCR sequences from untargeted bulk and single-cell RNA-seq data. The pipeline enables an end-to-end analysis, departing from raw reads or readily assembled sequences, and performs sequence assembly, V(D)J assignment, clonal group inference, lineage reconstruction and repertoire analysis using the [Immcantation](https://immcantation.readthedocs.io/en/stable/) framework, as well as other immune repertoire analysis tools.
2828

2929
![nf-core/airrflow overview](docs/images/airrflow_workflow_overview.png)
3030

@@ -34,7 +34,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
3434

3535
## Pipeline summary
3636

37-
nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single cell targeted sequencing data. Several protocols are supported, please see the [usage documentation](https://nf-co.re/airrflow/usage) for more details on the supported protocols. The pipeline has been certified as [AIRR compliant](https://docs.airr-community.org/en/stable/swtools/airr_swtools_compliant.html) by the AIRR community, which means that it is compatible with downstream analysis tools also supporting this format.
37+
nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single-cell targeted sequencing data, as well as the extraction of BCR and TCR sequences from bulk and single-cell RNA-seq datasets. Several protocols are supported; please see the [usage documentation](https://nf-co.re/airrflow/usage) for more details on the supported protocols. The pipeline has been certified as [AIRR compliant](https://docs.airr-community.org/en/stable/swtools/airr_swtools_compliant.html) by the AIRR community, which means that it is compatible with downstream analysis tools also supporting this format.
3838

3939
![nf-core/airrflow overview](docs/images/metro-map-airrflow.png)
4040

@@ -51,7 +51,7 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
5151
- Assemble R1 and R2 read mates (`pRESTO AssemblePairs`).
5252
- Remove and annotate read duplicates (`pRESTO CollapseSeq`).
5353
- Filter out sequences that do not have at least 2 duplicates (`pRESTO SplitSeq`).
54-
- single cell
54+
- Single cell
5555
- cellranger vdj
5656
- Assemble contigs
5757
- Annotate contigs
@@ -76,8 +76,8 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
7676
- Single-cell QC filtering (`EnchantR`)
7777
- Remove cells without heavy chains.
7878
- Remove cells with multiple heavy chains.
79-
- Remove sequences in different samples that share the same `cell_id` and nucleotide sequence.
80-
- Modify `cell_id`s to ensure they are unique in the project.
79+
- Remove sequences in different samples that share the same `cell_id` and nucleotide sequence, and thus are very likely contaminants.
80+
- Modify `cell_id`s to ensure they are unique in each run.
8181

8282
4. Clonal analysis (bulk and single-cell)
8383
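
The single-cell QC filters listed in step 3 above can be sketched in plain Python. This is only an illustration of the filtering logic, not EnchantR's implementation: the function name `qc_filter`, the record layout (dicts with `sample_id`, `cell_id`, `locus` and `sequence` keys), the use of `locus == "IGH"` to identify heavy chains, and the `sample_id` prefix used to make `cell_id`s unique are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def qc_filter(records):
    """Apply the four QC steps to a list of dict records (illustrative only)."""
    # Steps 1 and 2: keep cells that have exactly one heavy chain
    # (heavy chain assumed to be locus == "IGH").
    heavy = Counter(
        (r["sample_id"], r["cell_id"]) for r in records if r["locus"] == "IGH"
    )
    kept = [r for r in records if heavy[(r["sample_id"], r["cell_id"])] == 1]

    # Step 3: drop sequences whose cell_id and nucleotide sequence
    # appear in more than one sample (likely contaminants).
    samples_seen = defaultdict(set)
    for r in kept:
        samples_seen[(r["cell_id"], r["sequence"])].add(r["sample_id"])
    kept = [r for r in kept if len(samples_seen[(r["cell_id"], r["sequence"])]) == 1]

    # Step 4: make cell_ids unique by prefixing the sample_id (hypothetical scheme).
    return [{**r, "cell_id": f'{r["sample_id"]}_{r["cell_id"]}'} for r in kept]


records = [
    {"sample_id": "s1", "cell_id": "c1", "locus": "IGH", "sequence": "AAA"},
    {"sample_id": "s1", "cell_id": "c1", "locus": "IGK", "sequence": "CCC"},
    {"sample_id": "s1", "cell_id": "c2", "locus": "IGK", "sequence": "GGG"},  # no heavy chain
    {"sample_id": "s1", "cell_id": "c3", "locus": "IGH", "sequence": "TTT"},  # two heavy chains
    {"sample_id": "s1", "cell_id": "c3", "locus": "IGH", "sequence": "TTA"},
    {"sample_id": "s1", "cell_id": "c4", "locus": "IGH", "sequence": "ACG"},  # shared across samples
    {"sample_id": "s2", "cell_id": "c4", "locus": "IGH", "sequence": "ACG"},
    {"sample_id": "s2", "cell_id": "c1", "locus": "IGH", "sequence": "GAT"},
]
filtered = qc_filter(records)
print(sorted({r["cell_id"] for r in filtered}))  # ['s1_c1', 's2_c1']
```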

@@ -93,20 +93,20 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
9393
## Usage
9494

9595
> [!NOTE]
96-
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
96+
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. You will also need to install a container engine such as [Docker](https://docs.docker.com/engine/install/) or [Apptainer (formerly Singularity)](https://apptainer.org/docs/admin/main/installation.html) prior to running the pipeline. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
9797
9898
First, ensure that the pipeline tests run on your infrastructure:
9999

100100
```bash
101-
nextflow run nf-core/airrflow -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute> --outdir <OUTDIR>
101+
nextflow run nf-core/airrflow -profile test,<docker/singularity/apptainer/podman/shifter/charliecloud/conda/institute> --outdir <OUTDIR>
102102
```
103103

104-
To run nf-core/airrflow with your data, prepare a tab-separated samplesheet with your input data. Depending on the input data type (bulk or single-cell, raw reads or assembled reads) the input samplesheet will vary. Please follow the [documentation on samplesheets](https://nf-co.re/airrflow/usage#input-samplesheet) for more details. An example samplesheet for running the pipeline on bulk BCR / TCR sequencing data in fastq format looks as follows:
104+
To run nf-core/airrflow with your data, you will need to first prepare a tab-separated samplesheet with the paths to your input data and necessary metadata to run the analysis. Depending on the input data type (bulk or single-cell, raw reads or assembled reads) the input samplesheet will vary. Please follow the [documentation on samplesheets](https://nf-co.re/airrflow/usage#input-samplesheet) for more details. An example samplesheet for running the pipeline on bulk BCR / TCR sequencing data departing from raw reads looks as follows:
105105

106106
| sample_id | filename_R1 | filename_R2 | filename_I1 | subject_id | species | pcr_target_locus | tissue | sex | age | biomaterial_provider | single_cell | intervention | collection_time_point_relative | cell_subset |
107107
| --------- | ------------------------------- | ------------------------------- | ------------------------------- | ---------- | ------- | ---------------- | ------ | ------ | --- | -------------------- | ----------- | -------------- | ------------------------------ | ------------ |
108108
| sample01 | sample1_S8_L001_R1_001.fastq.gz | sample1_S8_L001_R2_001.fastq.gz | sample1_S8_L001_I1_001.fastq.gz | Subject02 | human | IG | blood | NA | 53 | sequencing_facility | FALSE | Drug_treatment | Baseline | plasmablasts |
109-
| sample02 | sample2_S8_L001_R1_001.fastq.gz | sample2_S8_L001_R2_001.fastq.gz | sample2_S8_L001_I1_001.fastq.gz | Subject02 | human | TR | blood | female | 78 | sequencing_facility | FALSE | Drug_treatment | Baseline | plasmablasts |
109+
| sample02 | sample2_S8_L001_R1_001.fastq.gz | sample2_S8_L001_R2_001.fastq.gz | sample2_S8_L001_I1_001.fastq.gz | Subject02 | human | IG | blood | female | 78 | sequencing_facility | FALSE | Drug_treatment | Baseline | plasmablasts |
110110

111111
Each row represents a sample with fastq files (paired-end).
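
As a minimal sketch of how such a samplesheet could be generated programmatically, the snippet below writes one row in tab-separated form using Python's standard `csv` module. The column names and values are copied from the example table above; the helper name `write_samplesheet` is hypothetical and not part of the pipeline.

```python
import csv
import io

# Column names copied from the example table above.
COLUMNS = [
    "sample_id", "filename_R1", "filename_R2", "filename_I1", "subject_id",
    "species", "pcr_target_locus", "tissue", "sex", "age",
    "biomaterial_provider", "single_cell", "intervention",
    "collection_time_point_relative", "cell_subset",
]


def write_samplesheet(rows, handle):
    """Write rows (dicts keyed by COLUMNS) as tab-separated text."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


rows = [{
    "sample_id": "sample01",
    "filename_R1": "sample1_S8_L001_R1_001.fastq.gz",
    "filename_R2": "sample1_S8_L001_R2_001.fastq.gz",
    "filename_I1": "sample1_S8_L001_I1_001.fastq.gz",
    "subject_id": "Subject02",
    "species": "human",
    "pcr_target_locus": "IG",
    "tissue": "blood",
    "sex": "NA",
    "age": "53",
    "biomaterial_provider": "sequencing_facility",
    "single_cell": "FALSE",
    "intervention": "Drug_treatment",
    "collection_time_point_relative": "Baseline",
    "cell_subset": "plasmablasts",
}]

buffer = io.StringIO()  # swap for open("samplesheet.tsv", "w", newline="") to write a file
write_samplesheet(rows, buffer)
print(buffer.getvalue())
```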
112112

File renamed without changes.

conf/clontech_umi_bcr.config

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
*/
1212

1313
params {
14-
config_profile_name = 'Takara Bio / Clontech SMARTer v2'
14+
config_profile_name = 'Takara Bio / SMART-seq'
1515
config_profile_description = 'Profile to run pipeline for the Takara Bio / Clontech SMARTer v2 (UMI) BCR protocol profile'
1616

1717
mode = 'fastq'

conf/modules.config

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -688,9 +688,9 @@ process {
688688
mode: params.publish_dir_mode,
689689
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
690690
]
691-
ext.args = ['minseq':5,
692-
'traits':'isotype',
693-
'tips':'isotype']
691+
ext.args = ['minseq': 3,
692+
'traits':'c_call',
693+
'tips':'c_call']
694694
}
695695

696696
withName: AMULETY_TRANSLATE {

conf/test_full.config

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -11,9 +11,9 @@
1111
*/
1212
process {
1313
resourceLimits = [
14-
cpus: 4,
15-
memory: '15.GB',
16-
time: '1.h'
14+
cpus: 16,
15+
memory: '60.GB',
16+
time: '24.h'
1717
]
1818
}
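
The effect of the raised `resourceLimits` can be pictured as a cap on each process's resource request: a request above the limit is reduced to the limit, and a request below it is left unchanged. The Python sketch below only illustrates that capping rule; the `clamp` helper and the plain-number units (`memory_gb`, `time_h`) are assumptions for the example, not Nextflow code.

```python
def clamp(requested, limits):
    """Cap each requested resource at the corresponding limit."""
    return {
        key: min(value, limits.get(key, value))
        for key, value in requested.items()
    }


# Limits mirroring the new values in the config above.
limits = {"cpus": 16, "memory_gb": 60, "time_h": 24}

# A hypothetical process request that exceeds two of the three limits.
requested = {"cpus": 32, "memory_gb": 48, "time_h": 36}

print(clamp(requested, limits))  # {'cpus': 16, 'memory_gb': 48, 'time_h': 24}
```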
1919

@@ -43,8 +43,8 @@ params {
4343
process {
4444
withName:DOWSER_LINEAGES{
4545
ext.args = ['minseq':5,
46-
'traits':'isotype',
47-
'tips':'isotype']
46+
'traits':'c_call',
47+
'tips':'c_call']
4848
}
4949

5050
withName:DEFINE_CLONES_COMPUTE{

docs/usage.md

Lines changed: 36 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -6,9 +6,9 @@
66
77
## Introduction
88

9-
**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, single-cell VDJ sequencing using the 10xGenomics libraries, or assembled reads (bulk or single-cell). It can also extract BCR and TCR sequences from bulk or single-cell untargeted RNAseq data. It makes use of the [Immcantation](https://immcantation.readthedocs.io) toolset as well as other AIRRseq analysis tools.
9+
The nf-core/airrflow pipeline allows processing B-cell receptor (BCR) and T-cell receptor (TCR) sequencing data from bulk and single-cell sequencing protocols. It allows the processing of targeted bulk and single-cell adaptive immune receptor sequencing data (AIRR-seq), as well as the extraction of TCR and BCR sequences from untargeted bulk and single-cell RNA-seq data. The pipeline enables an end-to-end analysis, departing from raw reads or readily assembled sequences, and performs sequence assembly, V(D)J assignment, clonal group inference, lineage reconstruction and repertoire analysis using the [Immcantation](https://immcantation.readthedocs.io/en/stable/) framework, as well as other immune repertoire analysis tools.
1010

11-
In addition to this usage page, you can find useful usage information in these pages:
11+
In addition to this page, you can find further information on how to use the pipeline on the following pages:
1212

1313
- [bulk_tutorial](usage/bulk_tutorial.md): a step by step tutorial on how to run nf-core/airrflow for bulk data.
1414
- [single_cell_tutorial](usage/single_cell_tutorial.md): a step by step tutorial on how to run nf-core/airrflow for single-cell data.
@@ -17,6 +17,15 @@ In addition to this usage page, you can find useful usage information in these p
1717

1818
### Quickstart
1919

20+
> [!INSTALLATION]
21+
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow and a container engine needed to run this pipeline. At the moment, nf-core/airrflow does NOT support using conda virtual environments for dependency management; only containers are supported. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
22+
23+
First, ensure that the pipeline tests run on your infrastructure:
24+
25+
```bash
26+
nextflow run nf-core/airrflow -profile test,<docker/singularity/apptainer/podman/shifter/charliecloud/conda/institute> --outdir <OUTDIR>
27+
```
28+
2029
A typical command for running the pipeline for **bulk raw fastq files** using available pre-set protocol profiles is shown below. The full list of supported profiles can be found in the section [Supported protocol profiles](#supported-protocol-profiles).
2130

2231
```bash
@@ -63,8 +72,22 @@ nextflow run nf-core/airrflow \
6372
--outdir results
6473
```
6574

66-
Check the section [Input samplesheet](#input-samplesheet) below for instructions on how to create the samplesheet, and the [Supported library generation protocols](#supported-bulk-library-generation-methods-protocols) section below for examples on how to run the pipeline for different bulk and the 10xGenomics single cell sequencing protocol.
67-
For more information about the parameters, please refer to the [parameters documentation](https://nf-co.re/airrflow/parameters).
75+
It is also possible to reconstruct BCR and TCR sequences from untargeted bulk and single-cell sequencing data. A typical command to run the pipeline from **single-cell RNA-seq fastq files** is shown below. For more information, check the section on [supported untargeted RNA-seq based methods](#supported-untargeted-rna-seq-based-methods) below.
76+
77+
```bash
78+
nextflow run nf-core/airrflow \
79+
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
80+
--mode fastq \
81+
--input input_samplesheet.tsv \
82+
--library_generation_method trust4 \
83+
--umi_read R1 \
84+
--cell_barcode_read R1 \
85+
--read_format bc:0:15,um:16:27 \
86+
--outdir results
87+
```
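
To make the `--read_format bc:0:15,um:16:27` value more concrete, the Python sketch below parses such a string and slices a read accordingly. It assumes that each field follows a `name:start:end` layout with 0-based, inclusive positions and that an end of `-1` means "to the end of the read"; the function names `parse_read_format` and `extract` are hypothetical, and the code is not taken from the pipeline or from TRUST4.

```python
def parse_read_format(spec):
    """Parse 'bc:0:15,um:16:27' into {'bc': (0, 15), 'um': (16, 27)}."""
    fields = {}
    for part in spec.split(","):
        name, start, end = part.split(":")
        fields[name] = (int(start), int(end))
    return fields


def extract(read, span):
    """Slice a read with a 0-based, inclusive (start, end) span.

    An end of -1 is assumed to mean 'up to the end of the read'.
    """
    start, end = span
    return read[start:] if end == -1 else read[start:end + 1]


layout = parse_read_format("bc:0:15,um:16:27")

# Toy R1 read: 16 nt cell barcode, 12 nt UMI, then 4 extra bases.
read = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC" + "AAAA"
barcode = extract(read, layout["bc"])
umi = extract(read, layout["um"])
print(len(barcode), len(umi))  # 16 12
```

Under these assumptions the example value describes a 16-nucleotide cell barcode followed by a 12-nucleotide UMI, both read from R1.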
88+
89+
Check the section [Input samplesheet](#input-samplesheet) below for instructions on how to create the samplesheet, and the [Supported custom bulk library generation methods](#supported-custom-bulk-library-generation-methods-protocols) section below for examples on how to run the pipeline for the different bulk and single-cell sequencing protocols.
90+
For more detailed information about all the available parameters, please refer to the [parameters documentation](https://nf-co.re/airrflow/parameters).
6891
The command above will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
6992

7093
Note that the pipeline will create the following files in your working directory:
@@ -154,7 +177,7 @@ Other optional columns can be added. These columns will be available as metadata
154177

155178
It is possible to provide several fastq files per sample (e.g. sequenced over different chips or lanes). In this case the different fastq files per sample will be merged together prior to processing. Provide one fastq pair R1/R2 per row, and the same `sample_id` field for these rows.
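
As a rough illustration of the row layout described above, the Python sketch below groups samplesheet rows by `sample_id` so that all R1/R2 pairs belonging to one sample are collected together. The helper name `group_fastq_pairs` and the file names are made up for the example; the actual merging is performed by the pipeline itself.

```python
from collections import defaultdict


def group_fastq_pairs(rows):
    """Collect (R1, R2) pairs per sample_id, preserving row order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["sample_id"]].append((row["filename_R1"], row["filename_R2"]))
    return dict(grouped)


# Hypothetical rows: sample01 was sequenced over two lanes.
rows = [
    {"sample_id": "sample01", "filename_R1": "s1_L001_R1.fastq.gz", "filename_R2": "s1_L001_R2.fastq.gz"},
    {"sample_id": "sample01", "filename_R1": "s1_L002_R1.fastq.gz", "filename_R2": "s1_L002_R2.fastq.gz"},
    {"sample_id": "sample02", "filename_R1": "s2_L001_R1.fastq.gz", "filename_R2": "s2_L001_R2.fastq.gz"},
]

pairs = group_fastq_pairs(rows)
for sample_id, files in pairs.items():
    print(sample_id, len(files))
```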
156179

157-
### Fastq input samplesheet (single cell sequencing)
180+
### Fastq input samplesheet (single-cell AIRR sequencing)
158181

159182
The required input file for processing raw BCR or TCR single cell targeted sequencing data is a sample sheet in TSV format (tab separated). The columns `sample_id`, `filename_R1`, `filename_R2`, `subject_id`, `species`, `tissue`, `pcr_target_locus`, `single_cell`, `sex`, `age` and `biomaterial_provider` are required. Any other columns you add will be available in the final repertoire file as extra metadata fields. You can refer to the bulk fastq input section for documentation on the individual columns.
160183
An example samplesheet is:
@@ -175,7 +198,7 @@ An example samplesheet is:
175198
176199
It is possible to provide several fastq files per sample (e.g. sequenced over different chips or lanes). In this case the different fastq files per sample will be provided to the same cellranger process. These rows should then have an identical `sample_id` field.
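
Before launching a run, it can help to check that a single-cell samplesheet contains the required columns named in this section. The Python sketch below is one way to do that with the standard `csv` module; the function name `missing_columns` is hypothetical, and the pipeline performs its own validation independently of this check.

```python
import csv
import io

# Required columns as listed in this section.
REQUIRED = [
    "sample_id", "filename_R1", "filename_R2", "subject_id", "species",
    "tissue", "pcr_target_locus", "single_cell", "sex", "age",
    "biomaterial_provider",
]


def missing_columns(handle):
    """Return the required columns absent from a tab-separated samplesheet header."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    return [column for column in REQUIRED if column not in header]


# Toy samplesheet whose header lacks the 'age' and 'biomaterial_provider' columns.
header = "\t".join(REQUIRED[:-2])
sheet = io.StringIO(header + "\n")
print(missing_columns(sheet))  # ['age', 'biomaterial_provider']
```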
177200

178-
### Fastq input samplesheet (untargeted bulk or sc RNA sequencing)
201+
### Fastq input samplesheet (untargeted bulk or single-cell RNAseq)
179202

180203
When running the untargeted protocol, BCR or TCR sequences will be extracted from the untargeted bulk or single-cell RNA sequencing with tools such as [TRUST4](https://github.com/liulab-dfci/TRUST4).
181204
The required input file is the same as for the [Fastq bulk AIRR samplesheet](#fastq-input-samplesheet-bulk-airr-sequencing) or [Fastq single-cell AIRR samplesheet](#fastq-input-samplesheet-single-cell-airr-sequencing) depending on the input data type (bulk RNAseq or single-cell RNAseq).
@@ -303,7 +326,7 @@ nextflow run nf-core/airrflow -r <release> \
303326
--outdir results
304327
```
305328

306-
## Supported bulk library generation methods (protocols)
329+
## Supported custom bulk library generation methods (protocols)
307330

308331
For common sequencing protocols such as commercial kits please check the section above if your kit has a preset profile first, as this will greatly simplify running the pipeline. When processing bulk sequencing data departing from raw `fastq` reads, several sequencing protocols are supported which can be provided with the parameter `--library_generation_method`.
309332
The following table matches the library generation methods as described in the [AIRR metadata annotation guidelines](https://docs.airr-community.org/en/stable/miairr/metadata_guidelines.html#library-generation-method) to the value that can be provided to the `--library_generation_method` parameter.
@@ -456,7 +479,8 @@ When processing single cell sequencing data departing from raw `fastq` reads, cu
456479
### 10xGenomics
457480

458481
This sequencing type requires setting `--library_generation_method sc_10x_genomics`.
459-
The `cellranger vdj` automatically uses the Chromium cellular barcodes and UMIs to perform sequence assembly, paired clonotype calling and to assemble V(D)J transcripts per cell.
482+
The `cellranger vdj` tool automatically uses the Chromium cell barcodes and UMIs to perform sequence assembly, paired clonotype calling and to assemble V(D)J transcripts per cell. The pipeline will then perform gene reassignment and clonotyping with the Immcantation framework unless otherwise specified.
483+
460484
Examples are provided below to run airrflow to process 10xGenomics raw FASTQ data.
461485

462486
```bash
@@ -476,10 +500,10 @@ nextflow run nf-core/airrflow -r dev \
476500
- The 10xGenomics reference can be downloaded from the [download page](https://www.10xgenomics.com/support/software/cell-ranger/downloads)
477501
- To generate a V(D)J segment fasta file as reference from IMGT one can follow the [cellranger docs](https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/advanced/references#imgt).
478502

479-
## Supported unselected RNA-seq based methods
503+
## Supported untargeted RNA-seq based methods
480504

481-
nf-core/airrflow supports unselected bulk or single-cell RNA-seq fastq files as input. [TRUST4](https://github.com/liulab-dfci/TRUST4) is used to extract TCR/BCR sequences from these files. The resulting AIRR tables are then fed into airrflow's Immcantation based workflow. <br>
482-
To use unselected RNA-seq based input, specify `--library_generation_method trust4`.
505+
nf-core/airrflow supports untargeted bulk or single-cell RNA-seq fastq files as input. [TRUST4](https://github.com/liulab-dfci/TRUST4) is used to extract TCR/BCR sequences from these files. The resulting AIRR tables are then fed into airrflow's Immcantation based workflow. <br>
506+
To use untargeted RNA-seq based input, specify `--library_generation_method trust4`.
483507

484508
### Bulk RNA-seq
485509

@@ -555,7 +579,7 @@ Use this parameter to choose a configuration profile. Profiles can give configur
555579
Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Apptainer, Conda) - see below.
556580

557581
> [!IMPORTANT]
558-
> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
582+
> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility. Conda is not supported for this pipeline.
559583
560584
The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to check if your system is supported, please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).
561585
