ggabernet
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 2 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 9 additions & 9 deletions
diff --git a/assets/tutorial/airrflow.sh b/assets/single_cell_tutorial/airrflow.sh
assets/tutorial/airrflow.sh renamed to assets/single_cell_tutorial/airrflow.sh
diff --git a/assets/tutorial/samplesheet.tsv b/assets/single_cell_tutorial/samplesheet.tsv
assets/tutorial/samplesheet.tsv renamed to assets/single_cell_tutorial/samplesheet.tsv
diff --git a/conf/clontech_umi_bcr.config b/conf/clontech_umi_bcr.config
Lines changed: 1 addition & 1 deletion
diff --git a/conf/modules.config b/conf/modules.config
Lines changed: 3 additions & 3 deletions
diff --git a/conf/test_full.config b/conf/test_full.config
Lines changed: 5 additions & 5 deletions
diff --git a/docs/usage.md b/docs/usage.md
Lines changed: 36 additions & 12 deletions
@@ -24,7 +24,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 ### `Fixed`
 
 - [#378](https://github.com/nf-core/airrflow/pull/378) Updated primer links to Immcantation GitHub.
-- [#384](https://github.com/nf-core/airrflow/pull/384) Temporarily downgrade nf-schema to fix compatibility with NXF > 25.03.1-edge
+- [#384](https://github.com/nf-core/airrflow/pull/384) Temporarily downgrade nf-schema to fix compatibility with NXF > 25.03.1-edge.
+- [#388](https://github.com/nf-core/airrflow/pull/388) Fixed some tutorial links and added info.
 
 ### `Dependencies`
 
 
@@ -24,7 +24,7 @@
 
 ## Introduction
 
-**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, single-cell VDJ sequencing using the 10xGenomics libraries, or assembled reads (bulk or single-cell). It can also extract BCR and TCR sequences from bulk or single-cell untargeted RNAseq data. It makes use of the [Immcantation](https://immcantation.readthedocs.io) toolset as well as other AIRRseq analysis tools.
+**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell receptor (BCR) or T-cell receptor (TCR) repertoire sequencing data. It allows the processing of targeted bulk and single-cell adaptive immune receptor sequencing data (AIRR-seq), as well as the extraction of TCR and BCR sequences from untargeted bulk and single-cell RNA-seq data. The pipeline enables an end-to-end analysis, departing from raw reads or readily assembled sequences, and performs sequence assembly, V(D)J assignment, clonal group inference, lineage reconstruction and repertoire analysis using the [Immcantation](https://immcantation.readthedocs.io/en/stable/) framework, as well as other immune repertoire analysis tools.
 
 ![nf-core/airrflow overview](docs/images/airrflow_workflow_overview.png)
 
@@ -34,7 +34,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
 
 ## Pipeline summary
 
-nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single cell targeted sequencing data. Several protocols are supported, please see the [usage documentation](https://nf-co.re/airrflow/usage) for more details on the supported protocols. The pipeline has been certified as [AIRR compliant](https://docs.airr-community.org/en/stable/swtools/airr_swtools_compliant.html) by the AIRR community, which means that it is compatible with downstream analysis tools also supporting this format.
+nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single cell targeted sequencing data, as well as extracting BCR and TCR sequences from bulk and single-cell RNA-seq datasets. Several protocols are supported, please see the [usage documentation](https://nf-co.re/airrflow/usage) for more details on the supported protocols. The pipeline has been certified as [AIRR compliant](https://docs.airr-community.org/en/stable/swtools/airr_swtools_compliant.html) by the AIRR community, which means that it is compatible with downstream analysis tools also supporting this format.
 
 ![nf-core/airrflow overview](docs/images/metro-map-airrflow.png)
 
@@ -51,7 +51,7 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
   - Assemble R1 and R2 read mates (`pRESTO AssemblePairs`).
   - Remove and annotate read duplicates (`pRESTO CollapseSeq`).
   - Filter out sequences that do not have at least 2 duplicates (`pRESTO SplitSeq`).
-- single cell
+- Single cell
   - cellranger vdj
     - Assemble contigs
     - Annotate contigs
@@ -76,8 +76,8 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
 - Single-cell QC filtering (`EnchantR`)
   - Remove cells without heavy chains.
   - Remove cells with multiple heavy chains.
-  - Remove sequences in different samples that share the same `cell_id` and nucleotide sequence.
-  - Modify `cell_id`s to ensure they are unique in the project.
+  - Remove sequences in different samples that share the same `cell_id` and nucleotide sequence, and thus are very likely contaminants.
+  - Modify `cell_id`s to ensure they are unique in each run.
 
 4. Clonal analysis (bulk and single-cell)
 
@@ -93,20 +93,20 @@ nf-core/airrflow allows the end-to-end processing of BCR and TCR bulk and single
 ## Usage
 
 > [!NOTE]
-> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
+> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. You will also need to install a container engine such as [Docker](https://docs.docker.com/engine/install/) or [Apptainer (formerly Singularity)](https://apptainer.org/docs/admin/main/installation.html) prior to running the pipeline. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
 
 First, ensure that the pipeline tests run on your infrastructure:
 
 ```bash
-nextflow run nf-core/airrflow -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute> --outdir <OUTDIR>
+nextflow run nf-core/airrflow -profile test,<docker/singularity/apptainer/podman/shifter/charliecloud/conda/institute> --outdir <OUTDIR>
 ```
 
-To run nf-core/airrflow with your data, prepare a tab-separated samplesheet with your input data. Depending on the input data type (bulk or single-cell, raw reads or assembled reads) the input samplesheet will vary. Please follow the [documentation on samplesheets](https://nf-co.re/airrflow/usage#input-samplesheet) for more details. An example samplesheet for running the pipeline on bulk BCR / TCR sequencing data in fastq format looks as follows:
+To run nf-core/airrflow with your data, you will need to first prepare a tab-separated samplesheet with the paths to your input data and necessary metadata to run the analysis. Depending on the input data type (bulk or single-cell, raw reads or assembled reads) the input samplesheet will vary. Please follow the [documentation on samplesheets](https://nf-co.re/airrflow/usage#input-samplesheet) for more details. An example samplesheet for running the pipeline on bulk BCR / TCR sequencing data departing from raw reads looks as follows:
 
 | sample_id | filename_R1                     | filename_R2                     | filename_I1                     | subject_id | species | pcr_target_locus | tissue | sex    | age | biomaterial_provider | single_cell | intervention   | collection_time_point_relative | cell_subset  |
 | --------- | ------------------------------- | ------------------------------- | ------------------------------- | ---------- | ------- | ---------------- | ------ | ------ | --- | -------------------- | ----------- | -------------- | ------------------------------ | ------------ |
 | sample01  | sample1_S8_L001_R1_001.fastq.gz | sample1_S8_L001_R2_001.fastq.gz | sample1_S8_L001_I1_001.fastq.gz | Subject02  | human   | IG               | blood  | NA     | 53  | sequencing_facility  | FALSE       | Drug_treatment | Baseline                       | plasmablasts |
-| sample02  | sample2_S8_L001_R1_001.fastq.gz | sample2_S8_L001_R2_001.fastq.gz | sample2_S8_L001_I1_001.fastq.gz | Subject02  | human   | TR               | blood  | female | 78  | sequencing_facility  | FALSE       | Drug_treatment | Baseline                       | plasmablasts |
+| sample02  | sample2_S8_L001_R1_001.fastq.gz | sample2_S8_L001_R2_001.fastq.gz | sample2_S8_L001_I1_001.fastq.gz | Subject02  | human   | IG               | blood  | female | 78  | sequencing_facility  | FALSE       | Drug_treatment | Baseline                       | plasmablasts |
 
 Each row represents a sample with fastq files (paired-end).
 
 
@@ -11,7 +11,7 @@
 */
 
 params {
-    config_profile_name        = 'Takara Bio / Clontech SMARTer v2'
+    config_profile_name        = 'Takara Bio / SMART-seq'
     config_profile_description = 'Profile to run pipeline for the Takara Bio / Clontech SMARTer v2 (UMI) BCR protocol profile'
 
     mode = 'fastq'
 
@@ -688,9 +688,9 @@ process {
             mode: params.publish_dir_mode,
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
         ]
-        ext.args = ['minseq':5,
-                    'traits':'isotype',
-                    'tips':'isotype']
+        ext.args = ['minseq': 3,
+                    'traits':'c_call',
+                    'tips':'c_call']
     }
 
     withName: AMULETY_TRANSLATE {
 
@@ -11,9 +11,9 @@
 */
 process {
     resourceLimits = [
-        cpus: 4,
-        memory: '15.GB',
-        time: '1.h'
+        cpus: 16,
+        memory: '60.GB',
+        time: '24.h'
     ]
 }
 
@@ -43,8 +43,8 @@ params {
 process {
     withName:DOWSER_LINEAGES{
         ext.args = ['minseq':5,
-                    'traits':'isotype',
-                    'tips':'isotype']
+                    'traits':'c_call',
+                    'tips':'c_call']
     }
 
     withName:DEFINE_CLONES_COMPUTE{
 
@@ -6,9 +6,9 @@
 
 ## Introduction
 
-**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, single-cell VDJ sequencing using the 10xGenomics libraries, or assembled reads (bulk or single-cell). It can also extract BCR and TCR sequences from bulk or single-cell untargeted RNAseq data. It makes use of the [Immcantation](https://immcantation.readthedocs.io) toolset as well as other AIRRseq analysis tools.
+The nf-core/airrflow pipeline allows processing B-cell receptor (BCR) and T-cell receptor (TCR) sequencing data from bulk and single-cell sequencing protocols. It allows the processing of targeted bulk and single-cell adaptive immune receptor sequencing data (AIRR-seq), as well as the extraction of TCR and BCR sequences from untargeted bulk and single-cell RNA-seq data. The pipeline enables an end-to-end analysis, departing from raw reads or readily assembled sequences, and performs sequence assembly, V(D)J assignment, clonal group inference, lineage reconstruction and repertoire analysis using the [Immcantation](https://immcantation.readthedocs.io/en/stable/) framework, as well as other immune repertoire analysis tools.
 
-In addition to this usage page, you can find useful usage information in these pages:
+In addition to this page, you can find additional information on how to use the pipeline on the following pages:
 
 - [bulk_tutorial](usage/bulk_tutorial.md): a step by step tutorial on how to run nf-core/airrflow for bulk data.
 - [single_cell_tutorial](usage/single_cell_tutorial.md): a step by step tutorial on how to run nf-core/airrflow for single-cell data.
@@ -17,6 +17,15 @@ In addition to this usage page, you can find useful usage information in these p
 
 ### Quickstart
 
+> [!INSTALLATION]
+> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow and a container engine needed to run this pipeline. At the moment, nf-core/airrflow does NOT support using conda virtual environments for dependency management; only containers are supported. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
+
+First, ensure that the pipeline tests run on your infrastructure:
+
+```bash
+nextflow run nf-core/airrflow -profile test,<docker/singularity/apptainer/podman/shifter/charliecloud/institute> --outdir <OUTDIR>
+```
+
 A typical command for running the pipeline for **bulk raw fastq files** using available pre-set protocol profiles is shown below. The full list of supported profiles can be found in the section [Supported protocol profiles](#supported-protocol-profiles).
 
 ```bash
@@ -63,8 +72,22 @@ nextflow run nf-core/airrflow \
 --outdir results
 ```
 
-Check the section [Input samplesheet](#input-samplesheet) below for instructions on how to create the samplesheet, and the [Supported library generation protocols](#supported-bulk-library-generation-methods-protocols) section below for examples on how to run the pipeline for different bulk and the 10xGenomics single cell sequencing protocol.
-For more information about the parameters, please refer to the [parameters documentation](https://nf-co.re/airrflow/parameters).
+It is also possible to reconstruct BCR and TCR sequences from untargeted bulk and single-cell sequencing data. A typical command to run the pipeline from **single-cell RNA-seq fastq files** is shown below. For more information, check the section on [supported untargeted RNA-seq based methods](#supported-untargeted-rna-seq-based-methods) below.
+
+```bash
+nextflow run nf-core/airrflow \
+-profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
+--mode fastq \
+--input input_samplesheet.tsv \
+--library_generation_method trust4 \
+--umi_read R1 \
+--cell_barcode_read R1 \
+--read_format bc:0:15,um:16:27 \
+--outdir results
+```
+
+Check the section [Input samplesheet](#input-samplesheet) below for instructions on how to create the samplesheet, and the [Supported library generation protocols](#supported-custom-bulk-library-generation-methods-protocols) section below for examples on how to run the pipeline for the different bulk and single-cell sequencing protocols.
+For more detailed information about all the available parameters, please refer to the [parameters documentation](https://nf-co.re/airrflow/parameters).
 The command above will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
 
 Note that the pipeline will create the following files in your working directory:
@@ -154,7 +177,7 @@ Other optional columns can be added. These columns will be available as metadata
 
 It is possible to provide several fastq files per sample (e.g. sequenced over different chips or lanes). In this case the different fastq files per sample will be merged together prior to processing. Provide one fastq pair R1/R2 per row, and the same `sample_id` field for these rows.
 
-### Fastq input samplesheet (single cell sequencing)
+### Fastq input samplesheet (single-cell AIRR sequencing)
 
 The required input file for processing raw BCR or TCR single cell targeted sequencing data is a sample sheet in TSV format (tab separated). The columns `sample_id`, `filename_R1`, `filename_R2`, `subject_id`, `species`, `tissue`, `pcr_target_locus`, `single_cell`, `sex`, `age` and `biomaterial_provider` are required. Any other columns you add will be available in the final repertoire file as extra metadata fields. You can refer to the bulk fastq input section for documentation on the individual columns.
 An example samplesheet is:
@@ -175,7 +198,7 @@ An example samplesheet is:
 
 It is possible to provide several fastq files per sample (e.g. sequenced over different chips or lanes). In this case the different fastq files per sample will be provided to the same cellranger process. These rows should then have an identical `sample_id` field.
 
-### Fastq input samplesheet (untargeted bulk or sc RNA sequencing)
+### Fastq input samplesheet (untargeted bulk or single-cell RNAseq)
 
 When running the untargeted protocol, BCR or TCR sequences will be extracted from the untargeted bulk or single-cell RNA sequencing with tools such as [TRUST4](https://github.com/liulab-dfci/TRUST4).
 The required input file is the same as for the [Fastq bulk AIRR samplesheet](#fastq-input-samplesheet-bulk-airr-sequencing) or [Fastq single-cell AIRR samplesheet](#fastq-input-samplesheet-single-cell-sequencing) depending on the input data type (bulk RNAseq or single-cell RNAseq).
@@ -303,7 +326,7 @@ nextflow run nf-core/airrflow -r <release> \
 --outdir results
 ```
 
-## Supported bulk library generation methods (protocols)
+## Supported custom bulk library generation methods (protocols)
 
 For common sequencing protocols such as commercial kits please check the section above if your kit has a preset profile first, as this will greatly simplify running the pipeline. When processing bulk sequencing data departing from raw `fastq` reads, several sequencing protocols are supported which can be provided with the parameter `--library_generation_method`.
 The following table matches the library generation methods as described in the [AIRR metadata annotation guidelines](https://docs.airr-community.org/en/stable/miairr/metadata_guidelines.html#library-generation-method) to the value that can be provided to the `--library_generation_method` parameter.
@@ -456,7 +479,8 @@ When processing single cell sequencing data departing from raw `fastq` reads, cu
 ### 10xGenomics
 
 This sequencing type requires setting `--library_generation_method sc_10x_genomics`.
-The `cellranger vdj` automatically uses the Chromium cellular barcodes and UMIs to perform sequence assembly, paired clonotype calling and to assemble V(D)J transcripts per cell.
+The `cellranger vdj` tool automatically uses the Chromium cell barcodes and UMIs to perform sequence assembly, paired clonotype calling and to assemble V(D)J transcripts per cell. The pipeline will then perform gene reassignment and clonotyping with the Immcantation framework unless otherwise specified.
+
 Examples are provided below to run airrflow to process 10xGenomics raw FASTQ data.
 
 ```bash
@@ -476,10 +500,10 @@ nextflow run nf-core/airrflow -r dev \
 - The 10xGenomics reference can be downloaded from the [download page](https://www.10xgenomics.com/support/software/cell-ranger/downloads)
 - To generate a V(D)J segment fasta file as reference from IMGT one can follow the [cellranger docs](https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/advanced/references#imgt).
 
-## Supported unselected RNA-seq based methods
+## Supported untargeted RNA-seq based methods
 
-nf-core/airrflow supports unselected bulk or single-cell RNA-seq fastq files as input. [TRUST4](https://github.com/liulab-dfci/TRUST4) is used to extract TCR/BCR sequences from these files. The resulting AIRR tables are then fed into airrflow's Immcantation based workflow. <br>
-To use unselected RNA-seq based input, specify `--library_generation_method trust4`.
+nf-core/airrflow supports untargeted bulk or single-cell RNA-seq fastq files as input. [TRUST4](https://github.com/liulab-dfci/TRUST4) is used to extract TCR/BCR sequences from these files. The resulting AIRR tables are then fed into airrflow's Immcantation based workflow. <br>
+To use untargeted RNA-seq based input, specify `--library_generation_method trust4`.
 
 ### Bulk RNA-seq
 
@@ -555,7 +579,7 @@ Use this parameter to choose a configuration profile. Profiles can give configur
 Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Apptainer, Conda) - see below.
 
 > [!IMPORTANT]
-> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.
+> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility. Conda is not supported for this pipeline.
 
 The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to check if your system is supported, please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).
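
The `--read_format bc:0:15,um:16:27` value used in the single-cell RNA-seq command in this diff describes where the cell barcode and the UMI sit within the read. A minimal Python sketch of that layout follows. It assumes each field has the form `tag:start:end` with 0-based, inclusive positions, and that an end of `-1` means "to the end of the read"; this is an assumption about the TRUST4 convention rather than something stated in this diff. The function names and the example read are hypothetical.

```python
def parse_read_format(spec):
    """Parse a string such as 'bc:0:15,um:16:27' into {'bc': (0, 15), 'um': (16, 27)}."""
    layout = {}
    for field in spec.split(","):
        tag, start, end = field.split(":")
        layout[tag] = (int(start), int(end))
    return layout


def extract_segment(read, start, end):
    """Return read[start..end] with an inclusive end; end == -1 means the rest of the read."""
    return read[start:] if end == -1 else read[start:end + 1]


layout = parse_read_format("bc:0:15,um:16:27")

# Hypothetical read: 16 nt cell barcode, then 12 nt UMI, then trailing bases.
read = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC" + "AAAA"
barcode = extract_segment(read, *layout["bc"])
umi = extract_segment(read, *layout["um"])
print(len(barcode), len(umi))  # 16 12
```

Under these assumptions, `bc:0:15` covers a 16-nucleotide barcode and `um:16:27` a 12-nucleotide UMI, both taken from the read given by `--cell_barcode_read` and `--umi_read` (R1 in the command above).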