Commit 2c72754

Merge pull request #554 from nf-core/cellranger-gtf-space-incompatibility
Cellranger gtf space incompatibility
2 parents ac8d113 + 38362a6 commit 2c72754

19 files changed

Lines changed: 707 additions & 32 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Features
 
+- Add `PREPARE_GENOME` subworkflow to bundle reference FASTA/GTF preparation (gunzip, gene filter, optional GTF source-column fix for Cell Ranger 10) ([#554](https://github.com/nf-core/scrnaseq/pull/554))
 - Address [#512](https://github.com/nf-core/scrnaseq/issues/512), adding early validation of the cellranger multi barcode sheet ([#513](https://github.com/nf-core/scrnaseq/pull/513))
 - Update `nf-core/cellranger` modules to Cell Ranger `10.0.0`, including output channel handling for multiplexed experiments ([#508](https://github.com/nf-core/scrnaseq/pull/508))
 - Replace **alevinqc** with **qcatch** for simpleaf QC; add `--skip_qcatch` parameter ([#520](https://github.com/nf-core/scrnaseq/pull/520))
@@ -25,6 +26,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Fixes
 
+- Fix Cell Ranger `mkref` failure with NCBI iGenomes `GRCh38` GTF (`Curated Genomic` in the GTF source column) by replacing spaces in column 2 when using Cell Ranger aligners ([#554](https://github.com/nf-core/scrnaseq/pull/554))
 - Fix [#375](https://github.com/nf-core/scrnaseq/issues/375): mismatch between index and probeset when cellranger multi is used without a prebuilt index and an FFPE probeset is passed ([#502](https://github.com/nf-core/scrnaseq/pull/502))
 - Fix [#510](https://github.com/nf-core/scrnaseq/issues/510): Handle files with BOMs. ([#511](https://github.com/nf-core/scrnaseq/pull/511))
 - Fix [#539](https://github.com/nf-core/scrnaseq/issues/539) and [#393](https://github.com/nf-core/scrnaseq/issues/393): genome and aligner index parameter handling by resolving iGenomes attributes in the entry workflow and passing reference paths explicitly into `SCRNASEQ`, restoring configurable pre-built indexes via custom `igenomes` configs (reverts [#483](https://github.com/nf-core/scrnaseq/pull/483) pipeline-wide params approach) ([#545](https://github.com/nf-core/scrnaseq/pull/545))

conf/igenomes.config

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ params {
 star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/"
 star_legacy = true
 bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/"
+gtf_source_has_spaces = true
 gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf"
 bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed"
 mito_name = "chrM"

conf/modules.config

Lines changed: 7 additions & 0 deletions
@@ -70,6 +70,13 @@ process {
         ]
     }
 
+    withName: GTF_SOURCE_FIX {
+        ext.suffix = 'gtf'
+        ext.prefix = { meta.id }
+        ext.args2 = "'BEGIN { FS = OFS = \"\\t\" } /^#/ { print; next } NF >= 2 { gsub(/ /, \"_\", \$2) } { print }'"
+        publishDir = [ enabled: false ]
+    }
+
     withName: CELLRANGER_MKGTF {
         publishDir = [
             path: "${params.outdir}/${params.aligner}/mkgtf",
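
The awk program passed through `ext.args2` in the `GTF_SOURCE_FIX` block treats the GTF as tab-separated, passes comment lines through unchanged, and replaces spaces with underscores in the second (source) column only. As a reading aid, here is a minimal Python sketch of what appears to be the same per-line transformation. The function name `fix_gtf_source` is hypothetical and not part of the pipeline, which runs the awk program through the `gawk` module.

```python
def fix_gtf_source(line: str) -> str:
    """Mirror the awk program in ext.args2 for a single line (no trailing newline).

    Hypothetical helper for illustration only; not part of nf-core/scrnaseq.
    """
    # /^#/ { print; next } : header and comment lines are emitted unchanged.
    if line.startswith("#"):
        return line

    # FS = OFS = "\t" : split and re-join on single tab characters.
    fields = line.split("\t")

    # NF >= 2 { gsub(/ /, "_", $2) } : only the source column is rewritten.
    if len(fields) >= 2:
        fields[1] = fields[1].replace(" ", "_")

    return "\t".join(fields)


if __name__ == "__main__":
    example = 'chr1\tCurated Genomic\texon\t11874\t12227\t.\t+\t.\tgene_id "DDX11L1";'
    print(fix_gtf_source(example))
```

Spaces in the attributes column (for example between `gene_id` and its quoted value) are left alone, because only field 2 is touched.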

docs/usage.md

Lines changed: 14 additions & 0 deletions
@@ -384,6 +384,8 @@ The pipeline can resolve reference files from `conf/igenomes.config` when you pr
 
 Some AWS iGenomes STAR indices were generated with older STAR versions and contain legacy metadata. nf-core/scrnaseq includes a compatibility step for these configured iGenomes entries so that legacy STAR indices can run with the STAR version shipped by the pipeline. This support is intended to keep existing iGenomes usage working, not to make legacy indices the preferred reference for new analyses.
 
+Some AWS iGenomes GTF files, such as the NCBI `GRCh38` annotation, contain spaces in the GTF source column (for example `Curated Genomic`). Cell Ranger 10 `mkref` rejects spaces in that field. When using Cell Ranger aligners (`cellranger`, `cellrangerarc`, or `cellrangermulti`) with a configured iGenomes entry flagged for this issue, nf-core/scrnaseq automatically replaces spaces in the source column before reference building. This keeps existing iGenomes usage working with Cell Ranger 10, but is not intended as the preferred reference for new analyses.
+
 > [!WARNING]
 > For production runs, we recommend building fresh indices from current reference files instead of relying on legacy AWS iGenomes indices. The nf-core [reference genome documentation](https://nf-co.re/docs/running/reference-genomes) warns that AWS iGenomes annotations are significantly outdated, for example human annotations from Ensembl release 75, and that GRCh38 iGenomes uses the NCBI assembly rather than the masked Ensembl assembly.
 
@@ -400,6 +402,18 @@ nextflow run nf-core/scrnaseq \
   -profile docker
 ```
 
+To build a Cell Ranger reference from current annotation files instead of iGenomes, provide FASTA and GTF files directly:
+
+```bash
+nextflow run nf-core/scrnaseq \
+  --input samplesheet.csv \
+  --outdir results \
+  --aligner cellranger \
+  --fasta reference.fa.gz \
+  --gtf annotation.gtf.gz \
+  -profile docker
+```
+
 ## Running the pipeline
 
 The minimum typical command for running the pipeline is as follows:
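
The usage note added above says that Cell Ranger 10 `mkref` rejects spaces in the GTF source column. For a custom GTF passed via `--gtf`, one way to check in advance whether it has the same problem is to scan column 2. The following is a minimal sketch under that assumption. The helper `sources_with_spaces` is hypothetical and not shipped with the pipeline.

```python
from collections import Counter
from typing import Iterable


def sources_with_spaces(lines: Iterable[str]) -> Counter:
    """Count source-column (column 2) values that contain a space.

    Hypothetical pre-flight check for illustration; not part of nf-core/scrnaseq.
    """
    counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and GTF header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 2 and " " in fields[1]:
            counts[fields[1]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "#!genome-build GRCh38\n",
        'chr1\tCurated Genomic\tgene\t1\t100\t.\t+\t.\tgene_id "A";\n',
        'chr1\tBestRefSeq\tgene\t200\t300\t.\t+\t.\tgene_id "B";\n',
    ]
    for source, n in sources_with_spaces(sample).items():
        print(f"{source!r}: {n} line(s)")
```

An empty result suggests the source column needs no rewriting. For a real file, pass an open text handle (for example from `gzip.open(path, "rt")` for a compressed GTF) instead of the in-memory list.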

modules.json

Lines changed: 5 additions & 0 deletions
@@ -60,6 +60,11 @@
   "git_sha": "6d46786420b4d7bc88eba026eb389c0c5535d120",
   "installed_by": ["modules"]
 },
+"gawk": {
+  "branch": "master",
+  "git_sha": "6d46786420b4d7bc88eba026eb389c0c5535d120",
+  "installed_by": ["modules"]
+},
 "gffread": {
   "branch": "master",
   "git_sha": "c9ad4d691aa339e478a77847e3ef854ccd21778b",

modules/nf-core/gawk/environment.yml

Lines changed: 7 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

modules/nf-core/gawk/main.nf

Lines changed: 60 additions & 0 deletions

modules/nf-core/gawk/meta.yml

Lines changed: 84 additions & 0 deletions
