Commit 855a8e3

pinin4fjords and claude authored and committed
feat(gedi): add gedi/indexgenome and gedi/price modules (nf-core#11693)
* feat(gedi): add gedi/indexgenome and gedi/price modules

Adds two modules wrapping the GEDI / PRICE toolkit (`bioconda::gedi=1.0.6a`) for Ribo-seq translated-ORF discovery. PRICE (Erhard et al. 2018, doi:10.1038/nmeth.4631) calls translated ORFs from ribosome profiling data with near-cognate start codon detection.

`gedi/indexgenome` wraps `gedi -e IndexGenome`, producing the `.oml` genome index directory consumed by PRICE.

`gedi/price` wraps `bamlist2cit` + `gedi -e Price`, taking a cohort of Ribo-seq BAMs plus the genome index and emitting ORF predictions (`*.orfs.tsv` + `*.cit` + sidecars). It runs once across the whole cohort: PRICE is not per-sample.

Both modules use Wave-built community containers from `bioconda::gedi=1.0.6a`. The bioconda recipe was merged 2026-05-16; using Wave directly for now.

Source: nf-core/riboseq#174.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(gedi/indexgenome): use ${prefix} for the index output directory

Previously hard-coded the output directory as `price_index`. Switching to `${prefix}` (default `${meta.id}`, overridable via `task.ext.prefix`) lets callers control the directory name and matches the nf-core convention for publishable directory outputs.

The default `${meta.id}` keeps the directory keyed to the reference id, so when `gedi/price` opens `${index}/${meta2.id}.oml`, the lookup still resolves, provided the meta ids match (already the case in the test chain).

Snapshot regenerated: the index directory name in the output snapshot changes from `price_index` to the test's `homo_sapiens_chr20` (its meta.id).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gedi/price): add real test using minimised chr19+chr22 fixtures

Replaces the stub-only PRICE test with an end-to-end test that runs PRICE on a minimal cohort of four Ribo-seq samples (chr19+chr22, protein-coding-only reference). The cohort produces 380 ORF calls; the snapshot captures the orfs.tsv line count for stability validation.

Fixtures published in nf-core/test-datasets PR nf-core#2061.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gedi/indexgenome): update meta.yml output name after ${prefix} refactor

The earlier `${prefix}` refactor (commit 0ca4c45) changed the index output declaration from `path("price_index")` to `path("${prefix}")`, but the meta.yml output entry still hard-coded `price_index`, causing CI lint to flag `correct_meta_outputs: Module meta.yml does not match main.nf`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style(gedi/indexgenome): collapse leftover alignment padding on index emit

After the `${prefix}` refactor (commit 0ca4c45) the index output line was the only `tuple val(meta), path(...)` emit in the module, so the 52-space alignment padding it kept from when the path was `price_index` no longer aligns with anything.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gedi): correct licence (GPL-3.0) and Gedi description in meta.yml

Two cross-cutting fixes from review of nf-core#11693:

- Licence was Apache-2.0 in both meta.yml files; the upstream repo erhard-lab/gedi is GPL-3.0. Corrected.
- "GEDI (Gene Expression Data Integration)" was unverified: the upstream README/wiki/paper don't expand the acronym that way. Replaced with the upstream one-liner phrasing.

PRICE meta.yml also adds the verified PRICE expansion (Probabilistic Inference of Codon Activities by an EM algorithm) from the GEDI wiki.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gedi/price): point fixtures at nf-core/test-datasets@modules

nf-core/test-datasets#2061 merged; fixtures now live on the modules branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 8266d2a commit 855a8e3

11 files changed

Lines changed: 833 additions & 0 deletions

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+channels:
+  - bioconda
+  - conda-forge
+dependencies:
+  - bioconda::gedi=1.0.6a
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+process GEDI_INDEXGENOME {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda "${moduleDir}/environment.yml"
+    container "${ workflow.containerEngine in ['singularity', 'apptainer'] && !task.ext.singularity_pull_docker_container ?
+        'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/ba/bae29fa913dea79a3dcdbfbf544f0391f82bbfdbf3e6430f71db45ba21d6cf79/data' :
+        'community.wave.seqera.io/library/gedi_indexgenome:cfca16738f306c86' }"
+
+    input:
+    tuple val(meta), path(fasta), path(gtf)
+
+    output:
+    tuple val(meta), path("${prefix}"), emit: index
+    tuple val("${task.process}"), val('gedi'), eval("gedi -e Version 2>&1 | sed -n 's/.*Gedi version \\([^ ]*\\).*/\\1/p' | head -n 1"), topic: versions, emit: versions_gedi
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    def name = meta.id ?: 'reference'
+    prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    mkdir -p ${prefix}
+    gedi -e IndexGenome \\
+        -s ${fasta} \\
+        -a ${gtf} \\
+        -n ${name} \\
+        -f ${prefix} \\
+        -o ${prefix}/${name}.oml \\
+        -nomapping \\
+        -p \\
+        ${args}
+    """
+
+    stub:
+    def name = meta.id ?: 'reference'
+    prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    mkdir -p ${prefix}
+    touch ${prefix}/${name}.oml
+    """
+}
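The index layout this module produces can be sketched in plain shell. The directory is named by `prefix`, while the `.oml` file inside it is named by `meta.id`, so overriding `task.ext.prefix` changes only the directory name. This is a minimal sketch that mirrors the stub block, not the module itself: `meta_id`, `prefix` and the helper `resolve_oml` are hypothetical names used for illustration, and the lookup it imitates is the `${index}/${meta2.id ?: 'reference'}.oml` expression from `gedi/price`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for meta.id and an overridden task.ext.prefix.
meta_id="homo_sapiens_chr20"
prefix="custom_index_dir"

# Work in a scratch directory so nothing is left behind in the caller's cwd.
workdir="$(mktemp -d)"
cd "$workdir"

# Same two commands as the stub block: directory keyed by prefix,
# index file keyed by meta.id.
mkdir -p "${prefix}"
touch "${prefix}/${meta_id}.oml"

# Hypothetical helper imitating the path built in gedi/price;
# falls back to 'reference' when no id is given.
resolve_oml() {
    local index_dir="$1"
    local meta2_id="${2:-reference}"
    printf '%s/%s.oml\n' "${index_dir}" "${meta2_id}"
}

oml="$(resolve_oml "${prefix}" "${meta_id}")"
if [ -f "${oml}" ]; then
    echo "found: ${oml}"
fi
```

With matching ids the lookup succeeds regardless of the directory name; passing a different id to `resolve_oml` would produce a path that does not exist in the index directory.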
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+name: "gedi_indexgenome"
+description: Build a GEDI genome index from a FASTA and GTF for downstream PRICE ORF prediction
+keywords:
+  - riboseq
+  - index
+  - genome
+  - gedi
+  - price
+  - orf
+tools:
+  - "gedi":
+      description: "Gedi is a Java software platform for working with genomic data
+        (sequencing reads, sequences, per-base numeric values, annotations). It
+        provides the PRICE algorithm for ribosome profiling ORF discovery."
+      homepage: "https://github.com/erhard-lab/gedi"
+      documentation: "https://github.com/erhard-lab/gedi/wiki"
+      tool_dev_url: "https://github.com/erhard-lab/gedi"
+      doi: "10.1038/nmeth.4631"
+      licence:
+        - "GPL-3.0"
+      identifier: ""
+input:
+  - - meta:
+        type: map
+        description: |
+          Groovy Map containing reference information
+          e.g. `[ id:'homo_sapiens' ]`
+    - fasta:
+        type: file
+        description: Genome FASTA file
+        pattern: "*.{fa,fasta,fna}"
+        ontologies:
+          - edam: http://edamontology.org/format_1929
+    - gtf:
+        type: file
+        description: GTF annotation file
+        pattern: "*.{gtf}"
+        ontologies:
+          - edam: http://edamontology.org/format_2306
+output:
+  index:
+    - - meta:
+          type: map
+          description: |
+            Groovy Map containing reference information
+            e.g. `[ id:'homo_sapiens' ]`
+      - ${prefix}:
+          type: directory
+          description: |
+            GEDI genome index directory containing the `.oml` index file and
+            its sidecar resources. Consumed as `-genomic` by PRICE. Directory
+            name defaults to `meta.id` and can be overridden via `task.ext.prefix`.
+          pattern: "*"
+  versions_gedi:
+    - - ${task.process}:
+          type: string
+          description: The name of the process
+      - gedi:
+          type: string
+          description: The name of the tool
+      - gedi -e Version 2>&1 | sed -n 's/.*Gedi version \([^ ]*\).*/\1/p' | head -n 1:
+          type: eval
+          description: The expression to obtain the version of the tool
+topics:
+  versions:
+    - - ${task.process}:
+          type: string
+          description: The name of the process
+      - gedi:
+          type: string
+          description: The name of the tool
+      - gedi -e Version 2>&1 | sed -n 's/.*Gedi version \([^ ]*\).*/\1/p' | head -n 1:
+          type: eval
+          description: The expression to obtain the version of the tool
+authors:
+  - "@pinin4fjords"
+maintainers:
+  - "@pinin4fjords"
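The version `eval` expression recorded above can be exercised without GEDI installed by feeding the same `sed | head` pipeline a sample line. This is a sketch under an assumption: the exact text printed by `gedi -e Version` is not shown in this commit, so `sample_output` below is a hypothetical banner that merely contains the `Gedi version <x>` phrase the pattern looks for. The helper name `extract_gedi_version` is also illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical banner text; the real output of `gedi -e Version` may differ.
sample_output='some unrelated log line
INFO: Gedi version 1.0.6a (hypothetical banner text)'

# Same pipeline as the module's eval expression: print only lines that
# contain "Gedi version <token>", keep the token, and take the first hit.
extract_gedi_version() {
    sed -n 's/.*Gedi version \([^ ]*\).*/\1/p' | head -n 1
}

version="$(printf '%s\n' "$sample_output" | extract_gedi_version)"
echo "$version"   # prints 1.0.6a
```

In `main.nf` the backslashes are doubled (`\\(` and `\\1`) because the expression sits inside a Groovy double-quoted string; the shell receives the single-backslash form recorded in `meta.yml`.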
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+nextflow_process {
+
+    name "Test Process GEDI_INDEXGENOME"
+    script "../main.nf"
+    process "GEDI_INDEXGENOME"
+
+    tag "gunzip"
+    tag "modules"
+    tag "modules_nfcore"
+    tag "gedi"
+    tag "gedi/indexgenome"
+
+    setup {
+        run("GUNZIP") {
+            script "modules/nf-core/gunzip/main.nf"
+            process {
+                """
+                input[0] = [
+                    [ ],
+                    file(params.modules_testdata_base_path + "genomics/homo_sapiens/riboseq_expression/Homo_sapiens.GRCh38.dna.chromosome.20.fa.gz", checkIfExists: true)
+                ]
+                """
+            }
+        }
+    }
+
+    test("homo_sapiens [fasta + gtf] - chr20") {
+
+        when {
+            process {
+                """
+                input[0] = GUNZIP.out.gunzip.map{[
+                    [id:'homo_sapiens_chr20'],
+                    it[1],
+                    file(params.modules_testdata_base_path + "genomics/homo_sapiens/riboseq_expression/Homo_sapiens.GRCh38.111_chr20.gtf", checkIfExists: true)
+                ]}
+                """
+            }
+        }
+
+        then {
+            assertAll(
+                { assert process.success },
+                { assert snapshot(
+                    process.out.index.collect { meta, idx -> [meta, file(idx).name] },
+                    process.out.findAll { key, val -> key.startsWith('versions') }
+                ).match() }
+            )
+        }
+    }
+
+    test("homo_sapiens [fasta + gtf] - chr20 - stub") {
+
+        options '-stub'
+
+        when {
+            process {
+                """
+                input[0] = GUNZIP.out.gunzip.map{[
+                    [id:'homo_sapiens_chr20'],
+                    it[1],
+                    file(params.modules_testdata_base_path + "genomics/homo_sapiens/riboseq_expression/Homo_sapiens.GRCh38.111_chr20.gtf", checkIfExists: true)
+                ]}
+                """
+            }
+        }
+
+        then {
+            assertAll(
+                { assert process.success },
+                { assert snapshot(process.out).match() }
+            )
+        }
+    }
+}
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+{
+    "homo_sapiens [fasta + gtf] - chr20 - stub": {
+        "content": [
+            {
+                "0": [
+                    [
+                        {
+                            "id": "homo_sapiens_chr20"
+                        },
+                        [
+                            "homo_sapiens_chr20.oml:md5,d41d8cd98f00b204e9800998ecf8427e"
+                        ]
+                    ]
+                ],
+                "1": [
+                    [
+                        "GEDI_INDEXGENOME",
+                        "gedi",
+                        "1.0.6a"
+                    ]
+                ],
+                "index": [
+                    [
+                        {
+                            "id": "homo_sapiens_chr20"
+                        },
+                        [
+                            "homo_sapiens_chr20.oml:md5,d41d8cd98f00b204e9800998ecf8427e"
+                        ]
+                    ]
+                ],
+                "versions_gedi": [
+                    [
+                        "GEDI_INDEXGENOME",
+                        "gedi",
+                        "1.0.6a"
+                    ]
+                ]
+            }
+        ],
+        "timestamp": "2026-05-19T10:43:30.416436473",
+        "meta": {
+            "nf-test": "0.9.5",
+            "nextflow": "26.04.1"
+        }
+    },
+    "homo_sapiens [fasta + gtf] - chr20": {
+        "content": [
+            [
+                [
+                    {
+                        "id": "homo_sapiens_chr20"
+                    },
+                    "homo_sapiens_chr20"
+                ]
+            ],
+            {
+                "versions_gedi": [
+                    [
+                        "GEDI_INDEXGENOME",
+                        "gedi",
+                        "1.0.6a"
+                    ]
+                ]
+            }
+        ],
+        "timestamp": "2026-05-19T11:58:10.928325624",
+        "meta": {
+            "nf-test": "0.9.5",
+            "nextflow": "26.04.1"
+        }
+    }
+}
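The checksum `d41d8cd98f00b204e9800998ecf8427e` in the stub snapshot is the MD5 of zero bytes of input, which is what the stub's `touch` produces. A minimal sketch to confirm this, assuming GNU coreutils `md5sum` is available (on macOS the equivalent tool is `md5`); the scratch directory is only for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate the stub output: an empty .oml file created with touch.
workdir="$(mktemp -d)"
touch "${workdir}/homo_sapiens_chr20.oml"

# md5sum prints "<digest>  <path>"; keep only the digest.
empty_md5="$(md5sum "${workdir}/homo_sapiens_chr20.oml" | cut -d' ' -f1)"
echo "$empty_md5"   # prints d41d8cd98f00b204e9800998ecf8427e
```

This is why the stub test can snapshot file checksums, while the non-stub test snapshots only the index directory name and the version tuple.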
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+channels:
+  - bioconda
+  - conda-forge
+dependencies:
+  - bioconda::gedi=1.0.6a

modules/nf-core/gedi/price/main.nf

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+process GEDI_PRICE {
+    tag "$meta.id"
+    label 'process_medium'
+    label 'process_long'
+
+    conda "${moduleDir}/environment.yml"
+    container "${ workflow.containerEngine in ['singularity', 'apptainer'] && !task.ext.singularity_pull_docker_container ?
+        'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/cd/cd008e5721759d5909909254c77ec449778e0fc7c669b7c926b68f0c9059f510/data' :
+        'community.wave.seqera.io/library/gedi_price:2392624d5f803049' }"
+
+    input:
+    tuple val(meta), path(bams, stageAs: 'bams/*'), path(bais, stageAs: 'bams/*')
+    tuple val(meta2), path(index)
+
+    output:
+    tuple val(meta), path("${prefix}.orfs.tsv") , emit: orfs_tsv
+    tuple val(meta), path("${prefix}.orfs.cit") , emit: orfs_cit, optional: true
+    tuple val(meta), path("${prefix}.orfs.cit.metadata.json") , emit: orfs_metadata, optional: true
+    tuple val(meta), path("${prefix}.codons.cit") , emit: codons_cit, optional: true
+    tuple val(meta), path("${prefix}.model") , emit: model, optional: true
+    tuple val(meta), path("${prefix}.signal.tsv") , emit: signal, optional: true
+    tuple val(meta), path("${prefix}.param") , emit: param, optional: true
+    tuple val("${task.process}"), val('gedi'), eval("gedi -e Version 2>&1 | sed -n 's/.*Gedi version \\([^ ]*\\).*/\\1/p' | head -n 1"), topic: versions, emit: versions_gedi
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    prefix = task.ext.prefix ?: "${meta.id}"
+    def oml = "${index}/${meta2.id ?: 'reference'}.oml"
+    """
+    ls -1 bams/*.bam > price_input.bamlist
+    bamlist2cit -n ${task.cpus} -p price_input.bamlist
+
+    gedi -e Price \\
+        -reads price_input.bamlist.cit \\
+        -genomic ${oml} \\
+        -prefix ${prefix} \\
+        -nthreads ${task.cpus} \\
+        ${args}
+    """
+
+    stub:
+    prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    touch ${prefix}.orfs.tsv
+    touch ${prefix}.orfs.cit
+    touch ${prefix}.orfs.cit.metadata.json
+    touch ${prefix}.codons.cit
+    touch ${prefix}.model
+    touch ${prefix}.signal.tsv
+    touch ${prefix}.param
+    """
+}
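Because both BAMs and their indices are staged into the same `bams/` directory, the first script line relies on the `*.bam` glob to pick up only the alignments. The sketch below reproduces that step with empty placeholder files; the sample names are hypothetical and neither `bamlist2cit` nor `gedi` is invoked.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Imitate the staged inputs: BAMs and BAIs side by side under bams/.
# Sample names are hypothetical placeholders.
mkdir -p bams
for sample in sampleA sampleB; do
    touch "bams/${sample}.bam" "bams/${sample}.bam.bai"
done

# Same command as the module: one BAM path per line.
# The glob requires the name to end in .bam, so .bam.bai files are skipped.
ls -1 bams/*.bam > price_input.bamlist
cat price_input.bamlist

n_bams="$(wc -l < price_input.bamlist | tr -d ' ')"
echo "BAMs listed: ${n_bams}"
```

The module then hands `price_input.bamlist` to `bamlist2cit` and passes `price_input.bamlist.cit` to `gedi -e Price` via `-reads`, so the whole cohort is processed in a single call.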
