Add gedi/price test data: chr19+chr22 Ribo-seq cohort #2061
Merged
Four Ribo-seq samples downsampled to chr19+chr22 protein-coding-gene loci, with a protein-coding-only reference. Sized so every file is under 4 MiB and PRICE still produces a non-empty orfs.tsv (381 lines). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 53ed2bc to 0374c61
Adds an entry under genomics/homo_sapiens/riboseq_expression for the new price/ fixtures, matching the existing plastid/ and ribocode/ block style. Notes why a second chromosome (chr19+chr22) and 4-sample cohort are needed - PRICE's candidate-ORF count and noise-model floor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaced the reference to "Seqera Platform stage commit c4cb19dc" with the persistent SRA accession trail (SRR15480788/9/90/91 from GSE182201) plus the alignment tooling. The Platform workdir wouldn't be reachable to anyone reading this README later. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nice! LGTM!
luisas approved these changes on May 19, 2026
pinin4fjords added a commit to pinin4fjords/nf-core-modules that referenced this pull request on May 19, 2026
nf-core/test-datasets#2061 merged; fixtures now live on the modules branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
delfiterradas pushed a commit to grst/modules that referenced this pull request on May 19, 2026
* feat(gedi): add gedi/indexgenome and gedi/price modules

  Adds two modules wrapping the GEDI / PRICE toolkit (`bioconda::gedi=1.0.6a`) for Ribo-seq translated-ORF discovery. PRICE (Erhard et al. 2018, doi:10.1038/nmeth.4631) calls translated ORFs from ribosome profiling data with near-cognate start codon detection.

  `gedi/indexgenome` wraps `gedi -e IndexGenome`, producing the `.oml` genome index directory consumed by PRICE.

  `gedi/price` wraps `bamlist2cit` + `gedi -e Price`, taking a cohort of Ribo-seq BAMs plus the genome index and emitting ORF predictions (`*.orfs.tsv` + `*.cit` + sidecars). One-shot across the cohort - PRICE is not per-sample.

  Both modules use Wave-built community containers from `bioconda::gedi=1.0.6a`. The bioconda recipe was merged 2026-05-16; using Wave directly for now.

  Source: nf-core/riboseq#174.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(gedi/indexgenome): use ${prefix} for the index output directory

  Previously hard-coded the output directory as `price_index`. Switching to `${prefix}` (default `${meta.id}`, overridable via `task.ext.prefix`) lets callers control the directory name and matches the nf-core convention for publishable directory outputs.

  The default `${meta.id}` keeps the directory keyed to the reference id, so when `gedi/price` opens `${index}/${meta2.id}.oml`, the lookup still resolves provided the meta ids match (already the case in the test chain).

  Snapshot regenerated: the index directory name in the output snapshot changes from `price_index` to the test's `homo_sapiens_chr20` (its meta.id).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gedi/price): add real test using minimised chr19+chr22 fixtures

  Replaces the stub-only PRICE test with an end-to-end test that runs PRICE on a minimal cohort of four Ribo-seq samples (chr19+chr22, protein-coding-only reference). The cohort produces 380 ORF calls; snapshot captures the orfs.tsv line count for stability validation.

  Fixtures published in nf-core/test-datasets PR nf-core#2061.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gedi/indexgenome): update meta.yml output name after ${prefix} refactor

  The earlier `${prefix}` refactor (commit 0ca4c45) changed the index output declaration from `path("price_index")` to `path("${prefix}")`, but the meta.yml output entry still hard-coded `price_index`, causing CI lint to flag `correct_meta_outputs: Module meta.yml does not match main.nf`.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style(gedi/indexgenome): collapse leftover alignment padding on index emit

  After the `${prefix}` refactor (commit 0ca4c45) the index output line was the only `tuple val(meta), path(...)` emit in the module, so the 52-space alignment padding it kept from when the path was `price_index` no longer aligns with anything.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gedi): correct licence (GPL-3.0) and Gedi description in meta.yml

  Two cross-cutting fixes from review of nf-core#11693:

  - Licence was Apache-2.0 in both meta.yml files; the upstream repo erhard-lab/gedi is GPL-3.0. Corrected.
  - "GEDI (Gene Expression Data Integration)" was unverified: the upstream README/wiki/paper don't expand the acronym that way. Replaced with the upstream one-liner phrasing.

  PRICE meta.yml also adds the verified PRICE expansion (Probabilistic Inference of Codon Activities by an EM algorithm) from the GEDI wiki.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gedi/price): point fixtures at nf-core/test-datasets@modules

  nf-core/test-datasets#2061 merged; fixtures now live on the modules branch.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
manascripts pushed a commit to manascripts/modules that referenced this pull request on May 21, 2026
Summary
Test fixture set for nf-core/modules#11693 (`gedi/price`) under `data/genomics/homo_sapiens/riboseq_expression/price/`.

PRICE's Bayesian model needs a much higher candidate-ORF count than the existing chr20 fixtures provide for its expectation-maximisation to converge. This PR adds a minimised chr19+chr22 cohort that gives PRICE enough signal to call ORFs in CI.
PRICE output on this fixture set: 380 ORF rows (381-line `orfs.tsv` including header and metadata).

Files

- `Homo_sapiens.GRCh38_chr19_22.pc_exon_masked.fa.gz`
- `Homo_sapiens.GRCh38.111_chr19_22.pc.gtf.gz`
- `bams/SRX1178088{5,6,7,8}.chr19_22.ds50.bam.bai`
- `README.md`

Each file is < 4 MB. Total: 10.88 MB across 11 files (down from 19.68 MB across 14 files in an earlier revision; see force-push history).
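The per-file size ceiling can be checked mechanically before pushing fixtures. The following is a minimal sketch, not part of this PR: the function name `oversized_files` and the default limit of 4 MiB (taken from the commit message wording) are assumptions for illustration.

```python
from pathlib import Path

# Assumption: the ceiling is 4 MiB, following the commit message; adjust if
# the intended limit is 4 MB (4_000_000 bytes).
LIMIT_BYTES = 4 * 1024 * 1024


def oversized_files(root, limit_bytes=LIMIT_BYTES):
    """Return sorted (relative path, size in bytes) pairs for every file
    under `root` whose size is at or above `limit_bytes`."""
    root = Path(root)
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size >= limit_bytes
    )
```

An empty list from `oversized_files("price/")` (a hypothetical local checkout path) would mean every fixture is under the ceiling.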
Why these sizes — empirical justification
These fixtures are at the floor of what makes PRICE produce non-empty output. Every dimension was minimised iteratively against a real `gedi -e Price` invocation. Probes that didn't make it:

- `NoiseModel.computeMeanSpline` crashes during inference
- `Index 0 out of bounds for length 0` in `PriceOrfInference`
- `N` positions in the masked FASTA

The empirical floor along each axis:

- `N`s) or PRICE crashes.

Could we go smaller?
Three avenues considered and rejected:

- `N` positions.

The 3.43 MB FASTA is the largest file in the set. It contains the chr19+chr22 protein-coding exons of the GRCh38 primary assembly, with everything outside those regions hard-masked; this is the most aggressive masking PRICE tolerates while still producing ORFs.
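Hard-masking as described above amounts to replacing every base outside the kept regions with `N`. A minimal sketch of that idea follows; it is not the script used to build the fixture, and the function name `hard_mask` and the 0-based half-open interval convention are assumptions for illustration.

```python
def hard_mask(sequence, keep_intervals):
    """Return `sequence` with every base outside `keep_intervals` set to 'N'.

    `keep_intervals` is an iterable of 0-based, half-open (start, end) pairs.
    GTF exon coordinates are 1-based and inclusive, so an exon spanning
    [s, e] in a GTF would be passed here as (s - 1, e).
    """
    masked = ["N"] * len(sequence)
    for start, end in keep_intervals:
        # Clamp to the sequence bounds so out-of-range intervals are harmless.
        start = max(start, 0)
        end = min(end, len(sequence))
        if start < end:
            masked[start:end] = sequence[start:end]
    return "".join(masked)
```

For example, `hard_mask("ACGTACGTAC", [(2, 5)])` keeps only positions 2 to 4 and returns `"NNGTANNNNN"`. Overlapping intervals are handled naturally because each one simply copies the original bases back.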
Source
Original BAMs: `SRX1178088{5,6,7,8}.Aligned.sortedByCoord.out.bam` from a full-scale nf-core/riboseq `test_full` run. Same upstream sample identifiers as the existing chr20 fixtures in this branch.

Test plan

- `modules/nf-core/gedi/price/tests/main.nf.test` in nf-core/modules#11693 consumes these fixtures via `raw.githubusercontent.com` URLs pinned to this PR's branch; URLs to be updated to `modules` after this PR merges
- `nf-core modules test --profile docker gedi/price` produces 380-ORF `orfs.tsv` with two-pass snapshot stability
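The relation between the ORF count and the `orfs.tsv` line count quoted in this PR (380 ORFs, 381 lines in the commit message) can be sketched as below. This assumes the file has exactly one header line followed by one row per ORF; the function name `count_orf_rows` and the column names in the usage example are hypothetical, not taken from PRICE's actual output format.

```python
def count_orf_rows(tsv_text):
    """Count data rows in TSV text, assuming a single header line.

    Blank lines are ignored; an empty or header-only file yields 0.
    """
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)
```

With a hypothetical two-ORF file such as `"Id\tLocation\nORF1\tchr19+:1-10\nORF2\tchr22-:5-20\n"`, the function returns 2.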