
Commit 1de1952

feat(diann): add --strip_unknown_mods for hydroxyproline / non-standard PTMs
Some declared variable modifications are not recognised by the DIA-NN deep-learning predictor, most notably Oxidation on proline (hydroxyproline), which is ubiquitous in collagen/ECM samples. During in-silico library generation DIA-NN silently SKIPS those precursors (log: 'skipping N precursors, unrecognised modifications'), so they never enter the library and are never identified.

Add a first-class boolean param 'strip_unknown_mods' (default false) that prepends --strip-unknown-mods to all DIA-NN steps via conf/modules/dia.config. This forces DIA-NN to predict spectra/RTs/IMs for those peptidoforms, so they are retained and searchable. It is a no-op when every declared modification is recognised.

- nextflow.config + nextflow_schema.json: new strip_unknown_mods param.
- conf/modules/dia.config: prepend --strip-unknown-mods when enabled.
- docs/usage.md: GUI-flag mapping row + Common pitfalls note (including declaring the Trypsin/P enzyme in the SDRF, which maps to --cut K*,R*, for collagen).
- CHANGELOG.
1 parent 71d1bac commit 1de1952

5 files changed

Lines changed: 17 additions & 5 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### `Added`
 
+- New `--strip_unknown_mods` parameter (default `false`) that adds `--strip-unknown-mods` to the DIA-NN steps. Some declared variable modifications are not built into the DIA-NN deep-learning predictor — most notably **Oxidation on proline (hydroxyproline)**, ubiquitous in collagen/ECM samples. During in-silico library generation DIA-NN **silently skips** those precursors (logged as `skipping N precursors, unrecognised modifications`), so they never enter the library and are never identified. Enabling `--strip_unknown_mods` forces DIA-NN to predict spectra/RTs/IMs for them so they are retained and searchable; it is a no-op when every declared modification is already recognised. Documented under [Where each GUI flag goes](docs/usage.md) and [Common pitfalls](docs/usage.md).
 - DIA-NN **2.5.1** (academic) version profile `-profile diann_v2_5_1` (container `ghcr.io/bigbio/diann:2.5.1`).
 - **DIA-NN Enterprise (2.5.1) support** via `-profile diann_v2_5_1_enterprise` (container `ghcr.io/bigbio/diann-enterprise:2.5.1`). New `--enable_kb` flag adds the Enterprise Knowledge Base (`--kb`) to the first-pass search to boost identifications (mainly human data); it is gated to the Enterprise build and **on by default** under the `diann_v2_5_1_enterprise` profile (disable with `--enable_kb false`). New `--diann_license <file>` stages the Enterprise license key into each DIA-NN step as `--license`, with fallback to a key bundled next to the binary when unset. The license key is a per-user secret and is never committed or bundled into the shared image.
 

conf/modules/dia.config

Lines changed: 5 additions & 5 deletions
@@ -11,22 +11,22 @@
 process {
 
     withName: ".*:DIA:INSILICO_LIBRARY_GENERATION" {
-        ext.args = { params.extra_args ?: '' }
+        ext.args = { (params.strip_unknown_mods ? '--strip-unknown-mods ' : '') + (params.extra_args ?: '') }
     }
 
     withName: ".*:DIA:PRELIMINARY_ANALYSIS" {
-        ext.args = { params.extra_args ?: '' }
+        ext.args = { (params.strip_unknown_mods ? '--strip-unknown-mods ' : '') + (params.extra_args ?: '') }
     }
 
     withName: ".*:DIA:ASSEMBLE_EMPIRICAL_LIBRARY" {
-        ext.args = { params.extra_args ?: '' }
+        ext.args = { (params.strip_unknown_mods ? '--strip-unknown-mods ' : '') + (params.extra_args ?: '') }
     }
 
     withName: ".*:DIA:INDIVIDUAL_ANALYSIS" {
-        ext.args = { params.extra_args ?: '' }
+        ext.args = { (params.strip_unknown_mods ? '--strip-unknown-mods ' : '') + (params.extra_args ?: '') }
     }
 
     withName: ".*:DIA:FINAL_QUANTIFICATION" {
-        ext.args = { params.extra_args ?: '' }
+        ext.args = { (params.strip_unknown_mods ? '--strip-unknown-mods ' : '') + (params.extra_args ?: '') }
     }
 }
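Each of the five `ext.args` closures builds the same string: the flag (with a trailing space) when `strip_unknown_mods` is true, followed by `extra_args`, with Groovy's `?:` turning a null `extra_args` into an empty string. A minimal Python sketch of that string logic, for illustration only; `build_ext_args` is a hypothetical name, and the real composition happens inside the Nextflow closure.

```python
from typing import Optional


def build_ext_args(strip_unknown_mods: bool, extra_args: Optional[str]) -> str:
    """Mirror of the Groovy closure (hypothetical helper):
    (params.strip_unknown_mods ? '--strip-unknown-mods ' : '') + (params.extra_args ?: '')
    """
    # Prefix carries its own trailing space so it can be concatenated directly.
    prefix = "--strip-unknown-mods " if strip_unknown_mods else ""
    # Groovy's elvis operator falls back to '' for null (or empty) extra_args;
    # Python's `or` gives the same result for None and "".
    return prefix + (extra_args or "")
```

With the flag on and no `extra_args`, the result ends in a trailing space; that is harmless once the string is split into shell arguments.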

docs/usage.md

Lines changed: 3 additions & 0 deletions
@@ -397,6 +397,8 @@ If you have an existing DIA-NN GUI (workstation) run and want to reproduce its r
 | `--no-ifs-removal` | Set automatically for DIA-NN < 2.3 (removed upstream in 2.3+). |
 | `--qvalue 0.01` | DIA-NN default; `protein_level_fdr_cutoff: 0.01` controls pmultiqc filtering. |
 | `--matrices`, `--out`, `--out-lib`, `--gen-spec-lib`, `--lib`, `--threads`, `--verbose`, `--temp`, `--f`, `--use-quant` | Managed by the pipeline. Do not pass them. |
+| `--strip-unknown-mods` (predict modifications the DL predictor does not recognise) | Set `strip_unknown_mods: true`. **Required for variable modifications not built into the DIA-NN predictor — e.g. Oxidation on proline (hydroxyproline) for collagen/ECM samples.** Without it, in-silico library generation silently _skips_ those precursors (log: `skipping N precursors, unrecognised modifications`), so they never enter the library and are never identified. |
+| `--cut K*,R*` (cleave before proline — e.g. collagen/ECM) | Declare the enzyme in SDRF `comment[cleavage agent details]` = `NT=Trypsin/P;AC=MS:1001313` (→ `--cut K*,R*`). Standard `Trypsin` (`MS:1001251`) → `--cut K*,R*,!*P` (no cut before proline). |
 
 ### Worked example

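The new `--cut` row ties the enzyme to the PSI-MS accession in the SDRF `comment[cleavage agent details]` cell. A minimal Python sketch of that lookup, covering only the two mappings stated in the table row; `parse_sdrf_field` and `cut_rule` are hypothetical helpers, not the pipeline's own SDRF conversion code.

```python
from typing import Dict, Optional

# Only the two mappings stated in the usage.md row; anything else is out of scope here.
CUT_BY_ACCESSION: Dict[str, str] = {
    "MS:1001313": "K*,R*",      # Trypsin/P: also cleaves before proline
    "MS:1001251": "K*,R*,!*P",  # Trypsin: no cut before proline
}


def parse_sdrf_field(value: str) -> Dict[str, str]:
    """Split an SDRF 'KEY=value;KEY=value' cell into a dict (hypothetical helper)."""
    fields: Dict[str, str] = {}
    for part in value.split(";"):
        key, sep, val = part.partition("=")
        if sep:  # ignore fragments without '='
            fields[key.strip()] = val.strip()
    return fields


def cut_rule(cleavage_agent: str) -> str:
    """Return the DIA-NN --cut argument for a cleavage agent cell (hypothetical helper)."""
    accession: Optional[str] = parse_sdrf_field(cleavage_agent).get("AC")
    if accession is None or accession not in CUT_BY_ACCESSION:
        raise ValueError(f"no --cut mapping in this sketch for {accession!r}")
    return "--cut " + CUT_BY_ACCESSION[accession]
```

For collagen/ECM samples, `cut_rule("NT=Trypsin/P;AC=MS:1001313")` yields `--cut K*,R*`, whereas plain Trypsin keeps the `!*P` exclusion.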
@@ -440,6 +442,7 @@ diann_extra_args: "--smart-profiling --peak-center"
 - **Passing `--reanalyse` via `--extra_args`.** It will be stripped or it will collide with the pipeline's empirical-library two-pass. Leave it out.
 - **Setting Carbamidomethyl(C) via parameters.** Modifications come from the SDRF, not from `params.yml`. If your GUI run had `--unimod4`, make sure the SDRF declares Carbamidomethyl(C) as fixed.
 - **Different DIA-NN version.** A pipeline run with `-profile diann_v2_3_2` will not match a 1.8.1 GUI run even with identical flags. Pin the same version in both places when comparing.
+- **Hydroxyproline / non-standard PTMs silently lost.** If a declared variable modification is not recognised by the DIA-NN deep-learning predictor (e.g. Oxidation on proline, common in collagen/ECM), in-silico library generation **skips those precursors** unless you set `strip_unknown_mods: true`. Check the `INSILICO_LIBRARY_GENERATION` log for `skipping N precursors, unrecognised modifications` — a non-zero `N` means those peptidoforms never entered the library. Also declare the modification in the SDRF (e.g. `NT=Oxidation;MT=Variable;TA=P;AC=UNIMOD:35`) and, for collagen, set the enzyme to `Trypsin/P` so trypsin cleaves before proline.
 
 ## Passing Extra Arguments to DIA-NN
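The new pitfall note suggests checking the `INSILICO_LIBRARY_GENERATION` log for a non-zero skip count. A minimal Python sketch of that check, assuming the log line matches the wording quoted in this commit (`skipping N precursors, unrecognised modifications`); the regex and the helper names are hypothetical, and other DIA-NN versions may phrase the message differently.

```python
import re
from typing import Iterable

# Wording as quoted in this commit; assumed, not verified against every DIA-NN version.
SKIP_PATTERN = re.compile(
    r"skipping\s+(\d+)\s+precursors,\s*unrecognised modifications", re.IGNORECASE
)


def skipped_precursors(lines: Iterable[str]) -> int:
    """Sum N over every 'skipping N precursors, unrecognised modifications' line."""
    total = 0
    for line in lines:
        match = SKIP_PATTERN.search(line)
        if match:
            total += int(match.group(1))
    return total


def check_log(path: str) -> int:
    """Read one log file, print a hint, and return the skipped-precursor count."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        skipped = skipped_precursors(handle)
    if skipped:
        print(f"{skipped} precursors skipped; consider setting strip_unknown_mods: true")
    else:
        print("no precursors skipped for unrecognised modifications")
    return skipped
```

Point `check_log` at the log written by the in-silico library step; after rerunning with `strip_unknown_mods: true`, the count should drop to zero for modifications the flag covers.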

nextflow.config

Lines changed: 1 addition & 0 deletions
@@ -68,6 +68,7 @@ params {
     debug_level = 3
     speclib = null
     extra_args = null
+    strip_unknown_mods = false // add '--strip-unknown-mods': force DIA-NN to predict spectra/RTs/IMs for declared modifications its deep-learning predictor does not recognise (e.g. Oxidation on proline / hydroxyproline). Without it those precursors are silently skipped from the in-silico library and never identified. No-op when all declared modifications are recognised.
     scoring_mode = 'generic' // Scoring mode: 'generic' (default), 'proteoforms' (variant detection, >= 2.0), 'peptidoforms' (PTM analysis)
     aa_eq = false // add '--aa-eq': treat I&L, Q&E, N&D as equivalent during reannotation (essential for entrapment FDR benchmarks)
     dda = false // Fallback: explicitly enable DDA when SDRF lacks acquisition method (requires DIA-NN >= 2.3.2)

nextflow_schema.json

Lines changed: 7 additions & 0 deletions
@@ -536,6 +536,13 @@
         "fa_icon": "fas fa-filter",
         "hidden": true
     },
+    "strip_unknown_mods": {
+        "type": "boolean",
+        "default": false,
+        "description": "Add --strip-unknown-mods so DIA-NN predicts spectra/RTs/IMs for declared modifications its predictor does not recognise (e.g. hydroxyproline). Without it those precursors are skipped from the in-silico library.",
+        "fa_icon": "fas fa-prescription-bottle",
+        "help_text": "Some declared variable modifications are not built into the DIA-NN deep-learning predictor (e.g. Oxidation on proline, common in collagen/ECM). During in-silico library generation DIA-NN silently skips those precursors (logged as 'skipping N precursors, unrecognised modifications') unless this is enabled, so they never enter the library and are never identified. No-op when all declared modifications are recognised by the predictor."
+    },
     "extra_args": {
         "type": "string",
         "description": "Extra arguments appended to all DIA-NN steps. Flags incompatible with specific steps are automatically stripped with a warning.",
