Skip to content

sjdb_score bonus is added to motif score instead of replacing it on annotated junctions #50

@pinin4fjords

Description

@pinin4fjords

Summary

When stitching an annotated splice junction, rustar adds the motif score AND the sjdb_score bonus to the alignment score. STAR uses one OR the other (replacement, not addition) — sjdb_score for annotated junctions, motif_score for unannotated ones. Net effect: rustar over-credits annotated junctions by motif_score (e.g. +0 for GT/AG → no visible difference, -4 for GC/AG → wrong by 4, -8 for AT/AC or non-canonical → wrong by 8).

Currently invisible because of #27 (annotated lookup always returns false so the additive branch never fires). The moment #27 lands (currently in PR #45), this becomes the dominant remaining AS-divergence source on annotated splices.

Surfaced during the AS-divergence investigation on #33.

Current rustar code

src/align/stitch.rs:1316-1319:

d_score += motif_score;
if is_annotated {
    d_score += scorer.sjdb_score;
}

STAR reference behaviour

source/Transcript_alignScore.cpp (and the matching logic in Stitch.cpp) treats annotated and unannotated junctions as mutually exclusive scoring paths:

if (sjAnnot == 1) {
    score += sjdbScore;   // annotated: replace motif score with sjdb bonus
} else {
    score += motifScore;  // unannotated: motif score only
}

So an annotated GT/AG junction in STAR contributes +sjdbScore (default +2), not motif_score (+0) + sjdbScore (+2) = +2. For motifs where motif_score != 0 this matters:

Motif motif_score (STAR default) rustar additive STAR replacement Per-junction error
GT/AG (canonical) 0 +0 + +2 = +2 +2 0 (invisible)
GC/AG -4 -4 + +2 = -2 +2 4
AT/AC -8 -8 + +2 = -6 +2 8
Non-canonical -8 -8 + +2 = -6 +2 8
(unannotated GT/AG) 0 +0 +0 0

Most annotated junctions in well-curated annotations are canonical GT/AG, so the visible impact is small on typical data. But for any genome with appreciable GC/AG, AT/AC, or rescue-by-novel junctions in the sjdb, rustar's AS is systematically off.

Suggested fix

Switch to replacement semantics matching STAR:

d_score += if is_annotated {
    scorer.sjdb_score
} else {
    motif_score
};

Or equivalently:

if is_annotated {
    d_score += scorer.sjdb_score;
} else {
    d_score += motif_score;
}

4-line patch. Gated on #27 — until that lands, is_annotated is always false and the change has no observable effect (the else branch is identical to today's unconditional d_score += motif_score).

Test plan

  1. Build with Log.final.out always reports Number of splices: Annotated (sjdb) = 0 despite --sjdbGTFfile #27 + this PR applied.
  2. Run on a paired-end input with a known mix of annotated motif types (or use the yeast test profile — most splices are canonical GT/AG, so the per-record delta on this profile is small).
  3. Compare AS values against STAR's on identical-CIGAR records. On annotated GT/AG records the delta should be 0; on annotated GC/AG records the delta should drop from -4 (pre-fix) to 0.

Severity

Medium-low. Score-only divergence on annotated junctions, observable only after #27 lands. Doesn't affect alignment selection (since the relative ordering of candidates is preserved — both annotated branches gain +motif_score - sjdb_score relative to STAR, so the ranking among annotated alternatives is consistent). Affects any downstream tool that consumes the absolute AS value (multi-mapper EM weights, score-threshold filters keyed on STAR's AS distribution).


Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). All sibling issues from this exercise: author:pinin4fjords or grep for nf-core/rnaseq#1855.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions