fix: decode Gsj-region SA hits into split splice seeds

Pass-1 alignment was silently dropping SA hits that fall in the appended
splice-junction flanking buffer (Gsj). Those hits encode candidate
splice events for annotated junctions: when a read seed straddles the
donor/acceptor boundary of a Gsj slot, the matching genome bytes live
in two non-adjacent real-genome positions. STAR decodes these via
g1 >= P.nGenomeReal and feeds the (donor, acceptor) pair into the seed
pipeline; rustar-aligner was discarding them at position_to_chr (which
returns None for positions past the last real chromosome).

Plumb n_genome_real onto Genome (pinned at the pre-sjdb forward total)
and reload prepared junctions + sjdbOverhang from sjdbInfo.txt on the
index-load path. cluster_seeds now expands each SA hit through
decode_gsj_hit: real-genome hits pass through unchanged, single-flank
Gsj hits are skipped (the equivalent real-genome SA entry already
exists in the same range), and boundary-crossing Gsj hits split into
two virtual seeds with the donor-side read run and the acceptor-side
read run pointing at their respective real-genome positions. The
existing stitch DP then chains them via its splice branch.
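
A minimal sketch of the three-way decode described above, not the code in
this commit. It assumes a STAR-like slot layout (sjdbOverhang donor bases,
sjdbOverhang acceptor bases, one spacer byte) and that the Gsj buffer starts
exactly at n_genome_real. The Junction, Seed and Decoded types, their fields,
and the signature of decode_gsj_hit are hypothetical and only illustrate the
idea.

```rust
/// Hypothetical prepared junction: first and last intronic base, 0-based.
#[derive(Debug, Clone, Copy)]
struct Junction {
    intron_start: u64,
    intron_end: u64,
}

/// Hypothetical seed: a run of `len` read bases matching the genome.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Seed {
    read_start: u64,
    genome_start: u64,
    len: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum Decoded {
    /// Hit lies in the real genome; passed through unchanged.
    Real(Seed),
    /// Hit lies entirely in one flank; the caller skips it.
    SingleFlank,
    /// Hit crosses the donor/acceptor boundary; two virtual seeds.
    Split { donor: Seed, acceptor: Seed },
}

/// Decode one SA hit. Assumed layout: slot i starts at
/// n_genome_real + i * (2 * overhang + 1).
fn decode_gsj_hit(
    hit: Seed,
    n_genome_real: u64,
    overhang: u64,
    junctions: &[Junction],
) -> Option<Decoded> {
    if hit.genome_start < n_genome_real {
        return Some(Decoded::Real(hit));
    }
    let slot_len = 2 * overhang + 1;
    let rel = hit.genome_start - n_genome_real;
    let junction = junctions.get((rel / slot_len) as usize)?;
    let off = rel % slot_len; // offset of the hit inside its slot
    let end = off + hit.len; // exclusive end inside the slot
    if end > 2 * overhang {
        // Sketch-only choice: reject hits that touch the spacer byte.
        return None;
    }
    if off >= overhang || end <= overhang {
        return Some(Decoded::SingleFlank);
    }
    // Donor run ends at the base just before the intron; the acceptor
    // run starts at the base just after it.
    let donor_len = overhang - off;
    let donor = Seed {
        read_start: hit.read_start,
        genome_start: junction.intron_start.checked_sub(donor_len)?,
        len: donor_len,
    };
    let acceptor = Seed {
        read_start: hit.read_start + donor_len,
        genome_start: junction.intron_end + 1,
        len: hit.len - donor_len,
    };
    Some(Decoded::Split { donor, acceptor })
}
```

In this sketch the two seeds of a Split are contiguous on the read but
separated by the intron on the genome, which is the shape a splice-aware
chaining step would be expected to join.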

Verified on the yeast SRR6357072 reproducer from the issue: the
"Number of splices: Total" count rises from 366 to 631 (target ~720)
and GT/AG from 266 to 531, with non-canonical unchanged at 93. The
remaining gap is gated on PR #45's annotated-junction lookup landing.
Fixes #47
Co-Authored-By: Claude <noreply@anthropic.com>