-**396 tests passing, 0 clippy warnings.** SE: 8613/8926 compare_sam.py (96.5%; note: lower due to seeded-RNG tie-break PR diverging from STAR's mt19937), **99.815% faithfulness (tie-adjusted)** (8611/8627 non-tie reads exact), 299 tie-breaking diffs excluded. 1 CIGAR-only disagree (ERR12389696.13573895, insertion placement, seed-level tie). **0 STAR-only / 0 rustar-aligner-only SE reads**. PE: **8390 both-mapped** (STAR: 8390), **0 half-mapped**, 0 MAPQ inflations / 0 deflations, **99.883% PE exact faithfulness (tie-adjusted)** (16284/16306, 475 tie-breaking diffs excluded), **0 proper-pair diffs**, **0 NH diffs**. Phase 17.A: `scoreSeedBest` pre-extension. Phase 17.B: per-mate seeding. Phase 17.C: STAR-faithful SCORE-GATE + mappedFilter. Phase 17.D: combined-span penalty fix + dedup ordering. Phase 17.8: `--quantMode GeneCounts`. Phase E fix (2026-04-21): mate_id-aware diagonal dedup. Phase E2 (2026-04-22): STAR-faithful combined-read seeding. Phase E3 (2026-04-22): combined-threshold half-mapped fallback. Phase E4 (2026-04-22): PE-CHECK2 unconditional. Phase E5 (2026-04-23): split_combined_wt n_mismatch propagation. Phase E6 (2026-04-24): tie-adjusted faithfulness metric in assess_faithfulness.py. Phase F1: --runRNGseed + seeded primary tie-break (PR #5). Phase F2: --outSAMattrRGline (PR #6). Phase F3: --quantMode TranscriptomeSAM (PR #7). Phase F4: SJDB insertion into Genome+SA at genomeGenerate (PR #8). Phase G1 (2026-04-29): junction_shifts fix in split_combined_wt (rDNA cross-copy false-splice filter). Phase G2 (2026-04-29): MAX_RECURSION 10k→100k + sa_pos_to_forward overflow fix (ERR12389696.7118031 NH=3→9). Phase 17.2 (2026-04-29): coordinate-sorted BAM output (`--outSAMtype BAM SortedByCoordinate` → `Aligned.sortedByCoord.out.bam`). Phase 17.4 (2026-04-29): `--outReadsUnmapped Fastx` → `Unmapped.out.mate1` / `Unmapped.out.mate2`; writes unmapped + TooManyLoci reads; PE writes both mates for fully-unmapped and half-mapped pairs. 
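
The tie-adjusted SE figure above can be reproduced from the counts in the same sentence: 8926 compared reads minus 299 tie-breaking diffs leaves 8627 non-tie reads, of which 8611 are exact. The sketch below only illustrates that arithmetic. The function name and signature are hypothetical, and it is not claimed to mirror how `assess_faithfulness.py` is actually written.

```rust
/// Hypothetical helper (not taken from the repository): percentage of exact
/// matches among reads that are not tie-breaking diffs.
/// Returns None when the tie count exceeds the total or no non-tie reads remain.
fn tie_adjusted_faithfulness(exact_non_tie: u64, total: u64, ties: u64) -> Option<f64> {
    // Non-tie reads are the denominator: total compared minus excluded ties.
    let non_tie = total.checked_sub(ties)?;
    if non_tie == 0 {
        return None;
    }
    Some(100.0 * exact_non_tie as f64 / non_tie as f64)
}

/// Format the metric with three decimals, as in the status line.
fn format_faithfulness(pct: f64) -> String {
    format!("{:.3}%", pct)
}
```

Calling `tie_adjusted_faithfulness(8611, 8926, 299)` and passing the result to `format_faithfulness` gives the SE percentage quoted above.
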
Phase 17.6 (2026-05-01): `--outStd SAM/BAM_Unsorted/BAM_SortedByCoordinate` — routes primary alignment output to stdout via `Box<dyn AlignmentWriter>` trait dispatch; `SamStdoutWriter`, `BamStdoutWriter`, `SortedBamStdoutWriter` in sam.rs/bam.rs; verified with samtools pipe (967 records). Phase G3 (2026-05-01): SA tie-breaking fix — `compare_suffixes` tie-breaker changed from `pos_b.cmp(&pos_a)` to `packed_a.cmp(&packed_b)` (ascending by packed SA value with strand bit); rustar-aligner SA is now **byte-for-byte identical** to STAR's SA for the yeast genome (10,862 → 0 entry diffs). diff AS: 6→4 cases (4 remaining are rustar-aligner improvements: .844151 VIII 0mm vs STAR VII 6mm, .4972950 spliced vs unspliced mate2). Phase 17.3 (2026-05-01): PE chimeric detection — `detect_inter_mate_chimeric` in `chimeric/detect.rs`; intra-mate multi-cluster chimeric via cluster splitting + mate2 read_pos adjustment; inter-mate chimeric for discordant pairs (diff chr, same strand, or >1Mb); `align_paired_read` returns 4-tuple including `Vec<ChimericAlignment>`; no benchmark regression (8390 both-mapped, 0 half-mapped). Phase 17.11 (2026-05-01): `--chimOutType WithinBAM` — chimeric alignments written as supplementary records (FLAG 0x800) in primary BAM; donor record has full SEQ + SA tag; acceptor has FLAG 0x800 + SA tag + empty SEQ; `build_within_bam_records` in `chimeric/output.rs`; `chim_out_junctions()` / `chim_out_within_bam()` helpers in params.rs; supports mixed `--chimOutType Junctions WithinBAM`. Phase 17.7 (2026-05-01): GTF tag parameters — `--sjdbGTFchrPrefix`, `--sjdbGTFfeatureExon`, `--sjdbGTFtagExonParentTranscript`, `--sjdbGTFtagExonParentGene`; `_configured` variants in `junction/gtf.rs`, `quant/mod.rs`, `quant/transcriptome.rs`, `junction/mod.rs`; all 4 production paths thread params; backward-compat wrappers preserve zero test disruption. 
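
The Phase G3 tie-breaker change can be illustrated with a minimal sketch. It assumes the strand flag lives in the top bit of a 64-bit packed SA value. That bit layout, the `pack` helper, and `sort_tied` are illustrative assumptions, not the actual `compare_suffixes` code. Only the two comparison expressions come from the note above.

```rust
use std::cmp::Ordering;

// ASSUMPTION: the strand flag is the top bit of a 64-bit packed SA value.
// The real packing used by rustar-aligner may differ.
const STRAND_BIT: u64 = 1 << 63;

/// Hypothetical helper: pack a position and strand into one SA value.
fn pack(pos: u64, reverse_strand: bool) -> u64 {
    if reverse_strand { pos | STRAND_BIT } else { pos }
}

/// Old tie-breaker: descending by position.
fn tie_break_old(pos_a: u64, pos_b: u64) -> Ordering {
    pos_b.cmp(&pos_a)
}

/// New tie-breaker: ascending by packed SA value, strand bit included.
fn tie_break_new(packed_a: u64, packed_b: u64) -> Ordering {
    packed_a.cmp(&packed_b)
}

/// Order a group of entries whose suffixes compared equal,
/// using only the new tie-breaker.
fn sort_tied(mut entries: Vec<u64>) -> Vec<u64> {
    entries.sort_by(|a, b| tie_break_new(*a, *b));
    entries
}
```

Under this assumed layout, tied forward-strand entries sort by ascending position and every reverse-strand entry sorts after them, because the strand bit dominates the comparison.
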
Phase 17.9 (2026-05-01): `--outBAMcompression` (BGZF level -1–9, default 1; -1/0=NONE, 1-8=flate2 levels, ≥9=BEST) + `--limitBAMsortRAM` (bytes, 0=unlimited; aborts sort if ~400 bytes/record estimate exceeds limit); `bgzf_compression()` + `make_bgzf_writer()` helpers in `io/bam.rs`; threaded through all 4 BAM writers (unsorted file, sorted file, unsorted stdout, sorted stdout). PE chimericDetectionOld (2026-05-01): per-mate `detect_chimeric_old` called on `all_m1_transcripts` / `all_m2_transcripts` pools after `filter_paired_transcripts` in `read_align.rs`. Phase 17.12 (2026-05-01): BySJout disk buffering — `BySJReadMeta` struct + `NamedTempFile` SAM temp file replaces `Vec<AlignmentBatchResults>`; `create_bysj_writer` / `bysj_write_records` / `bysj_read_n_records` helpers in `io/sam.rs`; `tempfile` moved to `[dependencies]`. Phase 17.13 (2026-05-01): 8 integration tests in `tests/alignment_features.rs` — synthetic 20kb genome with planted GT-AG intron; tests cover BAM output, PE alignment, spliced reads, BySJout, GeneCounts, unmapped output, two-pass mode. Phase 12.2 (2026-05-04): SE chimeric Tier 1b soft-clip re-mapping — `detect_from_soft_clips` in `chimeric/detect.rs` re-seeds the primary alignment's soft-clipped bases when `detect_chimeric_old` finds no partner; `adjust_read_positions` helper shifts sub-seq coords into full-read space for right clips; called as Step 3c in `read_align.rs`. Phase 17.10 (2026-05-04): Chimeric Tier 3 — `detect_from_chimeric_residuals` in `chimeric/detect.rs` re-seeds outer uncovered read regions (before donor / after acceptor) of each found chimeric pair; enables 3-way gene-fusion detection; called as Step 3d in `read_align.rs`. See [ROADMAP.md](ROADMAP.md) for detailed phase tracking and [docs/](docs/) for per-phase notes.
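
The Phase 17.9 rules for `--outBAMcompression` and `--limitBAMsortRAM` can be sketched as two small functions. This is a standalone illustration under stated assumptions: `BgzfLevel`, `bgzf_level`, and `sort_fits_in_ram` are hypothetical names, the local enum stands in for whatever compression type `bgzf_compression()` returns, and treating every value at or below 0 as "no compression" is an assumption beyond the documented -1 to 9 range.

```rust
/// Hypothetical stand-in for the compression setting handed to the BGZF writer.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BgzfLevel {
    None,
    Level(u32),
    Best,
}

/// Map an `--outBAMcompression` value to a level:
/// -1 or 0 means none, 1 to 8 map to that level, 9 or more means best.
fn bgzf_level(out_bam_compression: i32) -> BgzfLevel {
    match out_bam_compression {
        // ASSUMPTION: anything at or below 0 is treated as "no compression".
        i32::MIN..=0 => BgzfLevel::None,
        1..=8 => BgzfLevel::Level(out_bam_compression as u32),
        _ => BgzfLevel::Best,
    }
}

/// Rough per-record memory estimate quoted in the note above.
const EST_BYTES_PER_RECORD: u64 = 400;

/// Decide whether the in-memory sort may proceed.
/// A limit of 0 means unlimited; otherwise the estimate must fit in the limit.
fn sort_fits_in_ram(n_records: u64, limit_bam_sort_ram: u64) -> bool {
    if limit_bam_sort_ram == 0 {
        return true;
    }
    // saturating_mul avoids overflow for very large record counts.
    n_records.saturating_mul(EST_BYTES_PER_RECORD) <= limit_bam_sort_ram
}
```

With the default `--outBAMcompression 1`, `bgzf_level(1)` yields `BgzfLevel::Level(1)`. A sort would be refused when `sort_fits_in_ram` returns `false`.
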