feat: add --outSAMattrRGline with @RG header and RG:Z tags#6
Merged
Conversation
Parses STAR's comma-separated multi-RG syntax (first field must be ID: on each block, `,` separates blocks, tokens are tab-joined into the SAM @rg body). Replicates a single RG line across N input files like STAR does, and errors on count mismatch or RG-in-outSAMattributes without a line. Auto-appends RG to sam_attribute_set() when a line is set (matches Parameters_samAttributes.cpp:201). SAM/BAM headers now emit one @rg per unique ID, and every record (mapped, unmapped, half-mapped, both- mates) carries RG:Z:<id> when attrs contain RG. Unblocks nf-core/rnaseq, which invokes STAR with `--outSAMattrRGline 'ID:X' 'SM:X'` and previously errored on the unknown flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove the `let _ = params.rg_ids()?;` early-validation line in
`align_reads_paired_end`. `validate()` already calls `rg_ids()` at
startup, so the second call was a no-op.
- Simplify `maybe_insert_rg_tag` to `(record, rg_id)` by dropping the
`attrs: &HashSet<String>` parameter and its `attrs.contains("RG")`
gate. `sam_attribute_set()` auto-inserts "RG" whenever an RG line is
configured, so `rg_id.is_some()` already implies the tag is wanted.
- Replace the inline `if let Some(id) = rg_id { ... }` block in
`build_unmapped_record` with a call to the simplified helper, so every
writer-side RG insertion goes through the same path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `Parameters::primary_rg_id() -> Result<Option<String>, Error>` that returns the first RG line's `ID:` value directly, and use it to replace the `let rg_ids = params.rg_ids()?; let rg_id = rg_ids.first()...` two-step pattern at every SAM/BAM writer callsite (5 in sam.rs, 1 in lib.rs). `rg_ids()` is retained because `validate()` still uses it for the 1-vs-N count check against `--readFilesIn`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the manual `while i < len` index-walk with a `out_sam_attr_rg_line.split(|tok| tok == ",")` that maps each block to its tab-joined body and propagates any `ID:`-prefix error via `collect::<Result<_,_>>()`. Same public signature and return value as before, but with half the local state. Behavior note: an empty block (adjacent `,`s or a trailing `,`) now returns a `Parameter` error instead of being silently skipped. This matches STAR's `Parameters_readFilesInit.cpp` which requires every RG block to begin with `ID:`. Existing tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`SamWriter::write_unmapped` had no production callers — the SE/PE write paths build unmapped records via `build_unmapped_record` and batch them through `write_batch`. The only caller was the method's own self-test. Delete both. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes formatting drift inherited from main. No semantic changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
--outSAMattrRGline with @RG header and RG:Z tags
7 tasks
Psy-Fer
added a commit
that referenced
this pull request
Apr 24, 2026
- shuffle_tied_prefix: deterministic per-read RNG for tied primary selection - --runRNGseed param (default 777, matches STAR) - --outSAMattrRGline: @rg header + RG:Z tags - Various code simplifications and formatting
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds the
--outSAMattrRGlineCLI flag, emits@RGlines in the SAM header, and tags every record withRG:Z:<ID>. Matches STAR's semantics (Parameters_readFilesInit.cpp:65-93,samHeaders.cpp,ReadAlign_alignBAM.cpp:333).Part 2 of 3 branches unblocking nf-core/rnaseq's STAR_ALIGN step. Standalone PR; does not depend on the others.
Changes
src/params.rs:out_sam_attr_rg_line: Vec<String>arg +parsed_rg_lines()/rg_ids()/rg_line_set()helpers. STAR's comma-separated multi-RG syntax supported. Auto-appendsRGtosam_attribute_set()when a line is set;validate()errors onRGin attrs without a line (matches STAR).src/io/sam.rs:build_sam_header()now emits@RGlines (deduplicated by ID). Newmaybe_insert_rg_tag()helper insertsRG:Z:<id>on every record (main, mate2, half-mapped, unmapped).src/io/bam.rs: call-site updates for the new signatures.src/lib.rs: computes RG ID once per run; passes through SE and PE paths.Design note
The RG tag is inserted post-build via
maybe_insert_rg_tag()rather than threaded through every builder signature. Same output, less churn.Follow-up commits
attrsgate onmaybe_insert_rg_tag; addParameters::primary_rg_idhelper to replace a recurring allocation pattern; rewriteparsed_rg_lines()usingVec::split; remove deadSamWriter::write_unmappedmethod.cargo fmtcommit to fix pre-existing drift inherited frommain.Test plan
cargo test— 286 unit + 4 integration (−1 from removing the self-test of the deleted dead code)cargo clippy --lib -- -D warnings— cleancargo fmt --check— clean@RG+RG:Z:viasamtools view -Handsamtools view | grep RG:Z:)Notes
parsed_rg_lines()now errors on "empty block after trailing comma" instead of silently skipping. Stricter, matches STAR.🤖 Generated with Claude Code