Riker is a fast Rust CLI toolkit that ports key QC metrics tools from Picard and fgbio. When porting from these tools, the goal is to match correctness while improving the interface and output format.
Prerequisites: Rust stable (see Cargo.toml for minimum version) and
cargo-nextest.
cargo build # debug build
cargo build --release # optimized build
Run all four before submitting changes:
cargo ci-fmt # check formatting
cargo ci-lint # clippy, pedantic, warnings-as-errors
cargo clippy --all-targets
cargo ci-test # all tests via nextest
The ci-* aliases are defined in .cargo/config.toml.
| Crate | Purpose |
|---|---|
| `riker` | Binary and library (published as `riker_lib`) |
| `riker_derive` | Proc-macro crate: `#[derive(MetricDocs)]` and `#[multi_options]` |
Each subcommand is a struct that implements the Command trait
(src/commands/command.rs) and is dispatched via the Subcommand enum in
src/main.rs.
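The dispatch pattern described above can be sketched roughly as follows. This is a simplified, standalone illustration rather than riker's actual code: the `Command` trait here returns `Result<String, String>` so the example can be checked easily, and `IsizeCommand` and `HybcapCommand` are hypothetical stand-ins for the real clap-derived structs.

```rust
/// Simplified stand-in for the trait in src/commands/command.rs (assumed shape).
pub trait Command {
    fn execute(&self) -> Result<String, String>;
}

/// Hypothetical subcommand struct; the real one carries clap-derived options.
pub struct IsizeCommand {
    pub input: String,
}

/// Hypothetical subcommand struct; the real one carries clap-derived options.
pub struct HybcapCommand {
    pub input: String,
}

impl Command for IsizeCommand {
    fn execute(&self) -> Result<String, String> {
        Ok(format!("isize on {}", self.input))
    }
}

impl Command for HybcapCommand {
    fn execute(&self) -> Result<String, String> {
        Ok(format!("hybcap on {}", self.input))
    }
}

/// One variant per subcommand, each wrapping its command struct.
pub enum Subcommand {
    Isize(IsizeCommand),
    Hybcap(HybcapCommand),
}

impl Command for Subcommand {
    /// Forward to whichever command the user selected.
    fn execute(&self) -> Result<String, String> {
        match self {
            Subcommand::Isize(c) => c.execute(),
            Subcommand::Hybcap(c) => c.execute(),
        }
    }
}
```

Because the `match` is exhaustive, adding a new variant without a matching arm is a compile error, which makes a forgotten registration step hard to miss.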
Each command's core logic lives in a Collector struct implementing the
Collector trait (src/collector.rs). Collectors store all configuration
(output paths, reference, thresholds) as fields set at construction. The trait
methods only receive the BAM header and records — this design enables the multi
command to feed a single BAM pass to multiple collectors in parallel.
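To illustrate why keeping configuration in fields matters, here is a minimal sketch of a single-pass driver. `Header`, `Record`, `MappedCounter`, `BaseCounter`, and `run_single_pass` are hypothetical stand-ins (the real trait works with the BAM header and `RecordBuf` and uses riker's own `Result` type), and this sketch loops over the collectors sequentially, whereas the real multi command may run them in parallel.

```rust
/// Hypothetical stand-in for the BAM header.
pub struct Header {
    pub reference_names: Vec<String>,
}

/// Hypothetical stand-in for an alignment record.
pub struct Record {
    pub reference_index: Option<usize>,
    pub length: usize,
}

/// Assumed shape of the trait: methods receive only the header and records.
pub trait Collector {
    fn initialize(&mut self, header: &Header) -> Result<(), String>;
    fn accept(&mut self, record: &Record, header: &Header) -> Result<(), String>;
    fn finish(&mut self) -> Result<(), String>;
    fn name(&self) -> &'static str;
}

/// Counts mapped reads of at least `min_length`; the threshold is fixed at construction.
pub struct MappedCounter {
    min_length: usize,
    pub mapped: u64,
}

impl MappedCounter {
    pub fn new(min_length: usize) -> Self {
        Self { min_length, mapped: 0 }
    }
}

impl Collector for MappedCounter {
    fn initialize(&mut self, header: &Header) -> Result<(), String> {
        if header.reference_names.is_empty() {
            return Err("header has no reference sequences".to_string());
        }
        Ok(())
    }
    fn accept(&mut self, record: &Record, _header: &Header) -> Result<(), String> {
        if record.reference_index.is_some() && record.length >= self.min_length {
            self.mapped += 1;
        }
        Ok(())
    }
    fn finish(&mut self) -> Result<(), String> {
        Ok(())
    }
    fn name(&self) -> &'static str {
        "mapped"
    }
}

/// Sums the length of every record, mapped or not.
#[derive(Default)]
pub struct BaseCounter {
    pub bases: u64,
}

impl Collector for BaseCounter {
    fn initialize(&mut self, _header: &Header) -> Result<(), String> {
        Ok(())
    }
    fn accept(&mut self, record: &Record, _header: &Header) -> Result<(), String> {
        self.bases += record.length as u64;
        Ok(())
    }
    fn finish(&mut self) -> Result<(), String> {
        Ok(())
    }
    fn name(&self) -> &'static str {
        "bases"
    }
}

/// Feed one pass over the records to every collector (sequentially in this sketch).
pub fn run_single_pass(
    header: &Header,
    records: &[Record],
    collectors: &mut [&mut dyn Collector],
) -> Result<(), String> {
    for collector in collectors.iter_mut() {
        collector.initialize(header)?;
    }
    for record in records {
        for collector in collectors.iter_mut() {
            collector.accept(record, header)?;
        }
    }
    for collector in collectors.iter_mut() {
        collector.finish()?;
    }
    Ok(())
}
```

Since no per-collector configuration passes through the trait methods, the driver can treat every collector identically and never needs to know which options belong to which tool.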
Per-command metric structs live alongside their collector (e.g.
InsertSizeMetric in src/commands/isize.rs). Metric structs derive
Serialize, Deserialize, and MetricDocs. Shared serialization utilities
live in src/metrics.rs.
Reusable option groups in src/commands/common.rs (InputBamOptions,
ReferenceOptions, OptionalReferenceOptions, IntervalOptions,
DuplicateOptions) are composed into commands via #[command(flatten)].
This section walks through every step needed to add a new metric collector to
riker, using isize and hybcap as reference examples.
Create src/commands/<name>.rs. The file should contain the following six items, in order:
File suffixes — constants for each output file:
pub const METRICS_SUFFIX: &str = ".<name>-metrics.txt";
Command struct: the CLI entry point. Flatten shared option groups and your options struct:
#[derive(Args, Debug, Clone)]
#[command(long_about, after_long_help = "Examples:\n riker <name> -i in.bam -o out")]
pub struct MyCommand {
    #[command(flatten)]
    pub input: InputBamOptions,
    #[arg(short = 'o', long, value_name = "PREFIX")]
    pub output: PathBuf,
    #[command(flatten)]
    pub reference: ReferenceOptions, // if needed
    #[command(flatten)]
    pub options: MyOptions,
}
Options struct: tool-specific tuning parameters. This struct serves double
duty: it configures the standalone command and (via the #[multi_options]
macro) generates the prefixed options for the multi command. See
The #[multi_options] macro below for details.
#[riker_derive::multi_options("<name>", "<Help Heading>")]
#[derive(Args, Debug, Clone)]
#[command(next_help_heading = "Tuning")]
pub struct MyOptions {
    /// Include duplicate reads in metric calculations.
    #[arg(long, default_value_t = false)]
    pub include_duplicates: bool,
    // ...
}
Metric struct(s): one struct per output file, deriving Serialize,
Deserialize, and MetricDocs:
/// Summary metrics for the <name> collector.
#[derive(Debug, Serialize, Deserialize, MetricDocs)]
pub struct MyMetric {
    /// Description of this field (becomes the `riker docs` entry).
    pub field_name: u64,
    /// A fractional field with 5 decimal places.
    #[serde(serialize_with = "serialize_f64_5dp")]
    pub frac_something: f64,
}
Collector struct, implementing the Collector trait:
pub struct MyCollector { /* fields */ }
impl MyCollector {
    #[must_use]
    pub fn new(prefix: &Path, options: &MyOptions) -> Self { /* ... */ }
}
impl Collector for MyCollector {
    fn initialize(&mut self, header: &Header) -> Result<()> { /* ... */ }
    fn accept(&mut self, record: &RecordBuf, header: &Header) -> Result<()> { /* ... */ }
    fn finish(&mut self) -> Result<()> { /* ... */ }
    fn name(&self) -> &'static str { "<name>" }
}
Command impl: the standalone execution path. Use for_each_record to
iterate records with buffer reuse (avoids per-record allocation for BAM/SAM):
impl Command for MyCommand {
    fn execute(&self) -> Result<()> {
        let (mut reader, header) = AlignmentReader::new(&self.input.input, ...)?;
        let mut collector = MyCollector::new(&self.output, &self.options);
        collector.initialize(&header)?;
        let mut progress = ProgressLogger::new("<name>", "reads", 5_000_000);
        reader.for_each_record(&header, |record| {
            progress.record_with(record, &header);
            collector.accept(record, &header)
        })?;
        progress.finish();
        collector.finish()
    }
}
- Add `pub mod <name>;` to `src/commands/mod.rs`
- Add a variant to the `Subcommand` enum in `src/main.rs`: `MyCmd(MyCommand),`
- Add a match arm in `impl Command for Subcommand::execute()`: `Subcommand::MyCmd(c) => c.execute(),`
The multi command (src/commands/multi.rs) runs multiple collectors in a
single BAM pass. Every new collector should be wired in:
- Add a `CollectorKind` variant and update its `Display` impl
- Import `MyCollector` and `MultiMyOptions` (generated by the macro)
- Add a `#[command(flatten)]` field to the `Multi` struct: `#[command(flatten)] pub my_opts: MultiMyOptions,`
- Add a match arm in `build_collectors()`: `CollectorKind::MyColl => { let opts = self.my_opts.clone().validate()?; collectors.push(Box::new(MyCollector::new(&self.output, &opts))); }`
- Decide the default: add the new kind to the `default_values_t` list on `Multi::tools` if the tool should run by default, or leave it opt-in (like hybcap) if it requires extra inputs.
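As a rough, standalone illustration of the first and last steps, the sketch below shows a kind enum with a `Display` impl and an opt-in default list. The variant set, the `"mycoll"` label, and the `default_tools` and `selection_label` helpers are all hypothetical; the actual `CollectorKind` and `Multi::tools` definitions in src/commands/multi.rs may look different.

```rust
use std::fmt;

/// Hypothetical subset of the kinds selectable in the multi command.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CollectorKind {
    Isize,
    Hybcap,
    MyColl,
}

impl fmt::Display for CollectorKind {
    /// The displayed name is what a user would type to select the tool.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            CollectorKind::Isize => "isize",
            CollectorKind::Hybcap => "hybcap",
            CollectorKind::MyColl => "mycoll",
        };
        f.write_str(name)
    }
}

/// Illustrative default selection: tools that need extra inputs stay opt-in.
pub fn default_tools() -> Vec<CollectorKind> {
    vec![CollectorKind::Isize, CollectorKind::MyColl]
}

/// Render a selection as a comma-separated list of tool names.
pub fn selection_label(tools: &[CollectorKind]) -> String {
    tools
        .iter()
        .map(|kind| kind.to_string())
        .collect::<Vec<_>>()
        .join(",")
}
```

Leaving `Hybcap` out of the default list mirrors the opt-in rule for tools that require extra inputs.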
Add your metric structs to src/commands/docs.rs — add
render_metric_docs_text::<MyMetric> and render_metric_docs_markdown::<MyMetric>
calls in both the "text" and "markdown" arms.
- Integration tests in `tests/test_<name>.rs`: build BAM data programmatically using `SamBuilder`, run the command, deserialize output with `read_metrics_tsv`, and assert on results.
- Multi integration tests in `tests/test_multi.rs`: add at least one test that runs your collector through the `multi` command.
- Unit tests in an inline `#[cfg(test)]` module for any non-trivial internal logic.
Run the full verification checklist.
The #[multi_options("prefix", "Heading")] attribute macro
(in riker_derive) generates a Multi<StructName> companion struct for use
in the multi command. It prefixes every CLI flag with prefix:: (e.g.
--isize::include-duplicates) and groups them under the given help heading.
The macro classifies each field into one of three categories:
| Category | Original type | Has `default_value_t`? | Multi struct type | In `validate()` |
|---|---|---|---|---|
| Optional | `Option<T>` | n/a | `Option<T>` | Passed through |
| Defaulted | `T` | Yes | `T` (with default) | Passed through |
| Required | `T` | No | `Option<T>` | `.ok_or_else(...)` with error message |
What it generates:
- `Multi<Name>` struct with `clap::Args`, all fields prefixed.
- `validate(self) -> anyhow::Result<Name>`: converts `Multi<Name>` back to the original struct, checking that required fields are `Some`. Required fields get an error message like `"--hybcap::baits is required when hybcap is selected"`.
- `From<Name> for Multi<Name>`: wraps required fields in `Some()`, passes others through. Useful for constructing multi options from standalone options in tests.
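To make the three field categories concrete, here is a hand-written approximation of what the macro might generate for a small options struct. It is a sketch under assumptions, not the macro's real output: the `mytool` prefix and the `baits`, `min_mapq`, and `sample` fields are hypothetical, the clap attributes are omitted, and `validate()` returns `Result<_, String>` instead of `anyhow::Result` so the example needs only the standard library.

```rust
use std::path::PathBuf;

/// Hypothetical standalone options struct.
#[derive(Debug, Clone, PartialEq)]
pub struct MyOptions {
    pub baits: PathBuf,         // required: no default value
    pub min_mapq: u8,           // defaulted
    pub sample: Option<String>, // optional
}

/// Hand-written approximation of the generated companion struct.
#[derive(Debug, Clone, PartialEq)]
pub struct MultiMyOptions {
    pub baits: Option<PathBuf>, // required -> wrapped in Option
    pub min_mapq: u8,           // defaulted -> passed through
    pub sample: Option<String>, // optional -> passed through
}

impl MultiMyOptions {
    /// Convert back to the standalone struct, failing if a required field is missing.
    pub fn validate(self) -> Result<MyOptions, String> {
        let baits = self
            .baits
            .ok_or_else(|| "--mytool::baits is required when mytool is selected".to_string())?;
        Ok(MyOptions {
            baits,
            min_mapq: self.min_mapq,
            sample: self.sample,
        })
    }
}

impl From<MyOptions> for MultiMyOptions {
    /// Wrap required fields in `Some` and pass the others through.
    fn from(options: MyOptions) -> Self {
        Self {
            baits: Some(options.baits),
            min_mapq: options.min_mapq,
            sample: options.sample,
        }
    }
}
```

Deferring the required-field check to `validate()` is what lets the multi command accept a command line in which an unselected tool's required options are absent.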
Rules for Options structs:
- The struct must have `#[derive(Args, Debug, Clone)]` and implement `Default`.
- All fields must be flat: no `#[command(flatten)]`.
- Every non-`Option` field that isn't inherently required at the CLI level should have `#[arg(long, default_value_t = ...)]`.
- Fields that are required (like file paths) should be plain types without `default_value_t`. The macro wraps them in `Option` for the multi struct and generates validation. The `Default` impl can use a placeholder (e.g. `PathBuf::new()`) since standalone clap validation ensures it's always provided, and `validate()` checks it in the multi path.
- Doc comments on fields become the `--help` text; for required fields the macro appends `" Required when <prefix> is selected."`
- TSV output: lowercase `snake_case` headers, tab-separated, no metadata or comment lines
- Fractions: use `frac_` prefix (not `pct_`)
- Float precision: use the serializers in `src/metrics.rs` (`serialize_f64_2dp`, `serialize_f64_5dp`, `serialize_f64_6dp`)
- Scope: all reads in file combined, with no per-read-group, library, or sample breakdown
- PF filtering: statistics are computed on PF reads by default; field names do not carry a `pf_` prefix
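The conventions above can be illustrated with a small standard-library sketch. `ExampleMetric`, `format_5dp`, and `to_tsv` are hypothetical names used only for illustration; the real code relies on serde and the serializers in src/metrics.rs rather than hand-built strings.

```rust
/// Hypothetical metric row; field names double as the TSV headers.
pub struct ExampleMetric {
    pub total_reads: u64,
    pub frac_duplicates: f64,
}

/// Format a fraction with five decimal places (stand-in for a serde serializer).
pub fn format_5dp(value: f64) -> String {
    format!("{:.5}", value)
}

/// Render rows as tab-separated text: one snake_case header line, no comment lines.
pub fn to_tsv(rows: &[ExampleMetric]) -> String {
    let mut out = String::from("total_reads\tfrac_duplicates\n");
    for row in rows {
        out.push_str(&format!(
            "{}\t{}\n",
            row.total_reads,
            format_5dp(row.frac_duplicates)
        ));
    }
    out
}
```

A row with 1000 reads and a duplicate fraction of 0.125 would be written as `1000`, a tab, and `0.12500`.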
Priorities: correctness > readability > performance.
- Write idiomatic Rust; don't fight the language
- Prefer meaningful names even if longer (`aligned_base_count` over `cnt`)
- Extract small/medium functions with clear inputs and outputs
- Keep related code together (metric struct next to its collector)
- Use `pub(crate)` for shared internal utilities
- Doc comments on all public items; code comments explain why, not what
- Don't over-generalize; solve the problem at hand
No checked-in test data. All BAM, FASTA, and interval data is built
programmatically using the helpers in tests/helpers/mod.rs:
| Helper | Purpose |
|---|---|
| `SamBuilder` | Build `RecordBuf` instances and write to a temp BAM |
| `FastaBuilder` | Build a temp FASTA with `.fai` index |
| `read_metrics_tsv::<T>(path)` | Deserialize TSV output for assertions |
| `assert_float_eq!(a, b, eps)` | Float comparison with epsilon |
Guidelines:
- Integration tests go in `tests/test_<command>.rs`; unit tests in inline `#[cfg(test)]` modules
- Prefer many small, focused tests over parameterized/table-driven tests
- Test function, not implementation — tests should survive a significant refactor
- Cover expected results, error conditions, and boundary cases
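As a sketch of why float assertions take an epsilon, the example below compares a computed fraction against an expected value within a tolerance. `floats_close` and `frac_duplicates` are hypothetical helpers written for this illustration; the real `assert_float_eq!` macro in tests/helpers/mod.rs may be implemented differently.

```rust
/// Return true when `a` and `b` differ by at most `eps`.
pub fn floats_close(a: f64, b: f64, eps: f64) -> bool {
    (a - b).abs() <= eps
}

/// Hypothetical metric calculation: the fraction of reads that are duplicates.
pub fn frac_duplicates(duplicates: u64, total: u64) -> f64 {
    if total == 0 {
        // Boundary case: avoid dividing by zero when there are no reads.
        0.0
    } else {
        duplicates as f64 / total as f64
    }
}
```

A test in this style covers both an expected result (for example, one duplicate out of three reads is close to 0.33333 within 1e-5) and a boundary case (zero reads yields 0.0), without depending on how the fraction is computed internally.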
Good bug reports are the fastest path to a fix. Please include:
- Version: output of `riker --version`
- Command: the exact command line that failed
- Error output: full stderr/stdout
- OS: operating system and version
- Expected vs. actual behavior: what should happen vs. what did happen
Full-size BAM files are often tens of gigabytes and can't be shared in an issue. Instead, create a small file that still reproduces the problem:
Extract a genomic region (preferred — keeps all reads in a specific area):
samtools view -h -b input.bam chr1:1000-2000 > minimal.bam
samtools index minimal.bam
Downsample (when the bug isn't region-specific):
# Keeps ~0.1% of reads; seed=42 for reproducibility; preserves read pairs
samtools view -h -b -s 42.001 input.bam > minimal.bam
samtools index minimal.bam
For reference FASTA: extract just the relevant contigs, or describe the genome build (e.g. GRCh38) and source so we can obtain it.
For BED/interval files: trim to only the intervals that trigger the bug.
Before submitting, verify the bug still reproduces on your minimal files:
riker <your-command> --input minimal.bam [other args]
Common gotcha: chromosome naming mismatches (chr1 vs 1) between your
BAM, reference, and interval files are a frequent source of errors — note which
style your files use.
- Get it correct and well-tested first, then optimize
- Profile before tuning — use samply or cargo-flamegraph to identify actual hot paths
- Verify correctness after any optimization (diff outputs against a baseline and run the full test suite)