Code organization and style

Great job @Psy-Fer!

Here’s my review of the code base and what I think we could do:

- Code reuse prevents drift/partial fixes and improves maintainability: some `if`expressions have almost the same code on both sides, some blocks are near-verbatim duplicates. We can use loops or helper functions to centralize some of that code. E.g. I added this: https://github.com/scverse/rustar-aligner/blob/199290d83904940a7316be7468b858bc8d39bd48/src/align/transcript.rs#L61-L65 and used it in a little helper I added for a pattern I saw in `finalize_transcript`: https://github.com/scverse/rustar-aligner/blob/199290d83904940a7316be7468b858bc8d39bd48/src/align/stitch.rs#L1756-L1764

- Clear use of numeric types prevents overflow bugs like https://github.com/scverse/rustar-aligner/blob/199290d83904940a7316be7468b858bc8d39bd48/docs-old/dev/BUGFIX_2026-02-09.md#L21-L26

  We currently use `u32`, `i32`, `u64`, `i64`, and `usize` with no clear guidence of why we use what where. Would be amazing if we could eventually enable all the clippy lints for unsafe casts after cleaning all that up, e.g. changing `if some_i32 > 0 { thing = some_i32 as u32 }` to `thing = u32::try_from(some_i32)...` or so

- Rust has a lot of ways to abstract things (e.g. the extension traits from the first point), and I thing we should use them, e.g. code like this encapsulating a single task should be part of a function or trait instead of just appearing in the business logic: https://github.com/scverse/rustar-aligner/blob/199290d83904940a7316be7468b858bc8d39bd48/src/align/read_align.rs#L353-L366

  Maybe using `itertools` could help in some cases? IDK if e.g. using its hash-based `uniq` here would be more efficient or less than our very ad-hoc looking `sort`-then-`dedup`.

- I personally really like splitting up huge multi-step functions like the above-mentioned `finalize_transcript` into individual steps that are clear in what they receive and what they emit. Every function that has `// 1. do first step: …` could instead call another function for this step. custom structs can help holding common context for this.

	fn append_match(ops: &mut Vec<Op>, len: usize) {
	if let Some(op) = ops.last_mut()
	&& op.kind() == Kind::Match
	{
	*op = op.add_len(len);
	} else {
	ops.push(Op::new(Kind::Match, len));
	}
	}

	## Bug #1: Integer Overflow from Unsafe Casts

	### Problem
	- Location: `src/align/stitch.rs`, lines 423-424
	- Symptom: CIGAR strings with massive values near 2³² (e.g., `10M4294953882D12M`)
	- Root Cause: Negative gap values cast directly to `u32` without validation

	// Deduplicate transcripts with identical genomic coordinates AND CIGAR.
	transcripts.sort_by(\|a, b\| {
	(a.chr_idx, a.genome_start, a.genome_end, a.is_reverse)
	.cmp(&(b.chr_idx, b.genome_start, b.genome_end, b.is_reverse))
	.then_with(\|\| cmp_cigar(&a.cigar, &b.cigar))
	.then_with(\|\| b.score.cmp(&a.score))
	});
	transcripts.dedup_by(\|a, b\| {
	a.chr_idx == b.chr_idx
	&& a.genome_start == b.genome_start
	&& a.genome_end == b.genome_end
	&& a.is_reverse == b.is_reverse
	&& a.cigar == b.cigar
	});

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code organization and style #63

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	impl CigarOpExt for cigar::Op {
	fn add_len(&self, len: usize) -> Self {
	cigar::Op::new(self.kind(), self.len() + len)
	}
	}

Code organization and style #63

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions