Indexed regionset refactor #245

Open
nsheff wants to merge 7 commits into dev from indexed-regionset-refactor
@nsheff nsheff commented Mar 11, 2026

Resolves the index-reuse problem. Callers can now build a MultiChromOverlapper once and reuse it across multiple query/set-algebra operations instead of rebuilding the AIList index every time.

  • Structural transforms (closest, cluster, union, subtract, etc.) moved from extension traits to inherent methods on RegionSet
  • MCO owns its source RegionSet and gets index-native IntervalSetOps plus query methods (subset_by, intersect_all, count_overlaps, etc.)
  • Deleted single-implementor traits IntervalRanges and RegionSetOverlaps
  • Fixed closest() bug where a fixed ±2 window missed long intervals

// One-off (builds index internally)
let result = region_set_a.intersect(&region_set_b);

// Reusable index (build once, query many)
let index = MultiChromOverlapper::from_region_set(reference, OverlapperType::AIList);
let r1 = index.intersect_all(&query1);
let j = index.jaccard(&query2);

This is not meant to be "finished" or "final"; it is more of a step forward, so I'm happy for things to keep changing. But this is helping me conceptualize the design.
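To make the "build once, query many" idea concrete, here is a minimal, self-contained sketch. It does not use gtars: `ToyIndex`, `Interval`, and `count_overlaps` here are hypothetical stand-ins, and the index is two sorted coordinate vectors rather than an AIList. It only illustrates why paying the build cost once is worthwhile when several queries follow.

```rust
// Toy stand-in for an overlap index. NOT gtars's MultiChromOverlapper or AIList.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    start: u32,
    end: u32, // half-open: [start, end)
}

struct ToyIndex {
    starts: Vec<u32>, // all start coordinates, sorted
    ends: Vec<u32>,   // all end coordinates, sorted independently
}

impl ToyIndex {
    // The build cost (two sorts) is paid once, here.
    fn build(intervals: &[Interval]) -> Self {
        let mut starts: Vec<u32> = intervals.iter().map(|iv| iv.start).collect();
        let mut ends: Vec<u32> = intervals.iter().map(|iv| iv.end).collect();
        starts.sort_unstable();
        ends.sort_unstable();
        ToyIndex { starts, ends }
    }

    // Each query is two binary searches:
    // (# intervals starting before q.end) - (# intervals ending at or before q.start).
    // Assumes every indexed interval has start <= end.
    fn count_overlaps(&self, q: Interval) -> usize {
        let started = self.starts.partition_point(|&s| s < q.end);
        let finished = self.ends.partition_point(|&e| e <= q.start);
        started - finished
    }
}

fn main() {
    let reference = [
        Interval { start: 0, end: 10 },
        Interval { start: 5, end: 15 },
        Interval { start: 20, end: 30 },
    ];
    // Build once...
    let index = ToyIndex::build(&reference);
    // ...then reuse the same index for every query.
    for q in [
        Interval { start: 8, end: 12 },
        Interval { start: 15, end: 20 },
        Interval { start: 0, end: 100 },
    ] {
        println!("{:?} overlaps {} reference intervals", q, index.count_overlaps(q));
    }
}
```

With the one-off style, the sorting step would be repeated for each of the three queries; with the reusable index it runs once.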

nsheff added 7 commits March 11, 2026 11:35
…tars-core

Migrates 10 structural methods (trim, shift, flank, resize, narrow,
promoters, reduce, concat, disjoin, gaps) plus pintersect as inherent
methods on RegionSet in gtars-core.

Introduces IntervalSetOps trait in core with sweep-line implementations
for RegionSet covering two-set operations: setdiff, intersect, union,
jaccard, coverage, overlap_coefficient, subtract, closest, cluster.

Strips IntervalRanges trait in genomicdist down to only intersect_all
(the one method requiring an overlap index). Moves SortedRegionSet to
core with re-export from genomicdist. Updates all downstream callers
(CLI, Python, WASM, R bindings, examples) to use the new locations.
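For readers unfamiliar with the sweep-line approach mentioned in this commit, the sketch below shows the general technique for a two-set intersect and a Jaccard score on a single chromosome. It is an illustration only, not the gtars-core code: the `Interval` type and function names are hypothetical, each input is assumed to be sorted by start and internally non-overlapping, and Jaccard is assumed to be computed over covered base pairs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    start: u32,
    end: u32, // half-open: [start, end)
}

// Sweep two sorted, internally non-overlapping lists with one cursor each.
// No index is needed; the cost is linear in a.len() + b.len().
fn intersect(a: &[Interval], b: &[Interval]) -> Vec<Interval> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        let start = a[i].start.max(b[j].start);
        let end = a[i].end.min(b[j].end);
        if start < end {
            out.push(Interval { start, end });
        }
        // Advance whichever interval ends first; it cannot overlap anything later.
        if a[i].end <= b[j].end {
            i += 1;
        } else {
            j += 1;
        }
    }
    out
}

fn total_bp(x: &[Interval]) -> u64 {
    x.iter().map(|iv| u64::from(iv.end - iv.start)).sum()
}

// Base-pair Jaccard: |A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|.
// The identity holds because each input is internally non-overlapping.
fn jaccard(a: &[Interval], b: &[Interval]) -> f64 {
    let inter = total_bp(&intersect(a, b));
    let union = total_bp(a) + total_bp(b) - inter;
    if union == 0 {
        0.0
    } else {
        inter as f64 / union as f64
    }
}

fn main() {
    let a = [Interval { start: 0, end: 10 }, Interval { start: 20, end: 30 }];
    let b = [Interval { start: 5, end: 25 }];
    println!("intersection: {:?}", intersect(&a, &b));
    println!("jaccard: {:.4}", jaccard(&a, &b));
}
```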
…query methods

MCO now owns its source RegionSet via from_region_set() constructor,
eliminating the error-prone pattern of passing source as a parameter.
Implements IntervalSetOps trait (delegating to owned source) so callers
can write generic code over impl IntervalSetOps. Adds MCO-only query
methods: subset_by, count_overlaps, any_overlaps, find_overlaps_indexed,
intersect_all. Moves intersect_all from genomicdist to MCO (genomicdist
now delegates). Old method names kept as deprecated aliases.
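The ownership and delegation pattern described in this commit can be sketched roughly as follows. Everything here is hypothetical and simplified for illustration: `SetOps`, `Regions`, `OwningIndex`, and the method names are stand-ins rather than the gtars types, and the "index" is just a sorted vector of starts. The point is the shape: the index owns its source, implements the shared trait by delegating to it, and keeps an old method name as a deprecated alias.

```rust
// Hypothetical shared trait, standing in for something like IntervalSetOps.
trait SetOps {
    fn total_width(&self) -> u64;
}

// Hypothetical plain region container: (start, end) pairs, half-open.
struct Regions {
    intervals: Vec<(u32, u32)>,
}

impl SetOps for Regions {
    fn total_width(&self) -> u64 {
        self.intervals.iter().map(|&(s, e)| u64::from(e - s)).sum()
    }
}

// The index takes ownership of its source, so callers never pass the
// source alongside the index and cannot pass a mismatched one.
struct OwningIndex {
    source: Regions,
    sorted_starts: Vec<u32>,
}

impl OwningIndex {
    fn from_regions(source: Regions) -> Self {
        let mut sorted_starts: Vec<u32> =
            source.intervals.iter().map(|&(s, _)| s).collect();
        sorted_starts.sort_unstable();
        OwningIndex { source, sorted_starts }
    }

    // Index-only query method (no equivalent on the plain container).
    fn count_starting_before(&self, pos: u32) -> usize {
        self.sorted_starts.partition_point(|&s| s < pos)
    }

    // Old name kept so existing callers still compile, with a warning.
    #[deprecated(note = "use count_starting_before")]
    fn n_before(&self, pos: u32) -> usize {
        self.count_starting_before(pos)
    }
}

// Trait implemented by delegating to the owned source.
impl SetOps for OwningIndex {
    fn total_width(&self) -> u64 {
        self.source.total_width()
    }
}

// Generic caller: works with either the plain container or the index.
fn report(x: &impl SetOps) -> u64 {
    x.total_width()
}

fn main() {
    let regions = Regions { intervals: vec![(0, 10), (20, 25)] };
    println!("plain container: {}", report(&regions));
    let index = OwningIndex::from_regions(regions);
    println!("owning index:    {}", report(&index));
    #[allow(deprecated)]
    let old = index.n_before(20);
    println!("old alias agrees: {}", old == index.count_starting_before(20));
}
```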
Remove the RegionSetOverlaps extension trait which was redundant
indirection now that MultiChromOverlapper has inherent query methods.
Update Python bindings to build an MCO index explicitly instead of
calling trait methods on RegionSet.

Remove the now-empty IntervalRanges trait and interval_ranges.rs file.
The strand-aware StrandedRegionSet methods (promoters, reduce, setdiff)
move to inherent methods in models.rs alongside the struct definitions.
The intersect_all convenience on RegionSet is replaced by direct MCO
usage in downstream crates. All IntervalRanges imports removed from
CLI, WASM, Python, and R bindings.

Last remaining reference to the deleted IntervalRanges trait was a
test class name in gtars-python.

Resolve conflict: interval_ranges.rs deleted on this branch,
modified on dev (PR #244 closest() fix + R cast safety).
The closest() improvements are already in RegionSet::closest().
R/WASM safe-cast changes from dev merge cleanly.
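As background on the closest() fix referenced above, the sketch below reconstructs one plausible way a fixed ±2 window can miss a long interval. It is an assumption about the failure mode, not the actual gtars code: the types and function names are hypothetical, and the exhaustive scan is only a reference answer, not the approach used in the fix.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    start: u32,
    end: u32, // half-open: [start, end)
}

// Gap between two intervals; 0 if they overlap or abut.
fn distance(a: Interval, b: Interval) -> u32 {
    if a.end <= b.start {
        b.start - a.end
    } else if b.end <= a.start {
        a.start - b.end
    } else {
        0
    }
}

// Looks only at `window` neighbours on each side of the query's
// insertion point in a list sorted by start.
fn closest_windowed(sorted: &[Interval], q: Interval, window: usize) -> Option<Interval> {
    let pos = sorted.partition_point(|iv| iv.start < q.start);
    let lo = pos.saturating_sub(window);
    let hi = (pos + window).min(sorted.len());
    sorted[lo..hi].iter().copied().min_by_key(|&iv| distance(iv, q))
}

// Reference answer: check every interval.
fn closest_exhaustive(sorted: &[Interval], q: Interval) -> Option<Interval> {
    sorted.iter().copied().min_by_key(|&iv| distance(iv, q))
}

fn main() {
    // One long interval starts far to the left but spans the query;
    // several short intervals sit between its start and the query.
    let sorted = [
        Interval { start: 0, end: 1000 },
        Interval { start: 100, end: 110 },
        Interval { start: 200, end: 210 },
        Interval { start: 300, end: 310 },
        Interval { start: 400, end: 410 },
    ];
    let q = Interval { start: 900, end: 905 };
    println!("windowed (±2): {:?}", closest_windowed(&sorted, q, 2));
    println!("exhaustive:    {:?}", closest_exhaustive(&sorted, q));
}
```

Because the list is ordered by start, the long interval sits five positions away from the query's insertion point even though it overlaps the query, so a window of two never reaches it.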
@nsheff nsheff marked this pull request as ready for review March 11, 2026 19:25
@nsheff nsheff requested review from nleroy917 and sanghoonio March 11, 2026 19:25