Changes from 43 commits
99 commits
6f465f2
Add IntervalRanges trait with 5 GenomicRanges operations
sanghoonio Feb 19, 2026
9af97aa
Add partition system, statistics ports, and GTF gene model loading
sanghoonio Feb 20, 2026
13da1eb
Replace is_sorted flag with SortedRegionSet newtype
sanghoonio Feb 22, 2026
80b49b2
Remove benchmark example from git tracking
sanghoonio Feb 22, 2026
ff5ac7d
Add calcSummarySignal port (signal matrix overlap + boxplot stats)
sanghoonio Feb 22, 2026
6ba745e
Add test files and fasta index to gitignore
sanghoonio Feb 22, 2026
16385cf
Merge branch 'dev' into genomicdist_partitions
sanghoonio Feb 22, 2026
9dfc6c4
Add R bindings for genomicdist: statistics, partitions, signal, inter…
sanghoonio Feb 23, 2026
f40898e
Add WASM bindings for genomicdist functions
sanghoonio Feb 23, 2026
01736be
Add multi-BED set operations: concat, union, jaccard, consensus
sanghoonio Feb 23, 2026
4968756
Add tests for concat, union, jaccard, and consensus set operations
sanghoonio Feb 23, 2026
d0ded8a
Add genomicdist, ranges, and consensus CLI subcommands
sanghoonio Feb 24, 2026
3c67299
Bump gtars-wasm to 0.7.1 for npm publish with set operations
sanghoonio Feb 24, 2026
0dbc866
Merge genomicdist_partitions into dev
sanghoonio Feb 24, 2026
0856fab
Fix WASM npm publish: Node 24.x + provenance config
sanghoonio Feb 24, 2026
e2a788b
Fix WASM npm publish: remove registry-url to allow OIDC fallback
sanghoonio Feb 24, 2026
a59a116
Fix WASM npm publish: keep registry-url but strip auth token
sanghoonio Feb 24, 2026
7b3f125
Fix .npmrc strip: run before npm upgrade which removes the file
sanghoonio Feb 24, 2026
b1a872a
Fix npm publish: write .npmrc with registry only, no auth token
sanghoonio Feb 24, 2026
4e70fac
Restore registry-url, add OIDC debug logging
sanghoonio Feb 24, 2026
abb55c0
Fix npm publish: clear stale setup-node token before OIDC exchange
sanghoonio Feb 24, 2026
e3d3142
Fix npm publish: move setup-node after WASM build for fresh token
sanghoonio Feb 24, 2026
b697fde
add toggles to rust-publish
nsheff Feb 24, 2026
b64ae09
revamp publish action
nsheff Feb 24, 2026
f812f08
update workflows
nsheff Feb 24, 2026
08d64be
Merge branch 'master' into dev
sanghoonio Feb 24, 2026
42b1385
Add genomicdist CLI, prep command, and packed binary serialization
sanghoonio Feb 25, 2026
2cd731d
genomicdist: add --compact flag for pipeline JSON output
sanghoonio Feb 25, 2026
4793538
genomicdist: fix TSS distances and add promoter params
sanghoonio Feb 25, 2026
eeb9c07
ignore hidden subdir
nsheff Feb 26, 2026
b960304
udpate R bindings to match recent genomicdist change
nsheff Feb 26, 2026
aab0a78
fix bugs in openssl, and keys, for r bindings
nsheff Feb 26, 2026
6c5bd47
Merge branch 'dev' of github.com:databio/gtars into dev
nsheff Feb 28, 2026
b40edf1
add streaming uniwig alongside current batch parallel implementation
nsheff Feb 28, 2026
0b96513
genomicdist: add GDA binary format and simplify partition loading
sanghoonio Mar 3, 2026
60bdd71
Merge branch 'dev' of https://github.com/databio/gtars into dev
sanghoonio Mar 3, 2026
1960aed
Merge pull request #236 from databio/streaming-uniwig
nsheff Mar 6, 2026
4884bd9
Refgetstore updates (#237)
nsheff Mar 6, 2026
8faf033
Fix streaming output file extension and --dense help text
nsheff Mar 6, 2026
a92b464
version bumps; implement refgetstore sidecar access
nsheff Mar 6, 2026
8e11d8f
Fix bugs in genomicdist, improve test coverage
sanghoonio Mar 6, 2026
45da0f7
Add binary loading to R/WASM bindings, fix genomicdist edge cases, up…
sanghoonio Mar 6, 2026
78bd3e4
Fix StdoutLock lifetime issue in streaming mode
sanghoonio Mar 6, 2026
22ada89
Improve test coverage and exclude untestable bindings from codecov
sanghoonio Mar 6, 2026
bc95863
Add comprehensive Python bindings for gtars-genomicdist
sanghoonio Mar 6, 2026
a8ed777
Add RegionSet S4 class with strand support for R bindings
sanghoonio Mar 6, 2026
84d5875
Add strand support to Python RegionSet
sanghoonio Mar 6, 2026
ad4a1e9
Add unified Igd struct (in-memory query) and gtars-lola enrichment en…
sanghoonio Mar 8, 2026
2997c34
Add universe handling and RegionDB database loading to gtars-lola
sanghoonio Mar 8, 2026
9d9757c
Add R bindings for LOLA enrichment analysis
sanghoonio Mar 8, 2026
49f1670
Add Python bindings for LOLA enrichment analysis
sanghoonio Mar 8, 2026
5466dfb
Add FDR correction and output formatting to gtars-lola (Phase 7)
sanghoonio Mar 8, 2026
b83afd7
Fix three correctness bugs found via R LOLA side-by-side validation
sanghoonio Mar 8, 2026
d0f6aaa
Add WASM bindings for LOLA enrichment analysis
sanghoonio Mar 8, 2026
655303c
Rewrite tutorial_regionset.Rmd to mirror genomicdist tutorial with S4…
sanghoonio Mar 9, 2026
84bbdbc
Add interval transform and set operations to RegionSet
sanghoonio Mar 9, 2026
097a128
Merge branch 'dev' into gtars-lola
sanghoonio Mar 9, 2026
63c47ed
Add findOverlaps/countOverlaps to RegionSet via IGD index
sanghoonio Mar 9, 2026
9f57828
Fix 1-to-1 interval ops to preserve region count and strand
sanghoonio Mar 9, 2026
a9b169e
Merge branch 'dev' into gtars-lola
sanghoonio Mar 9, 2026
7841e72
Add RegionDB accessor functions for R bindings
sanghoonio Mar 9, 2026
76753b7
Expose more gtars functionality via python (#241)
nsheff Mar 10, 2026
6196762
Gitignore gtars-sc, tourney_suite, and runs directories
sanghoonio Mar 10, 2026
d41afd5
Merge branch 'dev' into gtars-lola
sanghoonio Mar 10, 2026
800d15e
Fix gaps leading gap + add CoordinateMode for R/BED coordinate conven…
sanghoonio Mar 10, 2026
89cd21c
Fix neighbor distance sorting + filtering to match R behavior
sanghoonio Mar 10, 2026
8fcc6eb
Add per-chromosome binning for calcChromBins to match R behavior
sanghoonio Mar 10, 2026
43ad016
Fix expected partition sizes to use priority-resolved coverage
sanghoonio Mar 10, 2026
db66e90
Fix loadRegionDB: description fallback + size column
sanghoonio Mar 10, 2026
dce2634
Fix calcExpectedPartitions: strand-aware promoters, trim, and raw bp …
sanghoonio Mar 11, 2026
28b9f1f
Add IGD/LOLA test coverage and fix gtars-py trait ambiguity
sanghoonio Mar 11, 2026
21bc688
Unify overlap traits: remove IGD from genomicdist, add min_overlap to…
sanghoonio Mar 11, 2026
f44cece
Remove accidentally tracked tourney_suite file (directory is gitignored)
sanghoonio Mar 11, 2026
bf85006
Fix iter_chroms ordering, S4 dispatch, LOLA odds ratio, and wasm bind…
sanghoonio Mar 11, 2026
364ae4c
Eliminate redundant region copy in loadRegionDB
sanghoonio Mar 11, 2026
0a2a68e
Add missing IntervalRanges subcommands to ranges CLI
sanghoonio Mar 11, 2026
c66deb6
Split store.rs into submodules
nsheff Mar 11, 2026
86c0989
Fix genomicdist bugs: closest() search, signal endianness, WASM panic…
nsheff Mar 11, 2026
9ceae0c
Fix KeepOurs sync reporting bug and Python binding unwrap panics
nsheff Mar 11, 2026
efae755
Add query methods to MultiChromOverlapper for index reuse
nsheff Mar 11, 2026
04addb8
Merge pull request #244 from databio/gdist-edits
sanghoonio Mar 11, 2026
6a1d14e
Merge dev into gtars-lola
sanghoonio Mar 11, 2026
716314c
Fix IGD/LOLA audit issues: contingency validation, width corruption, …
sanghoonio Mar 11, 2026
39696a3
Code quality: single-pass BED read, dead code removal, TSV format, do…
sanghoonio Mar 11, 2026
e26a07f
Refactor IGD: extract shared walk_tile_overlaps helper
sanghoonio Mar 11, 2026
46d1287
Fix audit issues: overflows, NaN handling, R type safety, dead code, …
sanghoonio Mar 11, 2026
08ff20b
Migrate CLI and R IGD bindings to unified Igd API
sanghoonio Mar 11, 2026
8d15e48
Add testthat infrastructure, pytest CI, and merge-readiness updates
sanghoonio Mar 12, 2026
0ecc9c8
Add old-vs-new IGD comparison tests using lola_multi_db dataset
sanghoonio Mar 12, 2026
a0645eb
Merge pull request #242 from databio/gtars-lola
sanghoonio Mar 12, 2026
351874e
Fix CI: set LD_LIBRARY_PATH so gtars-r tests can find libR.so
sanghoonio Mar 12, 2026
95119ef
Fix Python CI: bump requires-python to >=3.10, move test deps to dev …
sanghoonio Mar 12, 2026
238188a
Fix Python CI: install dev deps directly instead of editable install
sanghoonio Mar 12, 2026
9b7ace4
Fix pyproject.toml: move dynamic field back inside [project] table
sanghoonio Mar 12, 2026
efe0a99
Fix Python tests, add R tests to CI, gitignore uv.lock
sanghoonio Mar 12, 2026
3d4018c
Add RegionSetList type across all layers (core, lola, R, Python, WASM)
sanghoonio Mar 13, 2026
7d54d62
Move FHR sidecar files from collections/ to dedicated fhr/ subdirectory
nsheff Mar 13, 2026
9369eb5
LOLA parity fixes across R/Python/WASM bindings and enrichment engine
sanghoonio Mar 13, 2026
69f5a37
Fix buildRestrictedUniverse to use disjoin instead of reduce
sanghoonio Mar 13, 2026
23 changes: 23 additions & 0 deletions .gitignore
@@ -23,8 +23,31 @@ bin/
/.idea/gtars.iml
/gtars/tests/data/test1.bw

.claude/
.DS_Store
.Rhistory
tests/data/out/region_scoring_count.csv.gz
/gtars-refget/tests/store_test/rgstore.json
/gtars-refget/tests/store_test/sequences.rgsi

# Large benchmark data and validation files
tests/data/interval_ranges_benchmark/
tests/data/partitions_benchmark/
gtars-genomicdist/examples/benchmark_*.rs
gtars-genomicdist/examples/partitions_demo.rs
gtars-genomicdist/tests/
*.annotation.gtf.gz
tests/data/fasta/sacCer3.fa
tests/data/fasta/sacCer3.rgsi
gtars-r/tests/explore_sacCer3.*
gtars-r/tests/smoketest_sacCer3.R
gtars-r/tests/hg19.fa
gtars-r/tests/hg19.fa.gz
gtars-r/tests/benchmark_genomicdist.*
gtars-r/tests/compare_genomicdist.*
gtars-r/tests/tutorial_genomicdist.*
gtars-r/tests/*.bed
gtars-r/tests/*.html
gtars-r/tests/export_reference_data.R
gtars-r/tests/ref/
gtars-r/tests/smoketest_set_operations.*
1 change: 1 addition & 0 deletions Cargo.toml
@@ -27,4 +27,5 @@ serde = { version = "1.0.203", features=["derive"] }
flate2 = "1.1.2"
indicatif = "0.18.0"
serde_json = "1.0.135"
bincode = "1.3.3"
byteorder = "1.5.0"
3 changes: 3 additions & 0 deletions Makefile
@@ -5,5 +5,8 @@ wasm:
test:
cargo test --all --workspace -- --nocapture

test-r:
bash gtars-r/test-r.sh

fmt:
cargo fmt --all -- --check
55 changes: 53 additions & 2 deletions README.md
@@ -65,7 +65,7 @@ print(__version__)

## Usage

`gtars` provides several useful tools. There are 3 ways to use `gtars`.
`gtars` provides several useful tools. There are 3 ways to use `gtars`.

### 1. From Python

@@ -75,6 +75,56 @@ Using bindings, you can call some `gtars` functions from within Python.

To see the available tools you can use from the CLI run `gtars --help`. To see the help for a specific tool, run `gtars <tool> --help`.

Available subcommands:

| Subcommand | Description |
|---|---|
| `genomicdist` | Compute genomic distribution statistics for a BED file |
| `prep` | Pre-serialize GTF gene models or signal matrices to binary for fast loading |
| `ranges` | Interval set algebra operations on BED files (reduce, trim, promoters, setdiff, pintersect, concat, union, jaccard) |
| `consensus` | Compute consensus regions across multiple BED files |
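The `ranges` operations are listed here only by name. As a rough illustration of two of them, the sketch below merges overlapping intervals (`reduce`) and scores two sets by base-pair Jaccard similarity (intersection bp divided by union bp). It assumes half-open `[start, end)` coordinates and a hypothetical `Interval` type; it is not the gtars implementation, and gtars may treat edge cases (for example, book-ended intervals) differently.

```rust
/// Hypothetical half-open interval [start, end) on one chromosome
/// (illustration only, not the gtars `Region` type).
#[derive(Debug, Clone, PartialEq)]
struct Interval {
    chr: String,
    start: u64,
    end: u64,
}

impl Interval {
    fn new(chr: &str, start: u64, end: u64) -> Self {
        Interval { chr: chr.to_string(), start, end }
    }
}

/// Merge overlapping or book-ended intervals, reduce-style.
fn reduce(mut ivs: Vec<Interval>) -> Vec<Interval> {
    // Sort by chromosome, then start, so overlapping intervals are adjacent.
    ivs.sort_by(|a, b| a.chr.cmp(&b.chr).then(a.start.cmp(&b.start)));
    let mut out: Vec<Interval> = Vec::new();
    for iv in ivs {
        if let Some(last) = out.last_mut() {
            if last.chr == iv.chr && iv.start <= last.end {
                // Extend the current merged run instead of starting a new one.
                last.end = last.end.max(iv.end);
                continue;
            }
        }
        out.push(iv);
    }
    out
}

/// Total base pairs covered by a list of non-overlapping intervals.
fn total_bp(ivs: &[Interval]) -> u64 {
    ivs.iter().map(|iv| iv.end - iv.start).sum()
}

/// Base-pair Jaccard index: |A intersect B| / |A union B| on reduced sets.
fn jaccard(a: &[Interval], b: &[Interval]) -> f64 {
    let ra = reduce(a.to_vec());
    let rb = reduce(b.to_vec());
    let mut both = ra.clone();
    both.extend(rb.iter().cloned());
    let union_bp = total_bp(&reduce(both));
    if union_bp == 0 {
        return 0.0;
    }
    // Inclusion-exclusion: |A intersect B| = |A| + |B| - |A union B|.
    let inter_bp = total_bp(&ra) + total_bp(&rb) - union_bp;
    inter_bp as f64 / union_bp as f64
}
```

For example, `chr1:[0,100)` and `chr1:[50,150)` share 50 bp of a 150 bp union, so their Jaccard index is 1/3.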

#### Preparing reference files

Pre-compile reference files to binary for fast repeated loading. This is optional but recommended when running `genomicdist` repeatedly against the same references.

```bash
# Pre-compile a GTF gene model
gtars prep --gtf gencode.v47.annotation.gtf.gz

# Pre-compile an open signal matrix
gtars prep --signal-matrix openSignalMatrix_hg38.txt
```

Output defaults to the input path with `.bin` appended (stripping `.gz` first). Use `-o` to specify a custom output path.
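The default naming rule fits in a few lines. This is a minimal sketch of the documented behavior only: `prep_output_path` is a hypothetical name rather than a gtars function, and `output` stands in for the value passed to `-o`.

```rust
/// Hypothetical helper: use the `-o` value if given; otherwise strip a
/// trailing `.gz` from the input path and append `.bin`.
fn prep_output_path(input: &str, output: Option<&str>) -> String {
    match output {
        Some(custom) => custom.to_string(),
        None => {
            // `.gz` is removed first, so `x.gtf.gz` becomes `x.gtf.bin`.
            let base = input.strip_suffix(".gz").unwrap_or(input);
            format!("{base}.bin")
        }
    }
}
```

Applied to the two inputs above, this yields `gencode.v47.annotation.gtf.bin` and `openSignalMatrix_hg38.txt.bin`, the same paths that are passed to `--gtf` and `--signal-matrix` in the `genomicdist` example.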

#### Computing genomic distributions

```bash
gtars genomicdist \
--bed query.bed \
--gtf gencode.v47.annotation.gtf.bin \
--tss tss.bed \
--chrom-sizes hg38.chrom.sizes \
--signal-matrix openSignalMatrix_hg38.txt.bin \
--output result.json
```

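All flags except `--bed` are optional. Omitting a reference input skips the analysis it enables (see the Description column), and the tuning and output flags fall back to their listed defaults: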

| Flag | Required | Description |
|---|---|---|
| `--bed` | yes | Input BED file |
| `--gtf` | no | GTF/GTF.gz or pre-compiled `.bin` — enables partitions and TSS distances |
| `--tss` | no | TSS BED file — overrides GTF-derived TSS for distance calculation |
| `--chrom-sizes` | no | Chrom sizes file — enables expected partitions |
| `--signal-matrix` | no | Signal matrix TSV or pre-compiled `.bin` — enables open chromatin enrichment |
| `--bins` | no | Number of bins for region distribution (default: 250) |
| `--promoter-upstream` | no | Upstream distance from TSS for promoter regions (default: 200) |
| `--promoter-downstream` | no | Downstream distance from TSS for promoter regions (default: 2000) |
| `--output` | no | Output JSON path (default: stdout) |
| `--compact` | no | Compact JSON output (default: pretty-printed) |

### 3. As a rust library

You can link `gtars` as a library in your rust project. To do so, add the following to your `Cargo.toml` file:
@@ -84,7 +134,8 @@ You can link `gtars` as a library in your rust project. To do so, add the follow
gtars = { git = "https://github.com/databio/gtars/gtars" }
```

we wall-off crates using features, so you will need to enable the features you want. For example, to use the `gtars` crate the overlap tool, you would do:
We wall off crates using features, so you will need to enable the features you want. For example, to use the overlap tool:

```toml
[dependencies]
gtars = { git = "https://github.com/databio/gtars/gtars", features = ["overlaprs"] }
11 changes: 9 additions & 2 deletions gtars-cli/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "gtars-cli"
version = "0.7.0"
version = "0.8.0"
edition = "2024"
description = "Performance critical tools for genomic interval analysis. This is the CLI"
homepage = "https://github.com/databio/gtars"
@@ -22,11 +22,17 @@ anyhow = { workspace = true }
gtars-scoring = { path = "../gtars-scoring", optional=true, version="0.5.1" }
gtars-fragsplit = { path = "../gtars-fragsplit", optional=true, version="0.5.0" }
gtars-igd = { path = "../gtars-igd", optional=true, version="0.5.1" }
gtars-uniwig = { path = "../gtars-uniwig", optional=true, version="0.7.0" }
gtars-uniwig = { path = "../gtars-uniwig", optional=true, version="0.8.0" }
gtars-overlaprs = { path = "../gtars-overlaprs", optional = true, version="0.5.1" }
gtars-bbcache = { path = "../gtars-bbcache", optional=true, version="0.5.3" }
gtars-genomicdist = { path = "../gtars-genomicdist", optional=true, version="0.6.0" }
gtars-core = { path = "../gtars-core", version="0.5.5", features=["bigbed", "http"] }

# serialization
serde = { workspace = true }
serde_json = "1"
bincode = { workspace = true }

[[bin]]
name = "gtars"
path = "src/main.rs"
@@ -39,3 +45,4 @@ bbcache = ["dep:gtars-bbcache"]
igd = ["dep:gtars-igd"]
fragsplit = ["dep:gtars-fragsplit"]
overlaprs = ["dep:gtars-overlaprs"]
genomicdist = ["dep:gtars-genomicdist"]
28 changes: 28 additions & 0 deletions gtars-cli/src/consensus/cli.rs
@@ -0,0 +1,28 @@
use clap::{Arg, Command};

pub const CONSENSUS_CMD: &str = "consensus";

pub fn create_consensus_cli() -> Command {
Command::new(CONSENSUS_CMD)
.about("Compute consensus regions across multiple BED files. Outputs BED4 (chr, start, end, count).")
.arg(
Arg::new("beds")
.long("beds")
.required(true)
.num_args(2..)
.help("Two or more input BED files"),
)
.arg(
Arg::new("min-count")
.long("min-count")
.required(false)
.default_value("1")
.help("Minimum overlap count to include a region in output"),
)
.arg(
Arg::new("output")
.long("output")
.required(false)
.help("Output BED file (default: stdout)"),
)
}
70 changes: 70 additions & 0 deletions gtars-cli/src/consensus/handlers.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

use anyhow::{Context, Result};
use clap::ArgMatches;

use gtars_core::models::RegionSet;
use gtars_genomicdist::consensus::consensus;

pub fn run_consensus(matches: &ArgMatches) -> Result<()> {
let bed_paths: Vec<&String> = matches
.get_many::<String>("beds")
.expect("--beds is required")
.collect();

let min_count: u32 = matches
.get_one::<String>("min-count")
.unwrap()
.parse()
.context("--min-count must be a positive integer")?;

let output_path = matches.get_one::<String>("output");

// Load all BED files
let sets: Vec<RegionSet> = bed_paths
.iter()
.map(|p| {
RegionSet::try_from(p.as_str())
.map_err(|e| anyhow::anyhow!("Failed to load BED file {}: {}", p, e))
})
.collect::<Result<Vec<_>>>()?;

eprintln!("Computing consensus across {} BED files...", sets.len());

let regions = consensus(&sets);

// Filter by min-count
let filtered: Vec<_> = regions
.iter()
.filter(|r| r.count >= min_count)
.collect();

eprintln!(
"{} consensus regions ({} after --min-count {} filter)",
regions.len(),
filtered.len(),
min_count,
);

match output_path {
Some(p) => {
let mut file = File::create(Path::new(p))
.with_context(|| format!("Failed to create output file: {}", p))?;
for r in &filtered {
writeln!(file, "{}\t{}\t{}\t{}", r.chr, r.start, r.end, r.count)?;
}
eprintln!("Output written to {}", p);
}
None => {
let stdout = io::stdout();
let mut out = stdout.lock();
for r in &filtered {
writeln!(out, "{}\t{}\t{}\t{}", r.chr, r.start, r.end, r.count)?;
}
}
}

Ok(())
}
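The handler delegates the actual computation to `gtars_genomicdist::consensus::consensus`, whose internals are not part of this diff. One common way to produce BED4 `(chr, start, end, count)` output is a sweep line over region boundaries, sketched below as a hypothetical stand-in, not the gtars implementation. Assumptions: each input set is merged internally first so that it contributes at most 1 to the count, coordinates are half-open, and contiguous segments with equal counts are merged; `consensus_sketch`, `merge_within_set`, and `ConsensusRegion` are illustrative names.

```rust
use std::collections::BTreeMap;

/// One output row: a BED4 record (chr, start, end, count).
#[derive(Debug, PartialEq)]
struct ConsensusRegion {
    chr: String,
    start: u64,
    end: u64,
    count: u32,
}

/// Merge overlapping or book-ended regions within one set, so each set adds
/// at most 1 to the coverage depth at any position.
fn merge_within_set(set: &[(String, u64, u64)]) -> Vec<(String, u64, u64)> {
    let mut sorted = set.to_vec();
    sorted.sort();
    let mut merged: Vec<(String, u64, u64)> = Vec::new();
    for (chr, start, end) in sorted {
        if let Some(last) = merged.last_mut() {
            if last.0 == chr && start <= last.2 {
                last.2 = last.2.max(end);
                continue;
            }
        }
        merged.push((chr, start, end));
    }
    merged
}

/// Sweep-line consensus: report each covered segment with the number of sets covering it.
fn consensus_sketch(sets: &[Vec<(String, u64, u64)>]) -> Vec<ConsensusRegion> {
    // Per chromosome: +1 at every region start, -1 at every region end.
    let mut events: BTreeMap<String, Vec<(u64, i64)>> = BTreeMap::new();
    for set in sets {
        for (chr, start, end) in merge_within_set(set) {
            let ev = events.entry(chr).or_default();
            ev.push((start, 1));
            ev.push((end, -1));
        }
    }
    let mut out: Vec<ConsensusRegion> = Vec::new();
    for (chr, mut ev) in events {
        // Sorting (position, delta) places -1 before +1 at equal positions.
        ev.sort();
        let (mut depth, mut prev) = (0i64, 0u64);
        for (pos, delta) in ev {
            if depth > 0 && pos > prev {
                let count = depth as u32;
                // Extend the previous segment if it is contiguous with the same count.
                let mut extended = false;
                if let Some(last) = out.last_mut() {
                    if last.chr == chr && last.end == prev && last.count == count {
                        last.end = pos;
                        extended = true;
                    }
                }
                if !extended {
                    out.push(ConsensusRegion { chr: chr.clone(), start: prev, end: pos, count });
                }
            }
            depth += delta;
            prev = pos;
        }
    }
    out
}
```

The handler's `--min-count` step then corresponds to keeping only rows with `count >= min_count`.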
2 changes: 2 additions & 0 deletions gtars-cli/src/consensus/mod.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
pub mod cli;
pub mod handlers;
66 changes: 66 additions & 0 deletions gtars-cli/src/genomicdist/cli.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
use clap::{Arg, Command, arg};

pub const GENOMICDIST_CMD: &str = "genomicdist";

pub fn create_genomicdist_cli() -> Command {
Command::new(GENOMICDIST_CMD)
.about("Compute genomic distribution statistics for a BED file.")
.arg(
arg!(--bed <BED>)
.required(true)
.help("Path to input BED file"),
)
.arg(
arg!(--gtf <GTF>)
.required(false)
.help("Path to GTF/GTF.gz gene model (enables partitions; also TSS distances if no --tss)"),
)
.arg(
arg!(--tss <TSS>)
.required(false)
.help("Path to TSS BED file (overrides GTF-derived TSS for distance calculation)"),
)
.arg(
Arg::new("chrom-sizes")
.long("chrom-sizes")
.required(false)
.help("Path to chrom.sizes file (enables expected partitions and promoter trimming)"),
)
.arg(
arg!(--output <OUTPUT>)
.required(false)
.help("Output JSON path (default: stdout)"),
)
.arg(
arg!(--bins <BINS>)
.required(false)
.default_value("250")
.help("Number of bins for region distribution"),
)
.arg(
Arg::new("signal-matrix")
.long("signal-matrix")
.required(false)
.help("Path to open signal matrix TSV (enables cell-type open chromatin enrichment)"),
)
.arg(
Arg::new("promoter-upstream")
.long("promoter-upstream")
.required(false)
.default_value("200")
.help("Upstream distance (bp) from TSS to define promoter regions"),
)
.arg(
Arg::new("promoter-downstream")
.long("promoter-downstream")
.required(false)
.default_value("2000")
.help("Downstream distance (bp) from TSS to define promoter regions"),
)
.arg(
Arg::new("compact")
.long("compact")
.action(clap::ArgAction::SetTrue)
.help("Compact JSON output (default: pretty-printed)"),
)
}
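`--bins` sets the number of bins for the region distribution (default `250`), but the binning scheme itself is not part of this diff. A minimal sketch, assuming equal-width bins per chromosome with width `ceil(chrom_size / n_bins)` and each region assigned to a bin by its midpoint; `bin_counts` is a hypothetical helper, not a gtars API.

```rust
/// Count regions per equal-width bin on a single chromosome.
/// `regions` are half-open (start, end) pairs with start <= end.
fn bin_counts(regions: &[(u64, u64)], chrom_size: u64, n_bins: usize) -> Vec<u32> {
    let mut counts = vec![0u32; n_bins];
    if n_bins == 0 || chrom_size == 0 {
        return counts;
    }
    // Ceiling division so that n_bins bins cover the whole chromosome.
    let width = (chrom_size + n_bins as u64 - 1) / n_bins as u64;
    for &(start, end) in regions {
        let mid = start + (end - start) / 2;
        // Regions past the chromosome end are clamped into the last bin.
        let idx = ((mid / width) as usize).min(n_bins - 1);
        counts[idx] += 1;
    }
    counts
}
```

Under these assumptions, a 100 bp chromosome with `n_bins = 10` uses 10 bp bins, so a region at `[95, 100)` (midpoint 97) lands in the last bin.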