Commit bde7a3d

Merge pull request #51 from databio/dev
0.2.0 Release - Gibson Les Paul
2 parents bfcabdc + df443c2 commit bde7a3d

51 files changed (+1379 -516 lines changed)

.github/workflows/R-CMD-check.yaml

+1 -1
@@ -19,7 +19,7 @@ jobs:
 # - {os: windows-latest, r: 'release', rust-version: 'stable-msvc', rust-target: 'x86_64-pc-windows-gnu'}
 - {os: macOS-latest, r: 'release', rust-version: 'stable'}
 - {os: ubuntu-latest, r: 'release', rust-version: 'stable'}
-- {os: ubuntu-latest, r: 'devel', rust-version: 'stable'}
+#- {os: ubuntu-latest, r: 'devel', rust-version: 'stable'}
 env:
 R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
 steps:

.gitignore

+1
@@ -25,3 +25,4 @@ bin/
 
 .DS_Store
 .Rhistory
+/gtars/tests/data/out/region_scoring_count.csv.gz

LICENSE

+9
@@ -0,0 +1,9 @@
+Copyright 2024 gtars authors
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

LICENSE.txt

+9
@@ -0,0 +1,9 @@
+Copyright 2024 gtars authors
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

+20 -5
@@ -7,59 +7,73 @@
 
 `gtars` is a rust crate that provides a set of tools for working with genomic interval data. Its primary goal is to provide processors for our python package, [`geniml`](https://github.com/databio/geniml), a library for machine learning on genomic intervals. However, it can be used as a standalone library for working with genomic intervals as well.
 
-`gtars` provides three things:
+`gtars` provides these things:
 
 1. A rust library crate.
 2. A command-line interface, written in rust.
-3. A Python package that provides bindings to the rust library.
+3. A Python package that provides Python bindings to the rust library.
+4. An R package that provides R bindings to the rust library
 
 ## Repository organization (for developers)
 
 This repo is organized like so:
 
-1. A rust library crate (`/gtars/lib.rs`) that provides functions, traits, and structs for working with genomic interval data.
-2. A rust binary crate (in `/gtars/main.rs`), a small, wrapper command-line interface for the library crate.
-3. A rust crate (in `/bindings`) that provides Python bindings, and a resulting Python package, so that it can be used within Python.
+1. The main gtars rust package in `/gtars`, which contains two crates:
+1a. A rust library crate (`/gtars/lib.rs`) that provides functions, traits, and structs for working with genomic interval data.
+1b. A rust binary crate (in `/gtars/main.rs`), a small, wrapper command-line interface for the library crate.
+2. Python bindings (in `/bindings/python`), which consists of a rust package with a library crate (no binary crate) and Python package.
+3. R bindings (in `/bindings/r`), which consists of an R package.
 
 This repository is a work in progress, and still in early development.
 
 ## Installation
+
 To install `gtars`, you must have the rust toolchain installed. You can install it by following the instructions [here](https://www.rust-lang.org/tools/install).
 
 You may build the binary locally using `cargo build --release`. This will create a binary in `target/release/gtars`. You can then add this to your path, or run it directly.
 
 ## Usage
+
 `gtars` is very early in development, and as such, it does not have a lot of functionality yet. However, it does have a few useful tools. To see the available tools, run `gtars --help`. To see the help for a specific tool, run `gtars <tool> --help`.
 
 Alternatively, you can link `gtars` as a library in your rust project. To do so, add the following to your `Cargo.toml` file:
+
 ```toml
 [dependencies]
 gtars = { git = "https://github.com/databio/gtars" }
 ```
 
 ## Testing
+
 To run the tests, run `cargo test`.
 
 ## Contributing
+
 ### New internal library crate tools
+
 If you'd like to add a new tool, you can do so by creating a new module within the src folder.
 
 ### New public library crate tools
+
 If you want this to be available to users of `gtars`, you can add it to the `gtars` library crate as well. To do so, add the following to `src/lib.rs`:
 ```rust
 pub mod <tool_name>;
 ```
 
 ### New binary crate tools
+
 Finally, if you want to have command-line functionality, you can add it to the `gtars` binary crate. This requires two steps:
+
 1. Create a new `cli` using `clap` inside the `interfaces` module of `src/cli.rs`:
+
 ```rust
 pub fn make_new_tool_cli() -> Command {
 
 }
 ```
 
 2. Write your logic in a wrapper function. This will live inside the `functions` module of `src/cli.rs`:
+
 ```rust
 // top of file:
 use tool_name::{ ... }
@@ -73,6 +87,7 @@ pub fn new_tool_wrapper() -> Result<(), Box<dyn Error>> {
 Please make sure you update the changelog and bump the version number in `Cargo.toml` when you add a new tool.
 
 ### VSCode users
+
 If you are using VSCode, make sure you link to the `Cargo.toml` inside the `.vscode` folder, so that `rust-analyzer` can link it all together:
 ```json
 {

bindings/python/Cargo.toml

+1 -1
@@ -1,6 +1,6 @@
 [package]
 name = "gtars-py"
-version = "0.1.1"
+version = "0.2.0"
 edition = "2021"
 
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

bindings/python/README.md

+18 -13
@@ -1,18 +1,23 @@
 # gtars
-This is a python wrapper around the `gtars` crate. It provides an easy interface for using `gtars` in python. It is currently in early development, and as such, it does not have a lot of functionality yet, but new tools are being worked on right now.
 
-## Installation
-You can get `gtars` from PyPI:
-```bash
-pip install gtars
-```
+This is a Python package that wraps the `gtars` crate so you can call gtars code from Python.
+
+Documentation for Python bindings is hosted at: https://docs.bedbase.org/gtars/
+
+## Brief instructions
 
-## Usage
-Import the package, and use the tools:
-```python
-import gtars as gt
+To install the development version, you'll have to build it locally. Build Python bindings like this:
 
-gt.prune_universe(...)
+```console
+cd bindings/python
+maturin build --interpreter 3.11 --release
 ```
-## Developer docs
-Write the develop docs here...
+
+Then install the local wheel that was just built:
+
+```console
+gtars_version=`grep '^version =' Cargo.toml | cut -d '"' -f 2`
+python_version=$(python --version | awk '{print $2}' | cut -d '.' -f1-2 | tr -d '.')
+wheel_path=$(find target/wheels/gtars-${gtars_version}-cp${python_version}-cp${python_version}-*.whl)
+pip install --force-reinstall ${wheel_path}
+```
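Reviewer note: the version and interpreter-tag lookup in the new install snippet can also be sketched in Python. This is a minimal sketch and not part of the commit. `expected_wheel_glob` is a hypothetical helper, and it assumes the wheel is named `gtars-<version>-cp<XY>-cp<XY>-*.whl` under `target/wheels/`, as the shell commands imply.

```python
import re
import sys


def expected_wheel_glob(cargo_toml_text, python_version=None):
    """Build the glob pattern that the shell snippet passes to `find`.

    Hypothetical helper: reads the first `version = "..."` line from the
    Cargo.toml text and combines it with a CPython tag such as cp311.
    """
    match = re.search(r'^version\s*=\s*"([^"]+)"', cargo_toml_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version line found in Cargo.toml text")
    if python_version is None:
        # Default to the (major, minor) of the running interpreter.
        python_version = sys.version_info[:2]
    tag = f"cp{python_version[0]}{python_version[1]}"
    return f"target/wheels/gtars-{match.group(1)}-{tag}-{tag}-*.whl"


cargo_toml = '[package]\nname = "gtars-py"\nversion = "0.2.0"\nedition = "2021"\n'
pattern = expected_wheel_glob(cargo_toml, python_version=(3, 11))
print(pattern)
```

The resulting pattern could be handed to `glob.glob` and the match passed to `pip install --force-reinstall`, mirroring the last two shell lines.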
@@ -0,0 +1 @@
+from .gtars.digests import * # noqa: F403

bindings/python/src/digests/mod.rs

+71
@@ -0,0 +1,71 @@
+// This is intended to provide minimal Python bindings to functions in the `digests` module of the `gtars` crate.
+
+use pyo3::prelude::*;
+use gtars::digests::{sha512t24u, md5, DigestResult};
+
+#[pyfunction]
+pub fn sha512t24u_digest(readable: &str) -> String {
+    return sha512t24u(readable);
+}
+
+#[pyfunction]
+pub fn md5_digest(readable: &str) -> String {
+    return md5(readable);
+}
+
+#[pyfunction]
+pub fn digest_fasta(fasta: &str) -> PyResult<Vec<PyDigestResult>> {
+    match gtars::digests::digest_fasta(fasta) {
+        Ok(digest_results) => {
+            let py_digest_results: Vec<PyDigestResult> = digest_results.into_iter().map(PyDigestResult::from).collect();
+            Ok(py_digest_results)
+        },
+        Err(e) => Err(PyErr::new::<pyo3::exceptions::PyIOError, _>(format!("Error processing FASTA file: {}", e))),
+    }
+}
+
+#[pyclass]
+#[pyo3(name="DigestResult")]
+pub struct PyDigestResult {
+    #[pyo3(get,set)]
+    pub id: String,
+    #[pyo3(get,set)]
+    pub length: usize,
+    #[pyo3(get,set)]
+    pub sha512t24u: String,
+    #[pyo3(get,set)]
+    pub md5: String
+}
+
+#[pymethods]
+impl PyDigestResult {
+    fn __repr__(&self) -> String {
+        format!("<DigestResult for {}>", self.id)
+    }
+
+    fn __str__(&self) -> PyResult<String> {
+        Ok(format!("DigestResult for sequence {}\n length: {}\n sha512t24u: {}\n md5: {}", self.id, self.length, self.sha512t24u, self.md5))
+    }
+}
+
+impl From<DigestResult> for PyDigestResult {
+    fn from(value: DigestResult) -> Self {
+        PyDigestResult {
+            id: value.id,
+            length: value.length,
+            sha512t24u: value.sha512t24u,
+            md5: value.md5
+        }
+    }
+}
+
+// This represents the Python module to be created
+#[pymodule]
+pub fn digests(_py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
+    m.add_function(wrap_pyfunction!(sha512t24u_digest, m)?)?;
+    m.add_function(wrap_pyfunction!(md5_digest, m)?)?;
+    m.add_function(wrap_pyfunction!(digest_fasta, m)?)?;
+    m.add_class::<PyDigestResult>()?;
+    Ok(())
+}
+
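Reviewer note: for readers unfamiliar with the digest names, here is a pure-Python sketch of what these bindings appear to expose. It is not the gtars implementation. It assumes `sha512t24u` follows the GA4GH definition (SHA-512, truncated to the first 24 bytes, then base64url-encoded) and that `md5` returns a hex string; neither detail is shown in this diff. `digest_fasta_text` is a hypothetical helper that works on in-memory FASTA text, whereas the binding takes a `fasta` string that the error message suggests is a file path. Whether gtars normalizes case or handles compressed input is also not visible here, so the sketch digests the sequence exactly as given.

```python
import base64
import hashlib
from dataclasses import dataclass


def sha512t24u(data: str) -> str:
    """GA4GH-style truncated digest: first 24 bytes of SHA-512, base64url-encoded."""
    digest = hashlib.sha512(data.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def md5(data: str) -> str:
    """Hex-encoded MD5 digest of the input string (assumed output format)."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()


@dataclass
class DigestResult:
    # Field names mirror the PyDigestResult struct in the diff above.
    id: str
    length: int
    sha512t24u: str
    md5: str


def digest_fasta_text(text: str) -> list:
    """Hypothetical helper: one DigestResult per record of FASTA-formatted text."""
    records = []  # list of (record name, list of sequence lines)
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first whitespace-delimited token of the header as the id.
            records.append((header[0] if header else "", []))
        elif line and records:
            records[-1][1].append(line)
    results = []
    for name, chunks in records:
        seq = "".join(chunks)
        results.append(DigestResult(name, len(seq), sha512t24u(seq), md5(seq)))
    return results


results = digest_fasta_text(">chr1 example\nAC\nGT\n>chr2\nTT\n")
for result in results:
    print(result)
```

If the assumptions hold, the values printed here should match what `gtars.digests.digest_fasta` reports for a file with the same records, which makes the sketch usable as a cross-check.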

bindings/python/src/lib.rs

+4
@@ -5,6 +5,7 @@ mod ailist;
 mod models;
 mod tokenizers;
 mod utils;
+mod digests;
 
 pub const VERSION: &str = env!("CARGO_PKG_VERSION");
 
@@ -14,11 +15,13 @@ fn gtars(py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
     let ailist_module = pyo3::wrap_pymodule!(ailist::ailist);
     let utils_module = pyo3::wrap_pymodule!(utils::utils);
     let models_module = pyo3::wrap_pymodule!(models::models);
+    let digests_module = pyo3::wrap_pymodule!(digests::digests);
 
     m.add_wrapped(tokenize_module)?;
     m.add_wrapped(ailist_module)?;
     m.add_wrapped(utils_module)?;
     m.add_wrapped(models_module)?;
+    m.add_wrapped(digests_module)?;
 
     let sys = PyModule::import_bound(py, "sys")?;
     let binding = sys.getattr("modules")?;
@@ -29,6 +32,7 @@ fn gtars(py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
     sys_modules.set_item("gtars.ailist", m.getattr("ailist")?)?;
     sys_modules.set_item("gtars.utils", m.getattr("utils")?)?;
     sys_modules.set_item("gtars.models", m.getattr("models")?)?;
+    sys_modules.set_item("gtars.digests", m.getattr("digests")?)?;
 
     // add constants
     m.add("__version__", VERSION)?;
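Reviewer note: the likely reason for the `sys_modules.set_item(...)` lines is that a module without a `__path__` is not a package, so Python cannot locate its submodules through the normal import machinery unless they are registered in `sys.modules`. A minimal sketch of the same idea in plain Python follows. The names `demo_ext` and `register_submodule` are hypothetical stand-ins, not part of gtars or PyO3.

```python
import importlib
import sys
import types


def register_submodule(parent: types.ModuleType, name: str) -> types.ModuleType:
    """Attach an empty submodule to `parent` and register it in sys.modules."""
    child = types.ModuleType(f"{parent.__name__}.{name}")
    setattr(parent, name, child)
    # Same idea as sys_modules.set_item("gtars.digests", ...) in the diff.
    sys.modules[child.__name__] = child
    return child


# Hypothetical stand-in for the compiled extension module (it has no __path__).
demo_ext = types.ModuleType("demo_ext")
sys.modules["demo_ext"] = demo_ext

try:
    importlib.import_module("demo_ext.digests")
    importable_before = True
except ModuleNotFoundError:
    # Raised because 'demo_ext' is not a package.
    importable_before = False

digests = register_submodule(demo_ext, "digests")
digests.md5_digest = lambda text: f"<md5 of {text!r}>"  # placeholder, not a real digest

# After registration the dotted import resolves straight from sys.modules.
from demo_ext.digests import md5_digest

print(importable_before, md5_digest("ACGT"))
```

Under this reading, the new `gtars.digests` entry is what lets `from gtars.digests import ...` work the same way as the existing `ailist`, `utils`, and `models` submodules.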

bindings/r/DESCRIPTION

+1 -1
@@ -1,6 +1,6 @@
 Package: gtars
 Title: Performance critical genomic interval analysis using Rust, in R
-Version: 0.0.0.9000
+Version: 0.0.1
 Authors@R:
 person("Nathan", "LeRoy", , "[email protected]", role = c("aut", "cre"),
 comment = c(ORCID = "0000-0002-7354-7213"))

bindings/r/R/igd.R

+3 -1
@@ -18,7 +18,7 @@ NULL
 #' @examples
 #' \dontrun{
 #' # Create database with default name
-#' igd_create("path/to/output", "path/to/bed/files")
+#' r_igd_create("path/to/output", "path/to/bed/files")
 #' }
 #'
 #' @export
@@ -49,6 +49,8 @@ r_igd_create <- function(output_path, filelist, db_name = "igd_database") {
 #'
 #' @examples
 #' \dontrun{
+#' # Search database with default name
+#' r_igd_search("path/to/database", "path/to/query/file")
 #' }
 #'
 #' @export

bindings/r/README.md

+18
@@ -0,0 +1,18 @@
+# gtars
+
+This is an R package that wraps the `gtars` Rust crate so you can call gtars code from R.
+
+## Brief instructions
+
+To install the development version, you'll have to build it locally. Build R bindings like this:
+
+```console
+cd bindings
+R CMD build r
+```
+
+Then install the package that was just built:
+
+```console
+R CMD INSTALL gtars_0.0.1.tar.gz
+```

bindings/r/man/r_igd_create.Rd

+1 -1 (generated file, diff not rendered)

bindings/r/man/r_igd_search.Rd

+2 (generated file, diff not rendered)

bindings/r/src/rust/Cargo.toml

+1 -1
@@ -1,6 +1,6 @@
 [package]
 name = 'gtars-r'
-version = '0.1.0'
+version = '0.2.0'
 edition = '2021'
 
 [lib]

bindings/r/tests/set_A.bed

-7
This file was deleted.

bindings/r/tests/set_AA.bed

-3
This file was deleted.
