
Commit 5c14272

mandysulli, sammysheep, nbx0, samcwiley, and Sam Wiley authored
Dev (#28)
* updating muts of int to include nucletoide differences in addition to aa differences with help from @samcwiley
* add missing aa logic back in
* updating to obtain partial codon info as well
* Some refactoring ideas
* updating seq same length logic to match seqs aligned logic
* adding amino acid "X" handling
* created a hamming distance package that measures hamm dist of all of the seqs within a fasta
* creating a package that compares all of the seqs within a fasta file and give the positions and nucleotide diffs btw them all
* updating documentation
* adding in tail handlilng logic
* updating docs
* start containerization
* improving Dockerfile
* update docker compose
* init GitHub Actions workflow for building binaries across multiple targets
* refactor GitHub Actions workflow to use OS matrix for builds and rust nightly
* bug
* bug
* Update GitHub Actions workflow to trigger on all branches except main
* bug
* Strip images to keep them small
* Use multi-stage build with latest nightly alpine image
* Add check_chemistry for Illumina and ONT to replace python scripts in MIRA-NF (#13)

  Adds module for argument parsing/handling and selecting proper configuration filepaths and IRMA modules.

  ---------

  Co-authored-by: Sam Wiley <dzw2@cdp-client-02.biotech.cdc.gov>

* add check_chemistry
* tweaking files for docker
* updating output
* changed package name
* adding documentation for adding a package to the workspace
* Updating docs
* updating docs
* restructing readme situation
* fix picture
* fixing link and finishing up
* fix link
* link test
* fix link
* fix typo
* fix pic
* fix format
* arguments and subplots per segment
* subplots and variant data
* Add consistent color generation for segment names in plots
* read flow Sankey diagram
* Refactor subplot layout and enhance annotations for segment names in coverage plots
* Update .gitignore to exclude lock files
* Fix typo in README and update package addition steps for clarity
* adding mutation_of_interest package
* compress logic and sest up structs for Serde
* Reading in info with serde and structs
* Remove some clones
* removing lock file and adding to .gitignore
* Updating docs
* updating docs
* restructing readme situation
* fix pic
* fix format
* update documentation
* typo fixes and adding output structure
* fix header size
* created a hamming distance package that measures hamm dist of all of the seqs within a fasta
* creating a package that compares all of the seqs within a fasta file and give the positions and nucleotide diffs btw them all
* updating documentation
* Some refactoring ideas
* updating seq same length logic to match seqs aligned logic
* adding in tail handlilng logic
* updating docs
* init GitHub Actions workflow for building binaries across multiple targets
* refactor GitHub Actions workflow to use OS matrix for builds and rust nightly
* bug
* bug
* Update GitHub Actions workflow to trigger on all branches except main
* bug
* Update .gitignore to exclude all lock files recursively
* Refactor plot coverage segment function to remove unused variable and simplify fallback formulas
* mira-oxide logo
* smaller logo
* Update Cargo.toml and main.rs: comment out file count increment and enhance filename formatting for Sankey plot
* Enhance Sankey plot: add hover templates and update title for clarity
* Update Cargo.toml and main.rs: switch plotly dependency to Git; enhance argument help descriptions; explicit node x,y
* Remove unused Label import from plotly common
* Update plotly dependency to use Git repository
* Remove mutations_of_interest_table package and update workspace members
* merge cleanup -- plots working
* update docs
* tweaking package to include subtype as a filter when filtering variants of interest (#24)

---------

Co-authored-by: Samuel Shepard <vfn4@cdc.gov>
Co-authored-by: Ben Rambo-Martin <nbx0@cdc.gov>
Co-authored-by: Ben Rambo-Martin <39743838+nbx0@users.noreply.github.com>
Co-authored-by: Samuel Wiley <26017589+samcwiley@users.noreply.github.com>
Co-authored-by: Sam Wiley <dzw2@cdp-client-02.biotech.cdc.gov>
1 parent 406c1bf commit 5c14272

25 files changed, +2277 -418 lines changed

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+name: Build Binaries
+
+on:
+  push:
+    branches:
+      - '**'
+      - '!main'
+  pull_request:
+    branches:
+      - '**'
+      - '!main'
+
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        include:
+          - os: ubuntu-latest
+            target: x86_64-unknown-linux-gnu
+          - os: macos-latest
+            target: aarch64-apple-darwin
+          - os: windows-latest
+            target: x86_64-pc-windows-msvc
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Rust toolchain -- nightly
+        run: |
+          rustup update nightly
+          rustup default nightly
+          rustup target add ${{ matrix.target }}
+
+      # - name: Install build dependencies
+      #   if: matrix.target == 'x86_64-unknown-linux-gnu'
+      #   run: |
+      #     sudo apt-get update
+      #     sudo apt-get install -y build-essential pkg-config libssl-dev
+
+      - name: Build for ${{ matrix.target }}
+        uses: actions-rs/cargo@v1
+        with:
+          command: build
+          args: --release --target ${{ matrix.target }}
+
+      - name: Upload artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: binaries-${{ matrix.target }}
+          path: target/${{ matrix.target }}/release/
+
+name: Build Binaries
+
+on:
+  push:
+    branches:
+      - '**'
+      - '!main'
+  pull_request:
+    branches:
+      - '**'
+      - '!main'
+
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        include:
+          - os: ubuntu-latest
+            target: x86_64-unknown-linux-gnu
+          - os: macos-latest
+            target: aarch64-apple-darwin
+          - os: windows-latest
+            target: x86_64-pc-windows-msvc
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Rust toolchain -- nightly
+        run: |
+          rustup update nightly
+          rustup default nightly
+          rustup target add ${{ matrix.target }}
+
+      # - name: Install build dependencies
+      #   if: matrix.target == 'x86_64-unknown-linux-gnu'
+      #   run: |
+      #     sudo apt-get update
+      #     sudo apt-get install -y build-essential pkg-config libssl-dev
+
+      - name: Build for ${{ matrix.target }}
+        uses: actions-rs/cargo@v1
+        with:
+          command: build
+          args: --release --target ${{ matrix.target }}
+
+      - name: Upload artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: binaries-${{ matrix.target }}
+          path: target/${{ matrix.target }}/release/

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -7,3 +7,5 @@
 /test
 *test*
 *lock
+**/*lock
+*.crt

.vscode/settings.json

Lines changed: 8 additions & 2 deletions
@@ -4,5 +4,11 @@
         "CFPB",
         "ISSO",
         "cybersecurity"
-    ]
-}
+    ],
+
+    "[rust]": {
+        "editor.defaultFormatter": "rust-lang.rust-analyzer",
+        "editor.formatOnSave": true,
+    },
+}
+

Cargo.toml

Lines changed: 12 additions & 4 deletions
@@ -1,16 +1,24 @@
 [workspace]

-members = [ "flu_prepare_irma_json",
-    "mutations_of_interest_table", "plots",
-]
+
+members = [ "all_sample_hamming_dist", "all_sample_nt_diffs",
+    "variants_of_interest_table", "plots", "check_chemistry", "add_flu_prepare_irma_json"]

 [workspace.dependencies]
 clap = { version = "4", features = ["derive"] }
 csv = "1.3.1"
 either = "1"
 serde = { version = "1.0.219", features = ["derive"] }
 serde_yaml = "0.9"
+glob = "0.3.2"
+ordered-float = "5.0.0"
+#plotly = "0.12.1"
+plotly = { git = "https://github.com/plotly/plotly.rs.git", branch = "main" }
+

-zoe = { git = "https://github.com/CDCgov/zoe.git", tag = "v0.0.18", default-features = false, features = [
+zoe = { version = "0.0.19", default-features = false, features = [
     "multiversion",
 ] }
+
+[profile.release]
+strip = true

Dockerfile

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+# Create an argument to pull a particular version of an image
+
+####################################################################################################
+# BASE IMAGE
+####################################################################################################
+FROM rustlang/rust:nightly-alpine AS builder
+
+# Required certs for apk update
+COPY ca.crt /root/ca.crt
+
+# Put certs in /etc/ssl/certs location
+RUN cat /root/ca.crt >> /etc/ssl/certs/ca-certificates.crt
+
+RUN apk update && apk add --no-cache build-base
+
+WORKDIR /app
+
+# Copy all scripts to docker images
+COPY . .
+
+# This build step will cache the dependencies
+RUN cargo build --release
+
+FROM alpine:latest as deploy
+
+WORKDIR /app
+
+COPY --from=builder \
+    /app/target/release/mutations_of_interest_table \
+    /app/target/release/all_sample_nt_diffs \
+    /app/target/release/all_sample_hamming_dist \
+    /app/target/release/plots \
+    /app/target/release/check_chemistry /app/
+
+# Create working directory variable
+ENV WORKDIR=/data
+
+# Set up volume directory in docker
+VOLUME ${WORKDIR}
+
+# Export project directory to PATH
+ENV PATH "$PATH:/app"

README.md

Lines changed: 30 additions & 4 deletions
@@ -1,4 +1,5 @@
 # MIRA-Oxide
+![](./assets/images/mira_logo_gemini_oxide_web_small.png)

 **General disclaimer** This repository was created for use by CDC programs to collaborate on public health related projects in support of the [CDC mission](https://www.cdc.gov/about/cdc/#cdc_about_cio_mission-our-mission). GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.

@@ -12,7 +13,7 @@ The material embodied in this software is provided to you "as-is" and without wa

 MIRA-Oxide is a RUST workspace that is utilized by [MIRA-NF](https://github.com/CDCgov/MIRA-NF) to perform assembly and annotation of Influenza genomes, SARS-CoV-2 genomes, the SARS-CoV-2 spike-gene and RSV genomes.

-## Adding New Package to the MIRA-Oxide Worksace
+## Adding New Package to the MIRA-Oxide Workspace

 Before starting be sure that you have rust nightly installed and set as default. You will also need to have Cargo installed. If you need more information about how to install those, [see here](https://rust-book.cs.brown.edu/ch01-00-getting-started.html).

@@ -37,6 +38,20 @@ git checkout -b add_new_package_name

 ### Step 3

+Mira-oxide requires rust-nightly to run the [zoe](https://github.com/CDCgov/zoe) crate.
+
+Install the nightly version of rust.
+
+```
+rustup toolchain install nightly
+```
+Using nightly for mira-oxide
+
+```
+rustup override set nightly
+```
+
+### Step 4
 Create a new package using Cargo

 ```
@@ -46,6 +61,15 @@ cargo new new_package_name
 A folder with a the name that you specified should have been created. Inside that folder there should be a Cargo.toml file and a src folder containing a main.rs file.

 ### Step 4
+Create a new package using Cargo
+
+```
+cargo new new_package_name
+```
+
+A folder with a the name that you specified should have been created. Inside that folder there should be a Cargo.toml file and a src folder containing a main.rs file.
+
+### Step 5

 Start Working!

@@ -59,7 +83,7 @@ cd new_package_name

 Add your dependencies to the package's Cargo.toml and start editing your src/main.rs

-### Step 5
+### Step 6

 Run your program!

@@ -69,16 +93,18 @@ To be sure that your package is working within the workspace go the workspace ar
 cargo run -p new_package_name -- #any inputs needed to run your package
 ```

-### Step 6
+### Step 7

 Provide usage documentation.

-Create a README.md witin your package folder. Within that README provide a descritpion of the package, it's inputs, it's outputs and how to execute the package.
+Create a README.md within your package folder. Within that README provide a description of the package, it's inputs, it's outputs and how to execute the package.

 For additional information on rust workspaces, [see here](https://rust-book.cs.brown.edu/ch14-03-cargo-workspaces.html).

 ## Current Packages
 - [mutations_of_interest_table](mutations_of_interest_table/)
+- [all_sample_hamming_dist](all_sample_hamming_dist/)
+- [all_sample_nt_diffs](all_sample_nt_diffs/)

 ## Public Domain Standard Notice
 This repository constitutes a work of the United States Government and is not

all_sample_hamming_dist/Cargo.toml

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+[package]
+name = "all_sample_hamming_dist"
+version = "0.1.0"
+edition = "2024"
+
+[dependencies]
+clap.workspace = true
+either.workspace = true
+zoe.workspace = true

all_sample_hamming_dist/README.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# All Sample Hamming Distance
+
+The all_sample_nt_diffs package takes a fasta file containing all your samples of interest after they have been aligned as an input. The outputs is a hamming distance matrix that provides the hamming distance between all of the sequences within the fasta file provided.
+
+### FASTA file Input:
+
+```
+>sample-1-rep-1
+ATGGAGAGAATAAAAGAACTGAGAGATCTAATGTCACAGTCTCGCACTCGCGAGATACTA
+ACCAAAACCACTGTTGACCACATGGCCATAATCAAGAAGTACACATCAGGAAGACAAGAA
+>sample-1-rep-2
+ATGGAGAGAATAAAAGAACTGAGAGATCTAATGTTACAGTCTCGCACTCGCGAGATACTA
+ACCAAAACCACTGTTGACCACATGGCCATAATCAAGAAGTACACATCAGGAAGACAAGAA
+>sample-1-rep-3
+ATGGAGAGAATAAAAGAACTGAGAGATCTAATGTCACAGTCTCGCACTCGCGAGATACTA
+ACCAAAACCACTGTTGACCACATGGCCATAATCAAGAAGTACACATCAGGAAGACCTGAA
+```
+
+After cloning the mira-oxide repo, execute this command to create a hamming distance matrix for the samples provided:
+
+```
+cargo run -p all_sample_hamming_dist -- -i <PATH>/input.fasta -o <PATH>/outputs.csv
+```
+
+If you would like the output to have another deliminator (default: ","), then the `-d` flag can be used to pass another deliminator.
+
+### The hamming distances output should be structured like this:
+
+```
+seqeunces,sample-1-rep-1,sample-1-rep-2,sample-1-rep-3
+sample-1-rep-1, 0, 1, 2
+sample-1-rep-2, 1, 0, 3
+sample-1-rep-3, 2, 3, 0
+```
