
Commit ddcfd69

Merge pull request #16 from ewels/docs
Website
2 parents: 996e6ba + 9704e29 (commit ddcfd69)

56 files changed

Lines changed: 7108 additions & 50 deletions


.github/workflows/docs.yml

Lines changed: 60 additions & 0 deletions
```yaml
name: Deploy website

on:
  push:
    branches: [main]
    paths:
      - 'docs/**'
      - '.github/workflows/docs.yml'
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: pages
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: docs
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install pnpm
        uses: pnpm/action-setup@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: pnpm
          cache-dependency-path: docs/pnpm-lock.yaml

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Build site
        run: pnpm build

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: docs/dist

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
```

CLAUDE.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -32,7 +32,7 @@ Always run `cargo clippy`, `cargo fmt --check`, and `cargo test` before consider
 
 ## Current Status
 
-**396 tests passing, 0 clippy warnings.** SE: 8613/8926 compare_sam.py (96.5%; note: lower due to seeded-RNG tie-break PR diverging from STAR's mt19937), **99.815% faithfulness (tie-adjusted)** (8611/8627 non-tie reads exact), 299 tie-breaking diffs excluded. 1 CIGAR-only disagree (ERR12389696.13573895, insertion placement, seed-level tie). **0 STAR-only / 0 rustar-aligner-only SE reads**. PE: **8390 both-mapped** (STAR: 8390), **0 half-mapped**, 0 MAPQ inflations / 0 deflations, **99.883% PE exact faithfulness (tie-adjusted)** (16284/16306, 475 tie-breaking diffs excluded), **0 proper-pair diffs**, **0 NH diffs**. Phase 17.A: `scoreSeedBest` pre-extension. Phase 17.B: per-mate seeding. Phase 17.C: STAR-faithful SCORE-GATE + mappedFilter. Phase 17.D: combined-span penalty fix + dedup ordering. Phase 17.8: `--quantMode GeneCounts`. Phase E fix (2026-04-21): mate_id-aware diagonal dedup. Phase E2 (2026-04-22): STAR-faithful combined-read seeding. Phase E3 (2026-04-22): combined-threshold half-mapped fallback. Phase E4 (2026-04-22): PE-CHECK2 unconditional. Phase E5 (2026-04-23): split_combined_wt n_mismatch propagation. Phase E6 (2026-04-24): tie-adjusted faithfulness metric in assess_faithfulness.py. Phase F1: --runRNGseed + seeded primary tie-break (PR #5). Phase F2: --outSAMattrRGline (PR #6). Phase F3: --quantMode TranscriptomeSAM (PR #7). Phase F4: SJDB insertion into Genome+SA at genomeGenerate (PR #8). Phase G1 (2026-04-29): junction_shifts fix in split_combined_wt (rDNA cross-copy false-splice filter). Phase G2 (2026-04-29): MAX_RECURSION 10k→100k + sa_pos_to_forward overflow fix (ERR12389696.7118031 NH=3→9). Phase 17.2 (2026-04-29): coordinate-sorted BAM output (`--outSAMtype BAM SortedByCoordinate` → `Aligned.sortedByCoord.out.bam`). Phase 17.4 (2026-04-29): `--outReadsUnmapped Fastx` → `Unmapped.out.mate1` / `Unmapped.out.mate2`; writes unmapped + TooManyLoci reads; PE writes both mates for fully-unmapped and half-mapped pairs. Phase 17.6 (2026-05-01): `--outStd SAM/BAM_Unsorted/BAM_SortedByCoordinate` — routes primary alignment output to stdout via `Box<dyn AlignmentWriter>` trait dispatch; `SamStdoutWriter`, `BamStdoutWriter`, `SortedBamStdoutWriter` in sam.rs/bam.rs; verified with samtools pipe (967 records). Phase G3 (2026-05-01): SA tie-breaking fix — `compare_suffixes` tie-breaker changed from `pos_b.cmp(&pos_a)` to `packed_a.cmp(&packed_b)` (ascending by packed SA value with strand bit); rustar-aligner SA is now **byte-for-byte identical** to STAR's SA for the yeast genome (10,862 → 0 entry diffs). diff AS: 6→4 cases (4 remaining are rustar-aligner improvements: .844151 VIII 0mm vs STAR VII 6mm, .4972950 spliced vs unspliced mate2). Phase 17.3 (2026-05-01): PE chimeric detection — `detect_inter_mate_chimeric` in `chimeric/detect.rs`; intra-mate multi-cluster chimeric via cluster splitting + mate2 read_pos adjustment; inter-mate chimeric for discordant pairs (diff chr, same strand, or >1Mb); `align_paired_read` returns 4-tuple including `Vec<ChimericAlignment>`; no benchmark regression (8390 both-mapped, 0 half-mapped). Phase 17.11 (2026-05-01): `--chimOutType WithinBAM` — chimeric alignments written as supplementary records (FLAG 0x800) in primary BAM; donor record has full SEQ + SA tag; acceptor has FLAG 0x800 + SA tag + empty SEQ; `build_within_bam_records` in `chimeric/output.rs`; `chim_out_junctions()` / `chim_out_within_bam()` helpers in params.rs; supports mixed `--chimOutType Junctions WithinBAM`. Phase 17.7 (2026-05-01): GTF tag parameters — `--sjdbGTFchrPrefix`, `--sjdbGTFfeatureExon`, `--sjdbGTFtagExonParentTranscript`, `--sjdbGTFtagExonParentGene`; `_configured` variants in `junction/gtf.rs`, `quant/mod.rs`, `quant/transcriptome.rs`, `junction/mod.rs`; all 4 production paths thread params; backward-compat wrappers preserve zero test disruption. Phase 17.9 (2026-05-01): `--outBAMcompression` (BGZF level -1–9, default 1; -1/0=NONE, 1-8=flate2 levels, ≥9=BEST) + `--limitBAMsortRAM` (bytes, 0=unlimited; aborts sort if ~400 bytes/record estimate exceeds limit); `bgzf_compression()` + `make_bgzf_writer()` helpers in `io/bam.rs`; threaded through all 4 BAM writers (unsorted file, sorted file, unsorted stdout, sorted stdout). PE chimericDetectionOld (2026-05-01): per-mate `detect_chimeric_old` called on `all_m1_transcripts` / `all_m2_transcripts` pools after `filter_paired_transcripts` in `read_align.rs`. Phase 17.12 (2026-05-01): BySJout disk buffering — `BySJReadMeta` struct + `NamedTempFile` SAM temp file replaces `Vec<AlignmentBatchResults>`; `create_bysj_writer` / `bysj_write_records` / `bysj_read_n_records` helpers in `io/sam.rs`; `tempfile` moved to `[dependencies]`. Phase 17.13 (2026-05-01): 8 integration tests in `tests/alignment_features.rs` — synthetic 20kb genome with planted GT-AG intron; tests cover BAM output, PE alignment, spliced reads, BySJout, GeneCounts, unmapped output, two-pass mode. Phase 12.2 (2026-05-04): SE chimeric Tier 1b soft-clip re-mapping — `detect_from_soft_clips` in `chimeric/detect.rs` re-seeds the primary alignment's soft-clipped bases when `detect_chimeric_old` finds no partner; `adjust_read_positions` helper shifts sub-seq coords into full-read space for right clips; called as Step 3c in `read_align.rs`. Phase 17.10 (2026-05-04): Chimeric Tier 3 — `detect_from_chimeric_residuals` in `chimeric/detect.rs` re-seeds outer uncovered read regions (before donor / after acceptor) of each found chimeric pair; enables 3-way gene-fusion detection; called as Step 3d in `read_align.rs`. See [ROADMAP.md](ROADMAP.md) for detailed phase tracking and [docs/](docs/) for per-phase notes.
+**396 tests passing, 0 clippy warnings.** SE: 8613/8926 compare_sam.py (96.5%; note: lower due to seeded-RNG tie-break PR diverging from STAR's mt19937), **99.815% faithfulness (tie-adjusted)** (8611/8627 non-tie reads exact), 299 tie-breaking diffs excluded. 1 CIGAR-only disagree (ERR12389696.13573895, insertion placement, seed-level tie). **0 STAR-only / 0 rustar-aligner-only SE reads**. PE: **8390 both-mapped** (STAR: 8390), **0 half-mapped**, 0 MAPQ inflations / 0 deflations, **99.883% PE exact faithfulness (tie-adjusted)** (16284/16306, 475 tie-breaking diffs excluded), **0 proper-pair diffs**, **0 NH diffs**. Phase 17.A: `scoreSeedBest` pre-extension. Phase 17.B: per-mate seeding. Phase 17.C: STAR-faithful SCORE-GATE + mappedFilter. Phase 17.D: combined-span penalty fix + dedup ordering. Phase 17.8: `--quantMode GeneCounts`. Phase E fix (2026-04-21): mate_id-aware diagonal dedup. Phase E2 (2026-04-22): STAR-faithful combined-read seeding. Phase E3 (2026-04-22): combined-threshold half-mapped fallback. Phase E4 (2026-04-22): PE-CHECK2 unconditional. Phase E5 (2026-04-23): split_combined_wt n_mismatch propagation. Phase E6 (2026-04-24): tie-adjusted faithfulness metric in assess_faithfulness.py. Phase F1: --runRNGseed + seeded primary tie-break (PR #5). Phase F2: --outSAMattrRGline (PR #6). Phase F3: --quantMode TranscriptomeSAM (PR #7). Phase F4: SJDB insertion into Genome+SA at genomeGenerate (PR #8). Phase G1 (2026-04-29): junction_shifts fix in split_combined_wt (rDNA cross-copy false-splice filter). Phase G2 (2026-04-29): MAX_RECURSION 10k→100k + sa_pos_to_forward overflow fix (ERR12389696.7118031 NH=3→9). Phase 17.2 (2026-04-29): coordinate-sorted BAM output (`--outSAMtype BAM SortedByCoordinate` → `Aligned.sortedByCoord.out.bam`). Phase 17.4 (2026-04-29): `--outReadsUnmapped Fastx` → `Unmapped.out.mate1` / `Unmapped.out.mate2`; writes unmapped + TooManyLoci reads; PE writes both mates for fully-unmapped and half-mapped pairs. Phase 17.6 (2026-05-01): `--outStd SAM/BAM_Unsorted/BAM_SortedByCoordinate` — routes primary alignment output to stdout via `Box<dyn AlignmentWriter>` trait dispatch; `SamStdoutWriter`, `BamStdoutWriter`, `SortedBamStdoutWriter` in sam.rs/bam.rs; verified with samtools pipe (967 records). Phase G3 (2026-05-01): SA tie-breaking fix — `compare_suffixes` tie-breaker changed from `pos_b.cmp(&pos_a)` to `packed_a.cmp(&packed_b)` (ascending by packed SA value with strand bit); rustar-aligner SA is now **byte-for-byte identical** to STAR's SA for the yeast genome (10,862 → 0 entry diffs). diff AS: 6→4 cases (4 remaining are rustar-aligner improvements: .844151 VIII 0mm vs STAR VII 6mm, .4972950 spliced vs unspliced mate2). Phase 17.3 (2026-05-01): PE chimeric detection — `detect_inter_mate_chimeric` in `chimeric/detect.rs`; intra-mate multi-cluster chimeric via cluster splitting + mate2 read_pos adjustment; inter-mate chimeric for discordant pairs (diff chr, same strand, or >1Mb); `align_paired_read` returns 4-tuple including `Vec<ChimericAlignment>`; no benchmark regression (8390 both-mapped, 0 half-mapped). Phase 17.11 (2026-05-01): `--chimOutType WithinBAM` — chimeric alignments written as supplementary records (FLAG 0x800) in primary BAM; donor record has full SEQ + SA tag; acceptor has FLAG 0x800 + SA tag + empty SEQ; `build_within_bam_records` in `chimeric/output.rs`; `chim_out_junctions()` / `chim_out_within_bam()` helpers in params.rs; supports mixed `--chimOutType Junctions WithinBAM`. Phase 17.7 (2026-05-01): GTF tag parameters — `--sjdbGTFchrPrefix`, `--sjdbGTFfeatureExon`, `--sjdbGTFtagExonParentTranscript`, `--sjdbGTFtagExonParentGene`; `_configured` variants in `junction/gtf.rs`, `quant/mod.rs`, `quant/transcriptome.rs`, `junction/mod.rs`; all 4 production paths thread params; backward-compat wrappers preserve zero test disruption. Phase 17.9 (2026-05-01): `--outBAMcompression` (BGZF level -1–9, default 1; -1/0=NONE, 1-8=flate2 levels, ≥9=BEST) + `--limitBAMsortRAM` (bytes, 0=unlimited; aborts sort if ~400 bytes/record estimate exceeds limit); `bgzf_compression()` + `make_bgzf_writer()` helpers in `io/bam.rs`; threaded through all 4 BAM writers (unsorted file, sorted file, unsorted stdout, sorted stdout). PE chimericDetectionOld (2026-05-01): per-mate `detect_chimeric_old` called on `all_m1_transcripts` / `all_m2_transcripts` pools after `filter_paired_transcripts` in `read_align.rs`. Phase 17.12 (2026-05-01): BySJout disk buffering — `BySJReadMeta` struct + `NamedTempFile` SAM temp file replaces `Vec<AlignmentBatchResults>`; `create_bysj_writer` / `bysj_write_records` / `bysj_read_n_records` helpers in `io/sam.rs`; `tempfile` moved to `[dependencies]`. Phase 17.13 (2026-05-01): 8 integration tests in `tests/alignment_features.rs` — synthetic 20kb genome with planted GT-AG intron; tests cover BAM output, PE alignment, spliced reads, BySJout, GeneCounts, unmapped output, two-pass mode. Phase 12.2 (2026-05-04): SE chimeric Tier 1b soft-clip re-mapping — `detect_from_soft_clips` in `chimeric/detect.rs` re-seeds the primary alignment's soft-clipped bases when `detect_chimeric_old` finds no partner; `adjust_read_positions` helper shifts sub-seq coords into full-read space for right clips; called as Step 3c in `read_align.rs`. Phase 17.10 (2026-05-04): Chimeric Tier 3 — `detect_from_chimeric_residuals` in `chimeric/detect.rs` re-seeds outer uncovered read regions (before donor / after acceptor) of each found chimeric pair; enables 3-way gene-fusion detection; called as Step 3d in `read_align.rs`. See [ROADMAP.md](ROADMAP.md) for detailed phase tracking and [docs-old/](docs-old/) for per-phase development notes. The published Astro Starlight docs site is in [docs/](docs/).
 
 ## Source Layout
 
```
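The status line reports SE faithfulness as "tie-adjusted": tie-breaking diffs are excluded rather than counted as mismatches. A minimal Rust sketch of that arithmetic, assuming the metric is exact non-tie reads divided by (total reads minus tie-breaking diffs); `tie_adjusted_faithfulness` is a hypothetical helper, not the code in `assess_faithfulness.py`:

```rust
/// Hypothetical helper (not taken from the project): share of non-tie reads
/// that match STAR exactly, as a percentage. Tie-breaking diffs are removed
/// from the denominator rather than counted as mismatches.
fn tie_adjusted_faithfulness(exact_non_tie: u32, total: u32, tie_diffs: u32) -> Option<f64> {
    // Reads left after excluding tie-breaking diffs; None if the counts are inconsistent.
    let non_tie = total.checked_sub(tie_diffs)?;
    if non_tie == 0 || exact_non_tie > non_tie {
        return None;
    }
    Some(100.0 * f64::from(exact_non_tie) / f64::from(non_tie))
}

fn main() {
    // SE counts quoted in the status line: 8926 reads, 299 tie-breaking diffs,
    // and 8611 exact matches among the remaining 8627 reads.
    if let Some(pct) = tie_adjusted_faithfulness(8611, 8926, 299) {
        println!("SE tie-adjusted faithfulness: {pct:.3}%");
    }
}
```

With the quoted SE counts this prints `SE tie-adjusted faithfulness: 99.815%`, matching the figure in the status line.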
```diff
@@ -159,7 +159,7 @@ Previously listed issues now resolved:
 - `ERR12389696.5825571`: now aligns as `XV:80779 121M607028N13M16S` (exact match with STAR). Root cause: rustar-aligner computed a finite intron limit of 589824 when `alignIntronMax=0`, blocking the 607kb intron. STAR uses `>0` guard in `stitchAlignToTranscript.cpp` line 100 — alignIntronMax=0 means no limit. Fix: use `u32::MAX` sentinel in score.rs.
 - `ERR12389696.16030539`: MAPQ inflation resolved. Both tools now find two alignments (XV:121224 128M925400N10M12S and XV:598336 128M448288N10M12S), both MAPQ=3. Primary selection differs (tie). MAPQ inflate: 1→0.
 
-See [ROADMAP.md](ROADMAP.md) and [docs/](docs/) for full issue tracking.
+See [ROADMAP.md](ROADMAP.md) and [docs-old/](docs-old/) for full issue tracking.
 
 ## PE Status (Updated 2026-04-29 — Phase G2)
 
```
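The resolved `ERR12389696.5825571` entry depends on treating `alignIntronMax=0` as "no limit". A minimal sketch of that guard, assuming an intron is accepted when its length does not exceed the limit; `intron_limit` and `intron_allowed` are hypothetical names, and this is not the actual code in `score.rs`:

```rust
/// Hypothetical sketch: effective maximum intron length. Following the `> 0`
/// guard described in the entry, `alignIntronMax = 0` means "no limit",
/// represented by the `u32::MAX` sentinel instead of a finite derived value.
fn intron_limit(align_intron_max: u32) -> u32 {
    if align_intron_max > 0 {
        align_intron_max
    } else {
        u32::MAX
    }
}

/// Assumption of this sketch: an intron is accepted when its length
/// does not exceed the effective limit.
fn intron_allowed(intron_len: u32, align_intron_max: u32) -> bool {
    intron_len <= intron_limit(align_intron_max)
}

fn main() {
    // The 607028N gap from the `121M607028N13M16S` alignment in the entry.
    let gap = 607_028;
    println!("alignIntronMax=0      -> allowed: {}", intron_allowed(gap, 0));
    // A finite limit of 589824 (the value the old code computed) rejects the same gap.
    println!("alignIntronMax=589824 -> allowed: {}", intron_allowed(gap, 589_824));
}
```

Using a sentinel keeps the comparison branch-free at the call site: every caller compares against one number, and the "unlimited" case is decided once when the limit is computed.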
```diff
@@ -176,4 +176,4 @@ See [ROADMAP.md](ROADMAP.md) and [docs/](docs/) for full issue tracking.
 - No STARsolo single-cell features — Phase 14 (deferred)
 - 4 PE AS diffs (rustar-aligner improvements, not bugs): `.844151` finds VIII:451791 0mm vs STAR's VII:1001391 6mm; `.4972950` finds correct spliced mate2 vs STAR's unspliced. Both cases: STAR's combined-window approach fails to stitch a PE pair at the better location.
 
-See [docs/phase17_features.md](docs/phase17_features.md) for full feature status.
+See [docs-old/phase17_features.md](docs-old/phase17_features.md) for full feature status.
```
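The Phase 17.9 entry in the CLAUDE.md status line describes how `--outBAMcompression` values map to BGZF levels and how `--limitBAMsortRAM` is enforced. A dependency-free sketch of both rules, assuming the documented mapping (-1/0 = none, 1-8 = that level, 9 and above = best) and the ~400 bytes/record estimate. `BgzfLevel`, `bgzf_level`, and `sort_fits_in_ram` are hypothetical stand-ins; the real `bgzf_compression()` helper's signature is not shown in this commit:

```rust
/// Hypothetical stand-in for a BGZF compression setting.
#[derive(Debug, PartialEq, Eq)]
enum BgzfLevel {
    None,
    Level(u32),
    Best,
}

/// Maps an `--outBAMcompression` value using the documented rule:
/// -1/0 = no compression, 1-8 = that level, 9 and above = best.
/// Treating values below -1 as "none" is an assumption of this sketch.
fn bgzf_level(out_bam_compression: i32) -> BgzfLevel {
    match out_bam_compression {
        i32::MIN..=0 => BgzfLevel::None,
        1..=8 => BgzfLevel::Level(out_bam_compression as u32),
        _ => BgzfLevel::Best,
    }
}

/// Rough per-record memory estimate quoted for `--limitBAMsortRAM`.
const EST_BYTES_PER_RECORD: u64 = 400;

/// Returns false when the estimated sort buffer would exceed the limit
/// (in bytes); a limit of 0 means unlimited.
fn sort_fits_in_ram(n_records: u64, limit_bam_sort_ram: u64) -> bool {
    limit_bam_sort_ram == 0
        || n_records.saturating_mul(EST_BYTES_PER_RECORD) <= limit_bam_sort_ram
}

fn main() {
    for v in [-1, 0, 1, 6, 9] {
        println!("--outBAMcompression {v} -> {:?}", bgzf_level(v));
    }
    // 1,000,000 records * ~400 bytes is ~400 MB, which exceeds a 100 MB limit.
    println!("fits: {}", sort_fits_in_ram(1_000_000, 100_000_000));
}
```

In the actual writers the level would presumably be converted into a `flate2::Compression` (`Compression::none()`, `Compression::new(level)`, or `Compression::best()`); that step is omitted here so the sketch has no external dependencies.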

CONTRIBUTING.md

Lines changed: 14 additions & 1 deletion
```diff
@@ -20,10 +20,23 @@ Small synthetic and yeast test data lives in `test/`. Integration tests in `test
 
 ## Project history
 
-rustar-aligner was written as a faithful port of [STAR](https://github.com/alexdobin/STAR) by Alexander Dobin. Up to the initial release, the goal was behavioral parity with STAR — matching its algorithms, thresholds, and output formats as closely as possible. Notes from that development phase are in `docs/dev/`.
+rustar-aligner was written as a faithful port of [STAR](https://github.com/alexdobin/STAR) by Alexander Dobin. Up to the initial release, the goal was behavioral parity with STAR — matching its algorithms, thresholds, and output formats as closely as possible. Notes from that development phase are in `docs-old/` (`docs-old/dev/` and the `phase*.md` files).
 
 Future development is not bound by that constraint. Adding STARsolo, new features, or diverging from STAR behavior is entirely welcome.
 
+## Documentation site
+
+The user-facing docs site is an [Astro Starlight](https://starlight.astro.build/) project under `docs/`:
+
+```bash
+cd docs
+pnpm install
+pnpm dev # local dev server
+pnpm build # production build into docs/dist/
+```
+
+Content lives under `docs/src/content/docs/` as Markdown / MDX files with YAML frontmatter (`title`, `description`). Sidebar order is configured in `docs/astro.config.mjs`. Site-wide design tokens (colours, fonts, graph-paper background, wave dividers) live in `docs/src/styles/custom.css` and can be tuned in one place.
+
 ## License
 
 MIT, matching the original STAR license.
```

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-# ![rustar-aligner](docs/logo/rustar-logo.svg)
+# ![rustar-aligner](docs/src/assets/rustar-logo.svg)
 
 A Rust reimplementation of [STAR](https://github.com/alexdobin/STAR) (Spliced Transcripts Alignment to a Reference), the widely-used RNA-seq aligner originally written in C++ by Alexander Dobin.
 
```
