Commit 88b3dd9

author
Jon Belyeu
committed
update to orographer-based version
1 parent 921ed37 commit 88b3dd9

23 files changed

Lines changed: 5725 additions & 2878 deletions

docs/imgs/paraviewer-graphical.svg

Lines changed: 1 addition & 1 deletion

docs/user_guide.md

Lines changed: 94 additions & 64 deletions
Original file line number | Diff line number | Diff line change
@@ -1,116 +1,146 @@
11
# User Guide
22

33
## Installation
4-
### Install with Conda/Mamba
5-
Paraviewer depends on Conda packages, so we recommend installing Paraviewer via mamba as well.
6-
We recommend creating a dedicated mamba environment for Paraviewer:
4+
Paraviewer requires **Python 3.10+**. The generated site uses in-browser visualization via [orographer](https://github.com/PacificBiosciences/Orographer).
5+
6+
We recommend installing from conda, for example via mamba:
77
```bash
8-
mamba create -n paraviewer_env igv pip "python>=3.10"
8+
mamba create -n paraviewer_env pip "python>=3.10" paraviewer
99
mamba activate paraviewer_env
10-
mamba install -y paraviewer -c bioconda
1110
```
1211

13-
### Installation from source
14-
15-
Paraviewer requires conda/mamba installations of pip and IGV even when installing from source.
16-
We recommend creating a dedicated mamba environment for Paraviewer.
17-
18-
Paraviewer can then be cloned and installed from source:
12+
You may also install from source:
1913
```bash
20-
mamba create -n paraviewer_env igv pip "python>=3.10"
14+
mamba create -n paraviewer_env pip "python>=3.10" "orographer>=0.1.0"
2115
mamba activate paraviewer_env
2216
git clone https://github.com/PacificBiosciences/Paraviewer.git
2317
cd Paraviewer
2418
pip install .
2519
```
2620

27-
Or downloaded from the [GitHub Releases](https://github.com/PacificBiosciences/Paraviewer/releases) page, unzipped, and installed:
28-
```bash
29-
wget https://github.com/PacificBiosciences/Paraviewer/archive/refs/tags/v0.1.0.tar.gz
30-
tar -xzvf v0.1.0.tar.gz
31-
cd Paraviewer-0.1.0/
32-
pip install .
33-
```
34-
35-
To support headless HPC environments, Paraviewer on Linux also requires [Xvfb](https://www.x.org/archive/X11R7.7/doc/man/man1/Xvfb.1.xhtml).
36-
3721
## How to run
3822
### Paraviewer command-line arguments
39-
These can be viewed after installation by running `paraviewer -h` in the terminal.
23+
These can be viewed after installation by running `paraviewer create -h` in the terminal.
24+
25+
You must pass **exactly one** of `--paraphase-dir` or `--ptcp-dir`, plus `--outdir` and `--ref`.
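
The "exactly one of" rule above can be expressed in several ways. The following is a minimal sketch using `argparse`, not Paraviewer's actual parser: the program name `paraviewer-sketch` and the function `build_parser` are hypothetical, and only the four flags named in the sentence above are modeled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser modeling only the required-argument rule."""
    parser = argparse.ArgumentParser(prog="paraviewer-sketch")
    parser.add_argument("--outdir", required=True)
    parser.add_argument("--ref", required=True)
    # A required mutually exclusive group accepts exactly one member:
    # passing both flags, or neither, makes argparse exit with an error.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--paraphase-dir")
    source.add_argument("--ptcp-dir")
    return parser
```

The real help text lists both directory flags as individually optional, so the tool may well validate the rule after parsing instead of using a group like this.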
26+
4027
```text
41-
ParaViewer v0.1.0
42-
usage: paraviewer [-h] [-v] --outdir OUTDIR [--paraphase-dir PARAPHASE_DIR] [--ptcp-dir PTCP_DIR] [--clobber] --genome {hg19,hg38} [--pedigree PEDIGREE]
43-
[--include-only-regions INCLUDE_ONLY_REGIONS [INCLUDE_ONLY_REGIONS ...]] [--exclude-regions EXCLUDE_REGIONS [EXCLUDE_REGIONS ...]]
44-
[--include-only-samples INCLUDE_ONLY_SAMPLES [INCLUDE_ONLY_SAMPLES ...]] [--exclude-samples EXCLUDE_SAMPLES [EXCLUDE_SAMPLES ...]] [--max-reads-per-haplotype MAX_READS_PER_HAPLOTYPE] [--verbose]
28+
$ paraviewer create --help
29+
30+
ParaViewer v1.0.0
31+
usage: paraviewer create [-h] --outdir OUTDIR [--paraphase-dir PARAPHASE_DIR] [--ptcp-dir PTCP_DIR] --ref REF [--gtf GTF] [--include-only-regions INCLUDE_ONLY_REGIONS [INCLUDE_ONLY_REGIONS ...]]
32+
[--exclude-regions EXCLUDE_REGIONS [EXCLUDE_REGIONS ...]] [--pedigree PEDIGREE] [--include-only-samples INCLUDE_ONLY_SAMPLES [INCLUDE_ONLY_SAMPLES ...]]
33+
[--exclude-samples EXCLUDE_SAMPLES [EXCLUDE_SAMPLES ...]] [--max-reads-per-haplotype MAX_READS_PER_HAPLOTYPE] [--threads THREADS] [--clobber] [--verbose]
34+
35+
Process paraphase or puretarget results and generate interactive HTML viewer with orographer plots.
4536
4637
options:
4738
-h, --help show this help message and exit
48-
-v, --version Installed version (0.1.0)
49-
--outdir OUTDIR Path to output directory - should not already exist (default: None)
39+
40+
Required:
41+
--outdir OUTDIR Path to output directory - should not already exist
5042
--paraphase-dir PARAPHASE_DIR
51-
Path to paraphase result directory. (default: None)
52-
--ptcp-dir PTCP_DIR Path to PureTarget Carrier Panel result directory. (default: None)
53-
--clobber Overwrite output directory if it already exists (default: False)
54-
--genome {hg19,hg38} Desired genome build. Choose between GRCh37/HG19 (hg19) and GRCh38/HG38 (hg38) (default: None)
55-
--pedigree PEDIGREE Path to GATK-format PED file containing pedigree information - unrepresented samples will be excluded. (default: None)
43+
EITHER path to paraphase result directory.
44+
--ptcp-dir PTCP_DIR OR path to PureTarget Carrier Panel result directory.
45+
--ref REF Path to reference FASTA file
46+
47+
Filtering:
5648
--include-only-regions INCLUDE_ONLY_REGIONS [INCLUDE_ONLY_REGIONS ...]
57-
Space-delimited list of region names to include. Regions not specified will be excluded. (default: None)
49+
Region names to include; others excluded.
5850
--exclude-regions EXCLUDE_REGIONS [EXCLUDE_REGIONS ...]
59-
Space-delimited list of region names to exclude. (default: None)
51+
Space-delimited list of region names to exclude.
6052
--include-only-samples INCLUDE_ONLY_SAMPLES [INCLUDE_ONLY_SAMPLES ...]
61-
Space-delimited list of sample IDs to include. Samples not specified will be excluded. (default: None)
53+
Sample IDs to include; others excluded.
6254
--exclude-samples EXCLUDE_SAMPLES [EXCLUDE_SAMPLES ...]
63-
Space-delimited list of sample IDs to exclude. (default: None)
55+
Space-delimited list of sample IDs to exclude.
56+
57+
Annotation:
58+
--gtf GTF Optional path to bgzip+tabix GTF/GFF3 for gene track.
59+
--pedigree PEDIGREE Optional GATK-format PED; unrepresented samples excluded.
60+
61+
Other:
6462
--max-reads-per-haplotype MAX_READS_PER_HAPLOTYPE
65-
Maximum number of reads to show per haplotype. (default: 500)
66-
--verbose Print verbose output for debugging purposes (default: False)
63+
Maximum number of reads to show per haplotype.
64+
--threads THREADS Number of worker processes, up to CPU count (default 1)
65+
--clobber Overwrite output directory if it already exists
66+
--verbose Print verbose output for debugging purposes
6767
```
6868

69+
The **parent directory** of `--outdir` must exist. The output directory itself must not exist unless you pass **`--clobber`**.
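
As an illustration of the output-directory rule, here is a small sketch of the checks it implies. The function `prepare_outdir` is hypothetical, and the assumption that `--clobber` deletes the existing directory before recreating it is ours; the guide only says the directory is overwritten.

```python
import shutil
from pathlib import Path


def prepare_outdir(outdir: str, clobber: bool = False) -> Path:
    """Create outdir, enforcing the rules described in the guide."""
    path = Path(outdir).resolve()
    # The parent must already exist; it is not created here.
    if not path.parent.is_dir():
        raise FileNotFoundError(f"parent directory does not exist: {path.parent}")
    if path.exists():
        if not clobber:
            raise FileExistsError(f"{path} already exists; pass --clobber to overwrite")
        # Assumption: clobbering means removing the old content entirely.
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    path.mkdir()
    return path
```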
70+
71+
Plot intervals come from each region's `phase_region` field in the Paraphase JSON (e.g. `38:chr6:32013300-32046200`). Older Paraphase outputs without `phase_region` are not supported.
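
To make the `phase_region` format concrete, the sketch below splits the example string into its parts. It assumes the layout is `build:contig:start-end`, inferred from the single example above; the names `PhaseRegion` and `parse_phase_region` are hypothetical and are not part of Paraviewer or Paraphase.

```python
from typing import NamedTuple


class PhaseRegion(NamedTuple):
    build: str  # assumed to be the genome build, e.g. "38"
    chrom: str
    start: int
    end: int


def parse_phase_region(value: str) -> PhaseRegion:
    """Parse a string such as '38:chr6:32013300-32046200'."""
    # Split the build off the front and the span off the back, so a
    # contig name that itself contains ':' is still handled.
    build, rest = value.split(":", 1)
    chrom, span = rest.rsplit(":", 1)
    start, end = span.split("-")
    return PhaseRegion(build, chrom, int(start), int(end))
```

For the example in the text, `parse_phase_region("38:chr6:32013300-32046200")` yields build `"38"`, contig `"chr6"`, and the interval 32013300 to 32046200.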
72+
6973
### Basic WGS usage
7074
To run Paraviewer on WGS [Paraphase](https://github.com/PacificBiosciences/paraphase) output directory, use the following command:
7175
```bash
72-
paraviewer \
76+
paraviewer create \
7377
--outdir {output directory path} \
7478
--paraphase-dir {paraphase output directory path} \
75-
--genome hg38
79+
--ref {reference fasta}
7680
```
7781

7882
### Basic PTCP usage
7983
To run Paraviewer on PureTarget Carrier Panel data from [PTCP](https://github.com/PacificBiosciences/ptcp) output directory, the command is the same except the PTCP directory argument is named `--ptcp-dir`:
8084
```bash
81-
paraviewer \
85+
paraviewer create \
8286
--outdir {output directory path} \
8387
--ptcp-dir {PTCP output directory path} \
84-
--genome hg38
88+
--ref {reference fasta}
8589
```
8690

8791
### Results
88-
Either of these workflows will generate a new website directory at `{output directory path}`. The site will contain an index.html page which can be opened in browser by:
89-
* Pasting the index.html absolute path into a browser window
90-
* Using the command-line tool open: `open {paraphase_directory_path}/index.html`
91-
* Double-clicking on the index.html icon in the file explorer
92+
Either of these workflows will generate a new website directory at `{output directory path}`. To browse it, you can run the paraviewer deploy command to generate a local server:
93+
```bash
94+
$ paraviewer deploy -h
9295

93-
For help in navigating the site's table view, click the `Show Help` button at the bottom of the in-browser page.
96+
ParaViewer v1.0.0
97+
usage: paraviewer deploy [-h] --outdir OUTDIR [--port PORT]
9498

95-
The site can also be deployed to a server for online access. Note that the site behavior may be slightly different for local-only vs hosted sites, due to differences in security features for those environments.
99+
Start a simple HTTP server to serve generated paraviewer HTML and orographer plots.
96100

97-
### Advanced usage
98-
Paraviewer supports several advanced arguments for experiment customization. These apply equally to WGS or PureTarget Paraviewer invocations.
101+
options:
102+
-h, --help show this help message and exit
103+
--outdir OUTDIR Directory path containing HTML and JSON files to serve
104+
--port PORT Port number to serve on (default: 8000)
105+
```
99106

107+
For the above PTCP usage, this would be:
100108
```bash
101-
paraviewer \
102-
--outdir {output directory path} \
103-
--paraviewer-dir OR --ptcp-dir {PTCP output directory path} \
104-
--genome hg38 \
105-
--pedigree {pedigree file} \ # Used to identify trios. Refer to https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format
106-
--exclude-samples {my_boring_sample1 my_boring_sample2} \ # Input sample IDs that you want to exclude. Space-delimited list.
107-
--include_only_regions {smn rccx} \ # Cap-agnostic space delimited list of regions to include in output. The actual region names that can be included here will depend on which regions are supported in the pipeline (WGS or PTCP)
108-
--include_only_samples {my_fun_sample1 my_fun_sample2} \ # Sample IDs to include. Not compatible with `--exclude-samples`
109-
--exclude_regions {smn} \ # Cap-agnostic space delimited list of regions to exclude in output. Not compatible with `--include-only-samples`
109+
$ paraviewer deploy --outdir {same output directory path as used for the 'create' command}
110+
111+
ParaViewer v1.0.0
112+
Serving plots from: my_dir
113+
Server running at http://localhost:8000/
114+
115+
Press Ctrl+C to stop the server
110116
```
111117

118+
You may then load the `http://localhost:8000/` url in your browser to view.
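
For readers curious what "a simple HTTP server" amounts to, the standard library can serve a static directory in a few lines. This is a sketch of the general technique only, not the code behind `paraviewer deploy`; the helper `make_server` is hypothetical.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(outdir: str, port: int = 8000) -> ThreadingHTTPServer:
    """Build a server that serves static files from outdir on 127.0.0.1."""
    # 'directory' pins the handler to outdir instead of the current
    # working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=outdir)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("my_dir").serve_forever()` then blocks until interrupted with Ctrl+C, similar in spirit to the session shown above.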
119+
120+
**Note**: Paraviewer sites also support loading on an external server (such as GitHub Pages) for remote access.
121+
122+
For help in navigating the site's table view, click the `Show Help` button at the bottom of the in-browser page.
123+
124+
### Advanced usage
125+
Paraviewer supports several advanced arguments for experiment customization. These apply equally to WGS or PureTarget Paraviewer usage.
126+
127+
* **Pedigree** — Used to identify trios. See [PED format](https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format).
128+
* `--pedigree my_cohort.ped`
129+
130+
**Regions**
131+
132+
* **Include only regions** — Space-delimited region keys to keep in the output. Names are case-insensitive; **each name must appear** in at least one input Paraphase JSON (unknown names are an error). The exact set of included regions will depend on your Paraphase/PTCP run. Example:
133+
* `--include-only-regions smn1 rccx`
134+
* **Exclude regions** — Space-delimited region keys to drop. Each name must exist in the input JSON. The `--include-only-regions` and `--exclude-regions` arguments are mutually exclusive; use only one or the other. Example:
135+
* `--exclude-regions smn1`
136+
137+
**Samples**
138+
139+
* **Include only samples** — Space-delimited sample IDs to keep; others are excluded. If none of the given names match discovered samples, Paraviewer exits with an error. Names that do not match any sample are skipped with a warning. Example:
140+
* `--include-only-samples my_fun_sample1 my_fun_sample2`
141+
* **Exclude samples** — Space-delimited sample IDs to drop. The `--include-only-samples` and `--exclude-samples` arguments are mutually exclusive; use only one or the other. Example:
142+
* `--exclude-samples my_boring_sample1 my_boring_sample2`
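
The sample-filtering rules above can be summarized in a short sketch. This models only the behavior described in the bullets (a warning for unmatched names, an error when nothing matches, and mutual exclusivity); the function `select_samples` is hypothetical and does not reflect Paraviewer's internals. Region filtering differs, since the guide says unknown region names are an error and matching is case-insensitive.

```python
import warnings


def select_samples(found, include_only=None, exclude=None):
    """Return the sample IDs to keep, preserving the discovery order."""
    if include_only and exclude:
        raise ValueError("include-only and exclude lists are mutually exclusive")
    if include_only:
        wanted = set(include_only)
        # Unmatched names are reported but do not stop the run...
        for name in sorted(wanted - set(found)):
            warnings.warn(f"sample not found, skipping: {name}")
        kept = [s for s in found if s in wanted]
        # ...unless no requested name matched at all.
        if not kept:
            raise ValueError("none of the requested samples were found")
        return kept
    if exclude:
        dropped = set(exclude)
        return [s for s in found if s not in dropped]
    return list(found)
```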
143+
112144
## Algorithm notes
113145
Paraviewer follows this graphically described path to generate review sites:
114-
<h1 align="center"><img width="100%" style="background-color:white;" src="imgs/paraviewer-graphical.svg"/></h1>
115-
116-
Running IGV in headless batch mode on Linux (necessary to support Linux HPC environments) creates small differences in how images are rendered. This is expected behavior and primarily results in differences in image dimensions vs running on MacOs.
146+
<h1 align="center"><img width="100%" style="background-color:white;" src="imgs/paraviewer-graphical.svg"/></h1>

paraviewer/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
11
#!/usr/bin/env python
2-
__version__ = "0.1.0"
2+
__version__ = "1.0.0"
