|
1 | 1 | # User Guide |
2 | 2 |
|
3 | 3 | ## Installation |
4 | | -### Install with Conda/Mamba |
5 | | -Paraviewer depends on Conda packages, so we recommend installing Paraviewer via mamba as well. |
6 | | -We recommend creating a dedicated mamba environment for Paraviewer: |
| 4 | +Paraviewer requires **Python 3.10+**. The generated site uses in-browser visualization via [orographer](https://github.com/PacificBiosciences/Orographer). |
| 5 | + |
| 6 | +We recommend installing from conda, for example via mamba: |
7 | 7 | ```bash |
8 | | -mamba create -n paraviewer_env igv pip "python>=3.10" |
| 8 | +mamba create -n paraviewer_env pip "python>=3.10" paraviewer |
9 | 9 | mamba activate paraviewer_env |
10 | | -mamba install -y paraviewer -c bioconda |
11 | 10 | ``` |
12 | 11 |
|
13 | | -### Installation from source |
14 | | - |
15 | | -Paraviewer requires conda/mamba installations of pip and IGV even when installing from source. |
16 | | -We recommend creating a dedicated mamba environment for Paraviewer. |
17 | | - |
18 | | -Paraviewer can then be cloned and installed from source: |
| 12 | +You may also install from source: |
19 | 13 | ```bash |
20 | | -mamba create -n paraviewer_env igv pip "python>=3.10" |
| 14 | +mamba create -n paraviewer_env pip "python>=3.10" "orographer>=0.1.0" |
21 | 15 | mamba activate paraviewer_env |
22 | 16 | git clone https://github.com/PacificBiosciences/Paraviewer.git |
23 | 17 | cd Paraviewer |
24 | 18 | pip install . |
25 | 19 | ``` |
26 | 20 |
|
27 | | -Or downloaded from the [GitHub Releases](https://github.com/PacificBiosciences/Paraviewer/releases) page, unzipped, and installed: |
28 | | -```bash |
29 | | -wget https://github.com/PacificBiosciences/Paraviewer/archive/refs/tags/v0.1.0.tar.gz |
30 | | -tar -xzvf v0.1.0.tar.gz |
31 | | -cd Paraviewer-0.1.0/ |
32 | | -pip install . |
33 | | -``` |
34 | | - |
35 | | -To support headless HPC environments, Paraviewer on Linux also requires [Xvfb](https://www.x.org/archive/X11R7.7/doc/man/man1/Xvfb.1.xhtml). |
36 | | - |
37 | 21 | ## How to run |
38 | 22 | ### Paraviewer command-line arguments |
39 | | -These can be viewed after installation by running `paraviewer -h` in the terminal. |
| 23 | +These can be viewed after installation by running `paraviewer create -h` in the terminal. |
| 24 | + |
| 25 | +You must pass **exactly one** of `--paraphase-dir` or `--ptcp-dir`, plus `--outdir` and `--ref`. |
| 26 | + |
40 | 27 | ```text |
41 | | -ParaViewer v0.1.0 |
42 | | -usage: paraviewer [-h] [-v] --outdir OUTDIR [--paraphase-dir PARAPHASE_DIR] [--ptcp-dir PTCP_DIR] [--clobber] --genome {hg19,hg38} [--pedigree PEDIGREE] |
43 | | - [--include-only-regions INCLUDE_ONLY_REGIONS [INCLUDE_ONLY_REGIONS ...]] [--exclude-regions EXCLUDE_REGIONS [EXCLUDE_REGIONS ...]] |
44 | | - [--include-only-samples INCLUDE_ONLY_SAMPLES [INCLUDE_ONLY_SAMPLES ...]] [--exclude-samples EXCLUDE_SAMPLES [EXCLUDE_SAMPLES ...]] [--max-reads-per-haplotype MAX_READS_PER_HAPLOTYPE] [--verbose] |
| 28 | +$ paraviewer create --help |
| 29 | +
|
| 30 | +ParaViewer v1.0.0 |
| 31 | +usage: paraviewer create [-h] --outdir OUTDIR [--paraphase-dir PARAPHASE_DIR] [--ptcp-dir PTCP_DIR] --ref REF [--gtf GTF] [--include-only-regions INCLUDE_ONLY_REGIONS [INCLUDE_ONLY_REGIONS ...]] |
| 32 | + [--exclude-regions EXCLUDE_REGIONS [EXCLUDE_REGIONS ...]] [--pedigree PEDIGREE] [--include-only-samples INCLUDE_ONLY_SAMPLES [INCLUDE_ONLY_SAMPLES ...]] |
| 33 | + [--exclude-samples EXCLUDE_SAMPLES [EXCLUDE_SAMPLES ...]] [--max-reads-per-haplotype MAX_READS_PER_HAPLOTYPE] [--threads THREADS] [--clobber] [--verbose] |
| 34 | +
|
| 35 | +Process paraphase or puretarget results and generate interactive HTML viewer with orographer plots. |
45 | 36 |
|
46 | 37 | options: |
47 | 38 | -h, --help show this help message and exit |
48 | | - -v, --version Installed version (0.1.0) |
49 | | - --outdir OUTDIR Path to output directory - should not already exist (default: None) |
| 39 | +
|
| 40 | +Required: |
| 41 | + --outdir OUTDIR Path to output directory - should not already exist |
50 | 42 | --paraphase-dir PARAPHASE_DIR |
51 | | - Path to paraphase result directory. (default: None) |
52 | | - --ptcp-dir PTCP_DIR Path to PureTarget Carrier Panel result directory. (default: None) |
53 | | - --clobber Overwrite output directory if it already exists (default: False) |
54 | | - --genome {hg19,hg38} Desired genome build. Choose between GRCh37/HG19 (hg19) and GRCh38/HG38 (hg38) (default: None) |
55 | | - --pedigree PEDIGREE Path to GATK-format PED file containing pedigree information - unrepresented samples will be excluded. (default: None) |
| 43 | + EITHER path to paraphase result directory. |
| 44 | + --ptcp-dir PTCP_DIR OR path to PureTarget Carrier Panel result directory. |
| 45 | + --ref REF Path to reference FASTA file |
| 46 | +
|
| 47 | +Filtering: |
56 | 48 | --include-only-regions INCLUDE_ONLY_REGIONS [INCLUDE_ONLY_REGIONS ...] |
57 | | - Space-delimited list of region names to include. Regions not specified will be excluded. (default: None) |
| 49 | + Region names to include; others excluded. |
58 | 50 | --exclude-regions EXCLUDE_REGIONS [EXCLUDE_REGIONS ...] |
59 | | - Space-delimited list of region names to exclude. (default: None) |
| 51 | + Space-delimited list of region names to exclude. |
60 | 52 | --include-only-samples INCLUDE_ONLY_SAMPLES [INCLUDE_ONLY_SAMPLES ...] |
61 | | - Space-delimited list of sample IDs to include. Samples not specified will be excluded. (default: None) |
| 53 | + Sample IDs to include; others excluded. |
62 | 54 | --exclude-samples EXCLUDE_SAMPLES [EXCLUDE_SAMPLES ...] |
63 | | - Space-delimited list of sample IDs to exclude. (default: None) |
| 55 | + Space-delimited list of sample IDs to exclude. |
| 56 | +
|
| 57 | +Annotation: |
| 58 | + --gtf GTF Optional path to bgzip+tabix GTF/GFF3 for gene track. |
| 59 | + --pedigree PEDIGREE Optional GATK-format PED; unrepresented samples excluded. |
| 60 | +
|
| 61 | +Other: |
64 | 62 | --max-reads-per-haplotype MAX_READS_PER_HAPLOTYPE |
65 | | - Maximum number of reads to show per haplotype. (default: 500) |
66 | | - --verbose Print verbose output for debugging purposes (default: False) |
| 63 | + Maximum number of reads to show per haplotype. |
| 64 | + --threads THREADS Number of worker processes, up to CPU count (default 1) |
| 65 | + --clobber Overwrite output directory if it already exists |
| 66 | + --verbose Print verbose output for debugging purposes |
67 | 67 | ``` |
68 | 68 |
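The `--gtf` option listed above expects a bgzip-compressed, tabix-indexed annotation file. The following is a minimal sketch of one common way to prepare such a file with the standard `bgzip` and `tabix` tools. The file name `genes.gtf` and the helper `sort_gtf` are hypothetical, and whether Paraviewer has further requirements for the annotation file is not stated here, so treat this as an assumption rather than the documented procedure.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Paraviewer): emit header lines first,
# then records sorted by sequence name (column 1) and start (column 4),
# which is the ordering tabix needs for GTF/GFF input.
sort_gtf() {
  local in="$1"
  grep '^#' "$in" || true
  grep -v '^#' "$in" | sort -t $'\t' -k1,1 -k4,4n
}

# Only run the compression and indexing steps when the tools and the
# (hypothetical) input file are actually present.
if command -v bgzip >/dev/null && command -v tabix >/dev/null && [ -f genes.gtf ]; then
  sort_gtf genes.gtf | bgzip > genes.sorted.gtf.gz
  tabix -p gff genes.sorted.gtf.gz   # writes genes.sorted.gtf.gz.tbi
fi
```

The resulting `genes.sorted.gtf.gz` (with its `.tbi` index alongside) would then be the path passed to `--gtf`.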
|
| 69 | +The **parent directory** of `--outdir` must exist. The output directory itself must not exist unless you pass **`--clobber`**. |
| 70 | + |
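The output-directory rule above can be checked before launching a long run. This is a minimal sketch of such a pre-flight check; `check_outdir` is a hypothetical helper written for illustration and is not part of the Paraviewer CLI, which performs its own validation.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring the rule described above:
#   - the parent of the output directory must exist
#   - the output directory itself must not exist unless clobbering is intended
# Usage: check_outdir PATH [yes|no]   (second argument: will --clobber be passed?)
check_outdir() {
  local outdir="$1" clobber="${2:-no}"
  local parent
  parent="$(dirname -- "$outdir")"
  if [ ! -d "$parent" ]; then
    echo "parent directory does not exist: $parent" >&2
    return 1
  fi
  if [ -e "$outdir" ] && [ "$clobber" != "yes" ]; then
    echo "output directory already exists (pass --clobber to overwrite): $outdir" >&2
    return 1
  fi
  return 0
}

# Example with a hypothetical path.
if check_outdir "./paraviewer_site"; then
  echo "output path looks usable"
fi
```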
| 71 | +Plot intervals come from each region's `phase_region` field in the Paraphase JSON (e.g. `38:chr6:32013300-32046200`). Older Paraphase outputs without `phase_region` are not supported. |
| 72 | + |
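To make the `phase_region` example above concrete, here is a small sketch that splits such a string into its parts. It assumes the layout `prefix:chromosome:start-end` inferred from the single example shown; the meaning of the leading `38` (presumably the genome build) and the helper name `parse_phase_region` are assumptions, not something taken from the Paraphase documentation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "prefix:chrom:start-end" into tab-separated fields.
parse_phase_region() {
  local value="$1"
  local prefix chrom range
  IFS=':' read -r prefix chrom range <<< "$value"
  local start="${range%-*}"   # text before the last '-'
  local end="${range#*-}"     # text after the first '-'
  printf '%s\t%s\t%s\t%s\n' "$prefix" "$chrom" "$start" "$end"
}

parsed="$(parse_phase_region "38:chr6:32013300-32046200")"
echo "$parsed"
```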
69 | 73 | ### Basic WGS usage |
70 | 74 | To run Paraviewer on a WGS [Paraphase](https://github.com/PacificBiosciences/paraphase) output directory, use the following command: |
71 | 75 | ```bash |
72 | | -paraviewer \ |
| 76 | +paraviewer create \ |
73 | 77 | --outdir {output directory path} \ |
74 | 78 | --paraphase-dir {paraphase output directory path} \ |
75 | | - --genome hg38 |
| 79 | + --ref {reference fasta} |
76 | 80 | ``` |
77 | 81 |
|
78 | 82 | ### Basic PTCP usage |
79 | 83 | To run Paraviewer on PureTarget Carrier Panel data from a [PTCP](https://github.com/PacificBiosciences/ptcp) output directory, use the same command but pass the directory with `--ptcp-dir` instead of `--paraphase-dir`: |
80 | 84 | ```bash |
81 | | -paraviewer \ |
| 85 | +paraviewer create \ |
82 | 86 | --outdir {output directory path} \ |
83 | 87 | --ptcp-dir {PTCP output directory path} \ |
84 | | - --genome hg38 |
| 88 | + --ref {reference fasta} |
85 | 89 | ``` |
86 | 90 |
|
87 | 91 | ### Results |
88 | | -Either of these workflows will generate a new website directory at `{output directory path}`. The site will contain an index.html page which can be opened in browser by: |
89 | | -* Pasting the index.html absolute path into a browser window |
90 | | -* Using the command-line tool open: `open {paraphase_directory_path}/index.html` |
91 | | -* Double-clicking on the index.html icon in the file explorer |
| 92 | +Either of these workflows will generate a new website directory at `{output directory path}`. To browse it, run the `paraviewer deploy` command to start a local server:
| 93 | +```bash |
| 94 | +$ paraviewer deploy -h |
92 | 95 |
|
93 | | -For help in navigating the site's table view, click the `Show Help` button at the bottom of the in-browser page. |
| 96 | +ParaViewer v1.0.0 |
| 97 | +usage: paraviewer deploy [-h] --outdir OUTDIR [--port PORT] |
94 | 98 |
|
95 | | -The site can also be deployed to a server for online access. Note that the site behavior may be slightly different for local-only vs hosted sites, due to differences in security features for those environments. |
| 99 | +Start a simple HTTP server to serve generated paraviewer HTML and orographer plots. |
96 | 100 |
|
97 | | -### Advanced usage |
98 | | -Paraviewer supports several advanced arguments for experiment customization. These apply equally to WGS or PureTarget Paraviewer invocations. |
| 101 | +options: |
| 102 | + -h, --help show this help message and exit |
| 103 | + --outdir OUTDIR Directory path containing HTML and JSON files to serve |
| 104 | + --port PORT Port number to serve on (default: 8000) |
| 105 | +``` |
99 | 106 |
|
| 107 | +For the above PTCP usage, this would be: |
100 | 108 | ```bash |
101 | | -paraviewer \ |
102 | | - --outdir {output directory path} \ |
103 | | - --paraviewer-dir OR --ptcp-dir {PTCP output directory path} \ |
104 | | - --genome hg38 \ |
105 | | - --pedigree {pedigree file} \ # Used to identify trios. Refer to https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format |
106 | | - --exclude-samples {my_boring_sample1 my_boring_sample2} \ # Input sample IDs that you want to exclude. Space-delimited list. |
107 | | - --include_only_regions {smn rccx} \ # Cap-agnostic space delimited list of regions to include in output. The actual region names that can be included here will depend on which regions are supported in the pipeline (WGS or PTCP) |
108 | | - --include_only_samples {my_fun_sample1 my_fun_sample2} \ # Sample IDs to include. Not compatible with `--exclude-samples` |
109 | | - --exclude_regions {smn} \ # Cap-agnostic space delimited list of regions to exclude in output. Not compatible with `--include-only-samples` |
| 109 | +$ paraviewer deploy --outdir {same output directory path as used for the 'create' command} |
| 110 | + |
| 111 | +ParaViewer v1.0.0 |
| 112 | +Serving plots from: my_dir |
| 113 | +Server running at http://localhost:8000/ |
| 114 | + |
| 115 | +Press Ctrl+C to stop the server |
110 | 116 | ``` |
111 | 117 |
|
| 118 | +You may then open `http://localhost:8000/` in your browser to view the site. |
| 119 | + |
| 120 | +**Note**: Paraviewer sites can also be hosted on an external server (such as GitHub Pages) for remote access. |
| 121 | + |
| 122 | +For help in navigating the site's table view, click the `Show Help` button at the bottom of the in-browser page. |
| 123 | + |
| 124 | +### Advanced usage |
| 125 | +Paraviewer supports several advanced arguments for experiment customization. These apply equally to WGS or PureTarget Paraviewer usage. |
| 126 | + |
| 127 | +* **Pedigree**: Used to identify trios. See [PED format](https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format). |
| 128 | + * `--pedigree my_cohort.ped` |
| 129 | + |
| 130 | +**Regions** |
| 131 | + |
| 132 | +* **Include only regions**: Space-delimited region keys to keep in the output. Names are case-insensitive; **each name must appear** in at least one input Paraphase JSON (unknown names are an error). The region names available depend on your Paraphase/PTCP run. Example: |
| 133 | + * `--include-only-regions smn1 rccx` |
| 134 | +* **Exclude regions**: Space-delimited region keys to drop. Each name must exist in the input JSON. The `--include-only-regions` and `--exclude-regions` arguments are mutually exclusive; use only one or the other. Example: |
| 135 | + * `--exclude-regions smn1` |
| 136 | + |
| 137 | +**Samples** |
| 138 | + |
| 139 | +* **Include only samples**: Space-delimited sample IDs to keep; others are excluded. Names that do not match any discovered sample are skipped with a warning; if none of the given names match, Paraviewer exits with an error. Example: |
| 140 | + * `--include-only-samples my_fun_sample1 my_fun_sample2` |
| 141 | +* **Exclude samples**: Space-delimited sample IDs to drop. The `--include-only-samples` and `--exclude-samples` arguments are mutually exclusive; use only one or the other. Example: |
| 142 | + * `--exclude-samples my_boring_sample1 my_boring_sample2` |
| 143 | + |
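The options above can be combined in one invocation, as long as only one option from each include/exclude pair is used. The sketch below assembles such a command as a bash array and prints it for review instead of running it. All paths and sample names are hypothetical placeholders; only the flags themselves come from the help text above.

```bash
#!/usr/bin/env bash
# Hypothetical input locations; substitute your own.
outdir="paraviewer_site"
paraphase_dir="paraphase_results"
ref="reference.fa"

# Build the command as an array so each argument stays a separate word.
cmd=(
  paraviewer create
  --outdir "$outdir"
  --paraphase-dir "$paraphase_dir"
  --ref "$ref"
  --pedigree my_cohort.ped
  --include-only-regions smn1 rccx
  --exclude-samples my_boring_sample1 my_boring_sample2
)

# Print the command for inspection.
printf '%q ' "${cmd[@]}"; echo

# To actually run it, uncomment the next line:
# "${cmd[@]}"
```

Here a region include list is paired with a sample exclude list, which is allowed because the mutual-exclusion rules apply within the region pair and within the sample pair, not across them.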
112 | 144 | ## Algorithm notes |
113 | 145 | Paraviewer follows this graphically described path to generate review sites: |
114 | | -<h1 align="center"><img width="100%" style="background-color:white;" src="imgs/paraviewer-graphical.svg"/></h1> |
115 | | - |
116 | | -Running IGV in headless batch mode on Linux (necessary to support Linux HPC environments) creates small differences in how images are rendered. This is expected behavior and primarily results in differences in image dimensions vs running on MacOs. |
| 146 | +<h1 align="center"><img width="100%" style="background-color:white;" src="imgs/paraviewer-graphical.svg"/></h1> |