Commit 54e9950

slim down README, add workflow examples to docs
1 parent 3b95c3d commit 54e9950

5 files changed: +118 -134 lines changed

README.md

Lines changed: 22 additions & 94 deletions
@@ -2,119 +2,47 @@
 
 Pangenome-based sequence placement, alignment, and genotyping.
 
-Given a pangenome (in [PanMAN](https://github.com/TurakhiaLab/panman) format) and sequencing reads, panmap places reads onto the pangenome tree, aligns them to the closest reference, and calls variants.
+[Documentation](https://amkram.github.io/panmap/) | [Preprint](https://www.biorxiv.org/content/10.64898/2026.03.29.711974v1)
 
-### Modes
-
-- **Single-sample** (default): Places reads from a single sample, aligns to the best-matching reference, and genotypes variants (BAM + VCF output).
-- **Metagenomic** (`--meta`): Scores reads from a mixture sample against every node in the PanMAN, and uses the scoring information to estimate haplotype abundance or directly assign reads to nodes.
-
-
-### Run with Docker
+## Install
 
 ```bash
-docker pull alanalohaucsc/panmap:latest
-docker run --rm alanalohaucsc/panmap:latest
-```
-
-See the [documentation](https://amkram.github.io/panmap/) for building from source.
-
-## Usage
-
-```
-panmap <panman> [reads1.fq] [reads2.fq] [options]
+conda install -c bioconda panmap
 ```
 
-### Pipeline stages
-
-By default, panmap runs through genotyping. Use `--stop` to control how far the pipeline runs:
-
-| Stage | Output |
-|------------|-------------------------|
-| `index` | `.idx` (seed index) |
-| `place` | `.placement.tsv` |
-| `align` | `.bam` |
-| `genotype` | `.vcf` |
-
-### Key options
-
-```
--o, --output <prefix> Output file prefix
--t, --threads <N> Number of threads (default: 1)
--a, --aligner <str> minimap2 (default) or bwa
---stop <stage> Stop after: index, place, align, genotype
---meta Metagenomic mode
--k, --kmer <19> Syncmer k
--s, --syncmer <8> Syncmer s
---refine Alignment-based refinement of top candidates
---force-leaf Restrict placement to leaf nodes
--v, --verbose Verbose output
--q, --quiet Errors only
-```
-
-Run `panmap --help` for the full option list.
-
-### Example usage
+Or with Docker:
 
-**Place and genotype paired-end reads:**
 ```bash
-panmap ref.panman reads_R1.fq reads_R2.fq --stop genotype -t 8 -o sample
+docker pull alanalohaucsc/panmap:latest
 ```
 
-**Metagenomic mode, estimating SARS-CoV-2 lineage abundances:**
-
-First step is to build an index for metagenomics mode:
+## Quick start
 
 ```bash
-mkdir example_run && cd example_run
-
-panmap ../examples/data/sars_20000_twilight_dipper.panman \
---index-mgsr sars_20000_twilight_dipper.idx
-```
-
-Then run panmap with the `--meta` option:
+# Place and genotype paired-end reads
+panmap ref.panman reads_R1.fq reads_R2.fq --stop genotype -t 8 -o sample
 
-```bash
-panmap ../examples/data/sars_20000_twilight_dipper.panman \
-../examples/data/sars20000_5hap_0snp-a_200000_rep0_R1.fastq.gz \
-../examples/data/sars20000_5hap_0snp-a_200000_rep0_R2.fastq.gz \
---meta --index sars_20000_twilight_dipper.idx \
---threads 8 --em-delta-threshold 0.00001
+# Metagenomic abundance estimation
+panmap ref.panman reads.fq --meta --index ref.idx -t 8 -o sample
 ```
 
-Reads used above were simulated shotgun-sequencing reads of SARS-CoV-2 mixtures. For wastewater samples, refer to README
-in [examples/wastewater](examples/wastewater) for more details.
+## Pipeline
 
-**Metagenomic mode, filter and assign reads:**
-
-We first build an index for the vertebrate mitochondrial PanMAN. We recommend using the `-k 15 -s 8 -l 1` seed parameters for aeDNA reads.
-
-```bash
-mkdir example_run && cd example_run
-
-panmap ../examples/data/v_mtdna.panman \
---index-mgsr v_mtdna.idx -k 15 -s 8 -l 1
 ```
-
-Then run panmap with the `--filter-and-assign` option:
-
-```bash
-panmap ../examples/data/v_mtdna.panman \
-../examples/data/subsampled.fastq.gz \
---meta -i v_mtdna.idx \
---filter-and-assign --discard 0.6 --dust 5 \
---taxonomic-metadata ../examples/data/v_mtdna.meta.tsv \
--t 4 --breadth-ratio --output subsampled
+index --> place --> align --> genotype
+.idx .placement.tsv .bam .vcf
 ```
 
-This outputs 3 files:
-
-`.mgsr.assignedReads.fastq` file containing the reads that were assigned
+By default, panmap stops after placement. Use `--stop` to run further stages.
 
-`.mgsr.assignedReads.out` file containing the number of reads assigned to each node and the indices of the reads assigned, with respect to the the `.mgsr.assignedReads.fastq` file
+## Modes
 
-`.mgsr.assignedReadsLCANode.out` file containing the number of reads assigned to the LCA node and the indices of the reads assigned. *As reads may be assigned to multiple nodes, the LCA node of a read is the LCA of all the nodes it was assigned to.*
+- **Single-sample** (default): Place reads, align to closest reference, call variants (BAM + VCF)
+- **Metagenomic** (`--meta`): Estimate haplotype abundance or assign reads to pangenome nodes
 
-### Building from source
+## Links
 
-See the [installation docs](https://amkram.github.io/panmap/installation/) for dependencies and build instructions.
+- [Full documentation](https://amkram.github.io/panmap/)
+- [Installation options](https://amkram.github.io/panmap/installation/)
+- [CLI reference](https://amkram.github.io/panmap/cli-reference/)
+- [PanMAN format](https://github.com/TurakhiaLab/panman)
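The new README's stage diagram pairs each `--stop` stage with the file extension it produces. A minimal sketch of that mapping as a shell helper, handy for checking that a run reached the intended stage. The function name `expected_output` is hypothetical (not part of panmap), and the `<prefix>.<ext>` naming is assumed from the `-o sample` examples in the docs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the file a run with `-o <prefix> --stop <stage>`
# is expected to produce, following the stage/extension pairs in the README diagram.
expected_output() {
  local prefix="$1" stage="$2"
  case "$stage" in
    index)    echo "${prefix}.idx" ;;
    place)    echo "${prefix}.placement.tsv" ;;
    align)    echo "${prefix}.bam" ;;
    genotype) echo "${prefix}.vcf" ;;
    # Anything else is not one of the four pipeline stages
    *) echo "unknown stage: $stage" >&2; return 1 ;;
  esac
}
```

For example, `[ -s "$(expected_output sample genotype)" ] && echo "genotyping finished"` checks for a non-empty `sample.vcf` after a `--stop genotype -o sample` run.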

docs/index.md

Lines changed: 16 additions & 16 deletions
@@ -4,31 +4,31 @@
 
 panmap takes sequencing reads and a pangenome in [PanMAN](https://github.com/TurakhiaLab/panman) format, places the reads onto the pangenome tree, aligns them to the closest reference, and calls variants.
 
-## Modes
-
-**Single-sample** (default)
-: Places reads from a single sample, aligns to the best-matching reference, and genotypes variants. Outputs BAM and VCF files.
-
-**Metagenomic** (`--meta`)
-: Scores reads against every node in the PanMAN to estimate haplotype abundance or assign reads directly to nodes.
-
 ## At a glance
 
 ```bash
 # Install
 conda install -c bioconda panmap
 
-# Place reads (stops after placement by default)
-panmap ref.panman reads_R1.fq reads_R2.fq -t 8 -o sample
-
-# Run full pipeline through genotyping
+# Place and genotype paired-end reads
 panmap ref.panman reads_R1.fq reads_R2.fq --stop genotype -t 8 -o sample
+
+# Metagenomic abundance estimation
+panmap ref.panman reads.fq --meta --index ref.idx -t 8 -o sample
 ```
 
+## Modes
+
+**Single-sample** (default)
+: Places reads from a single sample, aligns to the best-matching reference, and genotypes variants. Outputs BAM and VCF files.
+
+**Metagenomic** (`--meta`)
+: Scores reads against every node in the PanMAN to estimate haplotype abundance or assign reads directly to nodes.
+
 ## Documentation
 
-- [Installation](installation.md) -- Docker and building from source
-- [Quick Start](quickstart.md) -- First analysis in minutes
-- [Single-Sample Mode](single-sample.md) -- Default pipeline walkthrough
-- [Metagenomic Mode](metagenomic.md) -- Abundance estimation and read assignment
+- [Installation](installation.md) -- Bioconda, Docker, building from source
+- [Quick Start](quickstart.md) -- Pipeline overview and basic examples
+- [Single-Sample Mode](single-sample.md) -- Genotyping walkthrough
+- [Metagenomic Mode](metagenomic.md) -- Wastewater and aeDNA workflows
 - [CLI Reference](cli-reference.md) -- All options and flags

docs/installation.md

Lines changed: 12 additions & 6 deletions
@@ -13,8 +13,15 @@ This installs `panmap` and `panmanUtils`.
 ## Docker
 
 ```bash
-docker build -t panmap .
-docker run --rm panmap panmap -h
+docker pull alanalohaucsc/panmap:latest
+docker run --rm alanalohaucsc/panmap:latest --help
+```
+
+To run on local files, mount a volume:
+
+```bash
+docker run --rm -v $(pwd):/data -w /data alanalohaucsc/panmap:latest \
+ref.panman reads.fq -o sample
 ```
 
 ---
@@ -25,7 +32,7 @@ docker run --rm panmap panmap -h
 
 | Package | Ubuntu/Debian |
 |---------|---------------|
-| CMake >= 3.12 | `cmake` |
+| CMake >= 3.14 | `cmake` |
 | C++17 compiler | `g++` or `clang++` |
 | Protobuf | `protobuf-compiler`, `libprotobuf-dev` |
 | Boost | `libboost-program-options-dev`, `libboost-iostreams-dev`, `libboost-filesystem-dev`, `libboost-system-dev`, `libboost-date-time-dev` |
@@ -37,9 +44,8 @@ docker run --rm panmap panmap -h
 ### Build
 
 ```bash
-mkdir build && cd build
-cmake ..
-make -j$(nproc)
+cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
+cmake --build build -j$(nproc)
 ```
 
 The binary is at `build/bin/panmap`.
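The new Docker instructions mount the working directory at `/data` and pass everything after the image name to the container, which (as the `--help` example implies) runs `panmap` as its entrypoint. A minimal sketch of a shell wrapper around that invocation, assuming that entrypoint behavior; the function name `panmap_docker` is hypothetical. Only files under the current directory are visible inside the container, so input paths should be relative to it.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the containerized panmap.
# Assumes the image's entrypoint is panmap, so arguments are passed straight through.
panmap_docker() {
  # Mount the current directory at /data and use it as the container's working
  # directory, so relative paths such as ref.panman resolve the same way inside.
  # Quoting "$PWD" keeps directories containing spaces intact.
  docker run --rm -v "$PWD":/data -w /data alanalohaucsc/panmap:latest "$@"
}
```

Usage mirrors the documented command: `panmap_docker ref.panman reads.fq -o sample`. Output files land in the current directory because it is the mounted working directory.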

docs/quickstart.md

Lines changed: 28 additions & 17 deletions
@@ -8,43 +8,54 @@ panmap <panman> [reads1.fq] [reads2.fq] [options]
 
 ## Pipeline
 
-panmap runs four stages in sequence. By default, it stops after **placement**. Use `--stop` to run further.
+panmap runs four stages in sequence. By default it stops after **placement**. Use `--stop` to run further.
 
 ```
-index ──> place ──> align ──> genotype
+index --> place --> align --> genotype
 .idx .placement.tsv .bam .vcf
 ```
 
-## Example: paired-end genotyping
+## Single-sample genotyping
+
+Place reads onto the pangenome, align to the closest reference, and call variants:
 
 ```bash
-panmap ref.panman reads_R1.fq reads_R2.fq --stop genotype -t 8 -o sample
+panmap ref.panman reads_R1.fq reads_R2.fq \
+--stop genotype -t 8 -o sample
 ```
 
-This runs the full pipeline and produces:
+This produces `sample.bam` and `sample.vcf`.
+
+## Metagenomic abundance estimation
+
+Estimate which lineages are present in a mixed sample:
+
+```bash
+# Build metagenomic index (once per pangenome)
+panmap ref.panman --index-mgsr ref.idx
+
+# Estimate abundances
+panmap ref.panman reads.fq \
+--meta --index ref.idx -t 8 -o sample
+```
 
-| File | Contents |
-|------|----------|
-| `sample.idx` | Seed index (reusable) |
-| `sample.placement.tsv` | Tree placement |
-| `sample.bam` | Aligned reads |
-| `sample.vcf` | Called variants |
+Output: `sample.mgsr.abundance.out`
 
-## Running partial pipelines
+## Partial pipelines
 
 ```bash
 # Build index only
 panmap ref.panman --stop index -o ref
 
-# Place reads (default behavior)
+# Place reads (default)
 panmap ref.panman reads.fq -o sample
 
-# Place and align, but skip genotyping
+# Place and align, skip genotyping
 panmap ref.panman reads.fq --stop align -o sample
 ```
 
 ## Next steps
 
-- [Single-Sample Mode](single-sample.md) -- pipeline details and options
-- [Metagenomic Mode](metagenomic.md) -- abundance estimation and read assignment
-- [CLI Reference](cli-reference.md) -- full option list
+- [Single-Sample Mode](single-sample.md) -- full walkthrough with examples
+- [Metagenomic Mode](metagenomic.md) -- wastewater and aeDNA workflows
+- [CLI Reference](cli-reference.md) -- all options
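The new quickstart splits metagenomic abundance estimation into two calls: build the index with `--index-mgsr` once per pangenome, then run `--meta` against it. A sketch of a helper that skips the index build when the index file already exists. It uses only the flags shown in the quickstart; the function name, the default of 8 threads, and the convention of deriving `ref.idx` from `ref.panman` are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: metagenomic abundance estimation with a cached index.
# Usage: estimate_abundance <ref.panman> <reads.fq> <output-prefix> [threads]
estimate_abundance() {
  local panman="$1" reads="$2" prefix="$3" threads="${4:-8}"
  # Assumed naming convention: ref.panman -> ref.idx alongside it
  local idx="${panman%.panman}.idx"
  # Build the metagenomic index only if it is not already on disk
  if [ ! -f "$idx" ]; then
    panmap "$panman" --index-mgsr "$idx" || return 1
  fi
  # Per the quickstart, this run should write ${prefix}.mgsr.abundance.out
  panmap "$panman" "$reads" --meta --index "$idx" -t "$threads" -o "$prefix"
}
```

Calling `estimate_abundance ref.panman reads.fq sample` twice builds `ref.idx` on the first call only. The second call goes straight to abundance estimation.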

docs/single-sample.md

Lines changed: 40 additions & 1 deletion
@@ -26,6 +26,45 @@ By default, panmap stops after **placement**. Use `--stop` to control how far th
 !!! note
 When `--stop genotype` is used, `--force-leaf` is enabled automatically.
 
+## Example: SARS-CoV-2 genotyping from Illumina reads
+
+This example places paired-end reads onto a SARS-CoV-2 pangenome and calls variants.
+
+### 1. Get a PanMAN
+
+Download or build a PanMAN for your organism. For SARS-CoV-2, a pre-built PanMAN with 20,000 samples is included:
+
+```bash
+ls examples/data/sars_20000_twilight_dipper.panman
+```
+
+### 2. Run the full pipeline
+
+```bash
+panmap examples/data/sars_20000_twilight_dipper.panman \
+reads_R1.fq.gz reads_R2.fq.gz \
+--stop genotype -t 8 -o my_sample
+```
+
+### 3. Output files
+
+| File | Contents |
+|------|----------|
+| `my_sample.idx` | Seed index (reusable for future runs with `--index`) |
+| `my_sample.placement.tsv` | Placement on the pangenome tree |
+| `my_sample.bam` | Reads aligned to the closest reference |
+| `my_sample.vcf` | Called variants |
+
+### 4. Reuse the index
+
+Once built, the index can be reused across samples:
+
+```bash
+panmap examples/data/sars_20000_twilight_dipper.panman \
+sample2_R1.fq.gz sample2_R2.fq.gz \
+--index my_sample.idx --stop genotype -t 8 -o sample2
+```
+
 ## Common options
 
 | Option | Description | Default |
@@ -54,7 +93,7 @@ panmap ref.panman reads.fq -a bwa -o sample
 panmap ref.panman reads.fq --refine -o sample
 ```
 
-Refinement parameters (advanced):
+Refinement parameters:
 
 | Option | Description | Default |
 |--------|-------------|---------|
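The "Reuse the index" step generalizes to a batch of samples: build the seed index once, then genotype each sample against it with `--index`. A minimal sketch under two assumptions. First, that `--stop index -o ref` writes `ref.idx`, which is inferred from the output tables where `-o my_sample` yields `my_sample.idx`. Second, that each sample's reads are named `<sample>_R1.fq.gz` / `<sample>_R2.fq.gz`, an illustrative layout only. The function name `genotype_batch` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical batch driver: build the seed index once, then genotype each sample with it.
# Assumptions: `--stop index -o ref` writes ref.idx, and reads are named
# <sample>_R1.fq.gz / <sample>_R2.fq.gz (illustrative naming only).
# Usage: genotype_batch <ref.panman> <threads> <sample> [<sample> ...]
genotype_batch() {
  local panman="$1" threads="$2"
  shift 2
  # Build the seed index once; abort the batch if this fails
  panmap "$panman" --stop index -o ref || return 1
  local sample
  for sample in "$@"; do
    # Reuse ref.idx for every sample; each run writes <sample>.bam and <sample>.vcf
    panmap "$panman" "${sample}_R1.fq.gz" "${sample}_R2.fq.gz" \
      --index ref.idx --stop genotype -t "$threads" -o "$sample" || return 1
  done
}
```

For example, `genotype_batch examples/data/sars_20000_twilight_dipper.panman 8 sampleA sampleB` indexes the SARS-CoV-2 PanMAN once and then genotypes both samples.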
