slim down README, add workflow examples to docs

amkram · amkram · commit 54e9950b3fd4 · 2026-04-10T12:56:22.000-07:00
diff --git a/README.md b/README.md
@@ -2,119 +2,47 @@
 
 Pangenome-based sequence placement, alignment, and genotyping.
 
-Given a pangenome (in [PanMAN](https://github.com/TurakhiaLab/panman) format) and sequencing reads, panmap places reads onto the pangenome tree, aligns them to the closest reference, and calls variants.
+[Documentation](https://amkram.github.io/panmap/) | [Preprint](https://www.biorxiv.org/content/10.64898/2026.03.29.711974v1)
 
-### Modes
-
-- **Single-sample** (default): Places reads from a single sample, aligns to the best-matching reference, and genotypes variants (BAM + VCF output).
-- **Metagenomic** (`--meta`): Scores reads from a mixture sample against every node in the PanMAN, and uses the scoring information to estimate haplotype abundance or directly assign reads to nodes.
-
-
-### Run with Docker
+## Install
 
 ```bash
-docker pull alanalohaucsc/panmap:latest
-docker run --rm alanalohaucsc/panmap:latest
-```
-
-See the [documentation](https://amkram.github.io/panmap/) for building from source.
-
-## Usage
-
-```
-panmap <panman> [reads1.fq] [reads2.fq] [options]
+conda install -c bioconda panmap
 ```
 
-### Pipeline stages
-
-By default, panmap runs through genotyping. Use `--stop` to control how far the pipeline runs:
-
-| Stage      | Output                  |
-|------------|-------------------------|
-| `index`    | `.idx` (seed index)     |
-| `place`    | `.placement.tsv`        |
-| `align`    | `.bam`                  |
-| `genotype` | `.vcf`                  |
-
-### Key options
-
-```
--o, --output <prefix>     Output file prefix
--t, --threads <N>         Number of threads (default: 1)
--a, --aligner <str>       minimap2 (default) or bwa
---stop <stage>            Stop after: index, place, align, genotype
---meta                    Metagenomic mode
--k, --kmer <19>           Syncmer k
--s, --syncmer <8>         Syncmer s
---refine                  Alignment-based refinement of top candidates
---force-leaf              Restrict placement to leaf nodes
--v, --verbose             Verbose output
--q, --quiet               Errors only
-```
-
-Run `panmap --help` for the full option list.
-
-### Example usage
+Or with Docker:
 
-**Place and genotype paired-end reads:**
 ```bash
-panmap ref.panman reads_R1.fq reads_R2.fq --stop genotype -t 8 -o sample
+docker pull alanalohaucsc/panmap:latest
 ```
 
-**Metagenomic mode, estimating SARS-CoV-2 lineage abundances:**
-
-First step is to build an index for metagenomics mode:
+## Quick start
 
 ```bash
-mkdir example_run && cd example_run
-
-panmap ../examples/data/sars_20000_twilight_dipper.panman \
-  --index-mgsr sars_20000_twilight_dipper.idx
-```
-
-Then run panmap with the `--meta` option:
+# Place and genotype paired-end reads
+panmap ref.panman reads_R1.fq reads_R2.fq --stop genotype -t 8 -o sample
 
-```bash
-panmap ../examples/data/sars_20000_twilight_dipper.panman \
-  ../examples/data/sars20000_5hap_0snp-a_200000_rep0_R1.fastq.gz \
-  ../examples/data/sars20000_5hap_0snp-a_200000_rep0_R2.fastq.gz \
-  --meta --index sars_20000_twilight_dipper.idx \
-  --threads 8 --em-delta-threshold 0.00001
+# Metagenomic abundance estimation
+panmap ref.panman reads.fq --meta --index ref.idx -t 8 -o sample
 ```
 
-Reads used above were simulated shotgun-sequencing reads of SARS-CoV-2 mixtures. For wastewater samples, refer to README
-in [examples/wastewater](examples/wastewater) for more details.
+## Pipeline
 
-**Metagenomic mode, filter and assign reads:**
-
-We first build an index for the vertebrate mitochondrial PanMAN. We recommend using the `-k 15 -s 8 -l 1` seed parameters for aeDNA reads.
-
-```bash
-mkdir example_run && cd example_run
-
-panmap ../examples/data/v_mtdna.panman \
-  --index-mgsr v_mtdna.idx -k 15 -s 8 -l 1
 ```
-
-Then run panmap with the `--filter-and-assign` option:
-
-```bash
-panmap ../examples/data/v_mtdna.panman \
-  ../examples/data/subsampled.fastq.gz \
-  --meta -i v_mtdna.idx \
-  --filter-and-assign --discard 0.6 --dust 5 \
-  --taxonomic-metadata ../examples/data/v_mtdna.meta.tsv \
-  -t 4 --breadth-ratio --output subsampled
+index  -->  place  -->  align  -->  genotype
+ .idx    .placement.tsv  .bam       .vcf
 ```
 
-This outputs 3 files:
-
-`.mgsr.assignedReads.fastq` file containing the reads that were assigned
+By default, panmap stops after placement. Use `--stop` to run further stages.
 
-`.mgsr.assignedReads.out` file containing the number of reads assigned to each node and the indices of the reads assigned, with respect to the the `.mgsr.assignedReads.fastq` file
+## Modes
 
-`.mgsr.assignedReadsLCANode.out` file containing the number of reads assigned to the LCA node and the indices of the reads assigned. *As reads may be assigned to multiple nodes, the LCA node of a read is the LCA of all the nodes it was assigned to.*
+- **Single-sample** (default): Place reads, align to the closest reference, and call variants (BAM + VCF)
+- **Metagenomic** (`--meta`): Estimate haplotype abundance or assign reads to pangenome nodes
 
-### Building from source
+## Links
 
-See the [installation docs](https://amkram.github.io/panmap/installation/) for dependencies and build instructions.
+- [Full documentation](https://amkram.github.io/panmap/)
+- [Installation options](https://amkram.github.io/panmap/installation/)
+- [CLI reference](https://amkram.github.io/panmap/cli-reference/)
+- [PanMAN format](https://github.com/TurakhiaLab/panman)
diff --git a/docs/index.md b/docs/index.md
@@ -4,31 +4,31 @@
 
 panmap takes sequencing reads and a pangenome in [PanMAN](https://github.com/TurakhiaLab/panman) format, places the reads onto the pangenome tree, aligns them to the closest reference, and calls variants.
 
-## Modes
-
-**Single-sample** (default)
-:   Places reads from a single sample, aligns to the best-matching reference, and genotypes variants. Outputs BAM and VCF files.
-
-**Metagenomic** (`--meta`)
-:   Scores reads against every node in the PanMAN to estimate haplotype abundance or assign reads directly to nodes.
-
 ## At a glance
 
 ```bash
 # Install
 conda install -c bioconda panmap
 
-# Place reads (stops after placement by default)
-panmap ref.panman reads_R1.fq reads_R2.fq -t 8 -o sample
-
-# Run full pipeline through genotyping
+# Place and genotype paired-end reads
 panmap ref.panman reads_R1.fq reads_R2.fq --stop genotype -t 8 -o sample
+
+# Metagenomic abundance estimation
+panmap ref.panman reads.fq --meta --index ref.idx -t 8 -o sample
 ```
 
+## Modes
+
+**Single-sample** (default)
+:   Places reads from a single sample, aligns to the best-matching reference, and genotypes variants. Outputs BAM and VCF files.
+
+**Metagenomic** (`--meta`)
+:   Scores reads against every node in the PanMAN to estimate haplotype abundance or assign reads directly to nodes.
+
 ## Documentation
 
-- [Installation](installation.md) -- Docker and building from source
-- [Quick Start](quickstart.md) -- First analysis in minutes
-- [Single-Sample Mode](single-sample.md) -- Default pipeline walkthrough
-- [Metagenomic Mode](metagenomic.md) -- Abundance estimation and read assignment
+- [Installation](installation.md) -- Bioconda, Docker, building from source
+- [Quick Start](quickstart.md) -- Pipeline overview and basic examples
+- [Single-Sample Mode](single-sample.md) -- Genotyping walkthrough
+- [Metagenomic Mode](metagenomic.md) -- Wastewater and aeDNA workflows
 - [CLI Reference](cli-reference.md) -- All options and flags
diff --git a/docs/installation.md b/docs/installation.md
@@ -13,8 +13,15 @@ This installs `panmap` and `panmanUtils`.
 ## Docker
 
 ```bash
-docker build -t panmap .
-docker run --rm panmap panmap -h
+docker pull alanalohaucsc/panmap:latest
+docker run --rm alanalohaucsc/panmap:latest --help
+```
+
+To run on local files, mount a volume:
+
+```bash
+docker run --rm -v "$(pwd)":/data -w /data alanalohaucsc/panmap:latest \
+  ref.panman reads.fq -o sample
 ```
 
 ---
@@ -25,7 +32,7 @@ docker run --rm panmap panmap -h
 
 | Package | Ubuntu/Debian |
 |---------|---------------|
-| CMake >= 3.12 | `cmake` |
+| CMake >= 3.14 | `cmake` |
 | C++17 compiler | `g++` or `clang++` |
 | Protobuf | `protobuf-compiler`, `libprotobuf-dev` |
 | Boost | `libboost-program-options-dev`, `libboost-iostreams-dev`, `libboost-filesystem-dev`, `libboost-system-dev`, `libboost-date-time-dev` |
@@ -37,9 +44,8 @@ docker run --rm panmap panmap -h
 ### Build
 
 ```bash
-mkdir build && cd build
-cmake ..
-make -j$(nproc)
+cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
+cmake --build build -j$(nproc)
 ```
 
 The binary is at `build/bin/panmap`.
diff --git a/docs/quickstart.md b/docs/quickstart.md
@@ -8,43 +8,54 @@ panmap <panman> [reads1.fq] [reads2.fq] [options]
 
 ## Pipeline
 
-panmap runs four stages in sequence. By default, it stops after **placement**. Use `--stop` to run further.
+panmap runs four stages in sequence. By default it stops after **placement**. Use `--stop` to run further.
 
 ```
-index  ──>  place  ──>  align  ──>  genotype
+index  -->  place  -->  align  -->  genotype
  .idx    .placement.tsv  .bam       .vcf
 ```
 
-## Example: paired-end genotyping
+## Single-sample genotyping
+
+Place reads onto the pangenome, align to the closest reference, and call variants:
 
 ```bash
-panmap ref.panman reads_R1.fq reads_R2.fq --stop genotype -t 8 -o sample
+panmap ref.panman reads_R1.fq reads_R2.fq \
+  --stop genotype -t 8 -o sample
 ```
 
-This runs the full pipeline and produces:
+This produces `sample.bam` and `sample.vcf`.
+
+## Metagenomic abundance estimation
+
+Estimate the relative abundance of the lineages present in a mixed sample:
+
+```bash
+# Build metagenomic index (once per pangenome)
+panmap ref.panman --index-mgsr ref.idx
+
+# Estimate abundances
+panmap ref.panman reads.fq \
+  --meta --index ref.idx -t 8 -o sample
+```
 
-| File | Contents |
-|------|----------|
-| `sample.idx` | Seed index (reusable) |
-| `sample.placement.tsv` | Tree placement |
-| `sample.bam` | Aligned reads |
-| `sample.vcf` | Called variants |
+Output: `sample.mgsr.abundance.out`
 
-## Running partial pipelines
+## Partial pipelines
 
 ```bash
 # Build index only
 panmap ref.panman --stop index -o ref
 
-# Place reads (default behavior)
+# Place reads (default)
 panmap ref.panman reads.fq -o sample
 
-# Place and align, but skip genotyping
+# Place and align, skip genotyping
 panmap ref.panman reads.fq --stop align -o sample
 ```
 
 ## Next steps
 
-- [Single-Sample Mode](single-sample.md) -- pipeline details and options
-- [Metagenomic Mode](metagenomic.md) -- abundance estimation and read assignment
-- [CLI Reference](cli-reference.md) -- full option list
+- [Single-Sample Mode](single-sample.md) -- full walkthrough with examples
+- [Metagenomic Mode](metagenomic.md) -- wastewater and aeDNA workflows
+- [CLI Reference](cli-reference.md) -- all options
diff --git a/docs/single-sample.md b/docs/single-sample.md
@@ -26,6 +26,45 @@ By default, panmap stops after **placement**. Use `--stop` to control how far th
 !!! note
     When `--stop genotype` is used, `--force-leaf` is enabled automatically.
 
+## Example: SARS-CoV-2 genotyping from Illumina reads
+
+This example places paired-end reads onto a SARS-CoV-2 pangenome and calls variants.
+
+### 1. Get a PanMAN
+
+Download or build a PanMAN for your organism. For SARS-CoV-2, a pre-built PanMAN with 20,000 samples is included in the repository under `examples/data`:
+
+```bash
+ls examples/data/sars_20000_twilight_dipper.panman
+```
+
+### 2. Run the full pipeline
+
+```bash
+panmap examples/data/sars_20000_twilight_dipper.panman \
+  reads_R1.fq.gz reads_R2.fq.gz \
+  --stop genotype -t 8 -o my_sample
+```
+
+### 3. Output files
+
+| File | Contents |
+|------|----------|
+| `my_sample.idx` | Seed index (reusable for future runs with `--index`) |
+| `my_sample.placement.tsv` | Placement on the pangenome tree |
+| `my_sample.bam` | Reads aligned to the closest reference |
+| `my_sample.vcf` | Called variants |
+
+### 4. Reuse the index
+
+Once built, the index can be reused across samples:
+
+```bash
+panmap examples/data/sars_20000_twilight_dipper.panman \
+  sample2_R1.fq.gz sample2_R2.fq.gz \
+  --index my_sample.idx --stop genotype -t 8 -o sample2
+```
+
 ## Common options
 
 | Option | Description | Default |
@@ -54,7 +93,7 @@ panmap ref.panman reads.fq -a bwa -o sample
 panmap ref.panman reads.fq --refine -o sample
 ```
 
-Refinement parameters (advanced):
+Refinement parameters:
 
 | Option | Description | Default |
 |--------|-------------|---------|