Skip to content

Commit 290d64f

Browse files
committed
add consensus generation stage (bcftools consensus)
1 parent 54e9950 commit 290d64f

File tree

8 files changed

+73
-22
lines changed

8 files changed

+73
-22
lines changed

README.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -5,12 +5,12 @@ Pangenome-based sequence placement, alignment, and genotyping.
55
[Documentation](https://amkram.github.io/panmap/) | [Preprint](https://www.biorxiv.org/content/10.64898/2026.03.29.711974v1)
66

77
## Install
8-
8+
<!--
99
```bash
1010
conda install -c bioconda panmap
1111
```
1212
13-
Or with Docker:
13+
Or with Docker: -->
1414

1515
```bash
1616
docker pull alanalohaucsc/panmap:latest
@@ -20,17 +20,17 @@ docker pull alanalohaucsc/panmap:latest
2020

2121
```bash
2222
# Place and genotype paired-end reads
23-
panmap ref.panman reads_R1.fq reads_R2.fq --stop genotype -t 8 -o sample
23+
panmap ref.panman reads_R1.fq reads_R2.fq -t 8 -o sample
2424

2525
# Metagenomic abundance estimation
26-
panmap ref.panman reads.fq --meta --index ref.idx -t 8 -o sample
26+
panmap ref.panman reads.fq --meta -t 8 -o sample
2727
```
2828

2929
## Pipeline
3030

3131
```
32-
index --> place --> align --> genotype
33-
.idx .placement.tsv .bam .vcf
32+
index --> place --> align --> genotype --> consensus
33+
.idx .placement.tsv .bam .vcf .consensus.fa
3434
```
3535

3636
By default, panmap stops after placement. Use `--stop` to run further stages.

docs/cli-reference.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ Use `--help` for common options or `--help-all` for the full list.
1616
| `-o, --output` | Output file prefix | derived from reads filename |
1717
| `-t, --threads` | Number of threads | `1` |
1818
| `-a, --aligner` | `minimap2` or `bwa` | `minimap2` |
19-
| `--stop` | Stop after: `index`, `place`, `align`, `genotype` | `place` |
19+
| `--stop` | Stop after: `index`, `place`, `align`, `genotype`, `consensus` | `place` |
2020
| `--meta` | Enable metagenomic mode | off |
2121
| `-v, --verbose` | Verbose output | off |
2222
| `-q, --quiet` | Errors only | off |

docs/quickstart.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -8,23 +8,23 @@ panmap <panman> [reads1.fq] [reads2.fq] [options]
88

99
## Pipeline
1010

11-
panmap runs four stages in sequence. By default it stops after **placement**. Use `--stop` to run further.
11+
panmap runs five stages in sequence. By default it stops after **placement**. Use `--stop` to run further.
1212

1313
```
14-
index --> place --> align --> genotype
15-
.idx .placement.tsv .bam .vcf
14+
index --> place --> align --> genotype --> consensus
15+
.idx .placement.tsv .bam .vcf .consensus.fa
1616
```
1717

1818
## Single-sample genotyping
1919

20-
Place reads onto the pangenome, align to the closest reference, and call variants:
20+
Place reads onto the pangenome, align to the closest reference, call variants, and generate a consensus:
2121

2222
```bash
2323
panmap ref.panman reads_R1.fq reads_R2.fq \
24-
--stop genotype -t 8 -o sample
24+
--stop consensus -t 8 -o sample
2525
```
2626

27-
This produces `sample.bam` and `sample.vcf`.
27+
This produces `sample.bam`, `sample.vcf`, and `sample.consensus.fa`.
2828

2929
## Metagenomic abundance estimation
3030

docs/single-sample.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ The default mode. Places reads from one sample onto the pangenome tree, aligns t
99
panmap ref.panman reads_R1.fq reads_R2.fq -t 8 -o sample
1010

1111
# Run through genotyping
12-
panmap ref.panman reads_R1.fq reads_R2.fq --stop genotype -t 8 -o sample
12+
panmap ref.panman reads_R1.fq reads_R2.fq --stop consensus -t 8 -o sample
1313
```
1414

1515
## Pipeline stages
@@ -22,6 +22,7 @@ By default, panmap stops after **placement**. Use `--stop` to control how far th
2222
| Place | `place` (default) | `<prefix>.placement.tsv` | Places reads onto the pangenome tree |
2323
| Align | `align` | `<prefix>.bam` | Aligns reads to closest reference |
2424
| Genotype | `genotype` | `<prefix>.vcf` | Calls variants from alignments |
25+
| Consensus | `consensus` | `<prefix>.consensus.fa` | Generates consensus FASTA from variants |
2526

2627
!!! note
2728
When `--stop genotype` is used, `--force-leaf` is enabled automatically.
@@ -54,6 +55,7 @@ panmap examples/data/sars_20000_twilight_dipper.panman \
5455
| `my_sample.placement.tsv` | Placement on the pangenome tree |
5556
| `my_sample.bam` | Reads aligned to the closest reference |
5657
| `my_sample.vcf` | Called variants |
58+
| `my_sample.consensus.fa` | Consensus sequence |
5759

5860
### 4. Reuse the index
5961

@@ -62,7 +64,7 @@ Once built, the index can be reused across samples:
6264
```bash
6365
panmap examples/data/sars_20000_twilight_dipper.panman \
6466
sample2_R1.fq.gz sample2_R2.fq.gz \
65-
--index my_sample.idx --stop genotype -t 8 -o sample2
67+
--index my_sample.idx --stop consensus -t 8 -o sample2
6668
```
6769

6870
## Common options

src/3rdparty/bcftools/bcftools.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -187,6 +187,7 @@ extern "C" {
187187

188188
int main_mpileup(int argc, char *argv[]);
189189
int main_vcfcall(int argc, char *argv[]);
190+
int main_consensus(int argc, char *argv[]);
190191

191192
#ifdef __cplusplus
192193
}

src/conversion.cpp

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -115,6 +115,14 @@ void createVcfWithMutationMatrices(std::string& prefix,
115115
std::remove(rawVcfFile.c_str());
116116
}
117117

118+
void createConsensus(const std::string& vcfFileName,
119+
const std::string& refFileName,
120+
const std::string& consensusFileName) {
121+
const char* args[] = {
122+
"consensus", "-f", refFileName.c_str(), "-o", consensusFileName.c_str(), vcfFileName.c_str()};
123+
run_bcftools_in_fork(main_consensus, 6, const_cast<char**>(args));
124+
}
125+
118126
static uint16_t compute_sam_flags(
119127
bool is_paired, bool is_read1, uint8_t rev, uint8_t mate_rev, uint8_t proper_frag, bool mate_unmapped) {
120128
uint16_t flag = 0;

src/conversion.hpp

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,10 @@ void createVcfWithMutationMatrices(std::string& prefix,
3434
std::string& vcfFileName,
3535
const std::vector<std::vector<double>>& substMatrixPhred);
3636

37+
void createConsensus(const std::string& vcfFileName,
38+
const std::string& refFileName,
39+
const std::string& consensusFileName);
40+
3741
// Direct alignment-to-BAM pipeline: parallel minimap2 alignment with direct
3842
// bam1_t construction (no SAM text intermediate). Writes sorted BAM file.
3943
void alignAndWriteBam(std::vector<std::string>& readSequences,

src/main.cpp

Lines changed: 43 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -100,11 +100,12 @@ inline const char* cyan() {
100100
} // namespace color
101101

102102
enum class PipelineStage {
103-
Index, // Build index only
104-
Place, // Placement only
105-
Align, // Placement + Alignment
106-
Genotype, // Placement + Alignment + Genotyping
107-
Full // Full pipeline including assembly
103+
Index, // Build index only
104+
Place, // Placement only
105+
Align, // Placement + Alignment
106+
Genotype, // Placement + Alignment + Genotyping
107+
Consensus, // + Consensus FASTA generation
108+
Full // Full pipeline
108109
};
109110

110111
struct Config {
@@ -1194,6 +1195,7 @@ int runAlignment(const Config& cfg,
11941195
const placement::PlacementResult& placement,
11951196
panmanUtils::Tree* preloadedTree = nullptr);
11961197
int runGenotyping(const Config& cfg);
1198+
int runConsensus(const Config& cfg);
11971199

11981200
int runBatchPlacement(const Config& cfg) {
11991201
// Parse batch file
@@ -1669,11 +1671,27 @@ int runGenotyping(const Config& cfg) {
16691671
return 0;
16701672
}
16711673

1674+
int runConsensus(const Config& cfg) {
1675+
std::string refFileName = cfg.output + ".ref.fa";
1676+
std::string vcfFileName = cfg.output + ".vcf";
1677+
std::string consensusFileName = cfg.output + ".consensus.fa";
1678+
1679+
auto start = std::chrono::high_resolution_clock::now();
1680+
1681+
createConsensus(vcfFileName, refFileName, consensusFileName);
1682+
1683+
auto elapsed =
1684+
std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - start);
1685+
1686+
logging::msg("Consensus complete in {}ms -> {}", elapsed.count(), consensusFileName);
1687+
return 0;
1688+
}
1689+
16721690
void printUsage() {
16731691
std::cout << color::bold() << "panmap" << color::reset() << " v" << VERSION << "\n";
16741692
std::cout << "Pangenome-based sequence placement, alignment, and genotyping\n\n";
16751693
std::cout << color::bold() << "Usage:" << color::reset() << " panmap [options] <panman> [reads.fq] [reads2.fq]\n";
1676-
std::cout << color::bold() << "Output:" << color::reset() << " <prefix>.vcf, <prefix>.bam, <prefix>.placement.tsv\n"
1694+
std::cout << color::bold() << "Output:" << color::reset() << " <prefix>.vcf, <prefix>.bam, <prefix>.consensus.fa, ...\n"
16771695
<< " (prefix defaults to reads filename, or use -o)\n\n";
16781696
}
16791697

@@ -1690,7 +1708,7 @@ int main(int argc, char** argv) {
16901708
("version,V", "Show version")
16911709
("output,o", po::value<std::string>(&cfg.output), "Output prefix")
16921710
("threads,t", po::value<int>(&cfg.threads)->default_value(1), "Threads")
1693-
("stop", po::value<std::string>()->default_value("place"), "Stop after: index|place|align|genotype")
1711+
("stop", po::value<std::string>()->default_value("place"), "Stop after: index|place|align|genotype|consensus")
16941712
("meta", po::bool_switch(&cfg.metagenomic), "Metagenomic mode (for more options, see --help-all)")
16951713
("aligner,a", po::value<std::string>(&cfg.aligner)->default_value("minimap2"), "Aligner: minimap2|bwa")
16961714
("verbose,v", po::bool_switch(&cfg.verbose), "Verbose output")
@@ -1860,6 +1878,8 @@ int main(int argc, char** argv) {
18601878
cfg.stopAfter = PipelineStage::Align;
18611879
else if (stopStr == "genotype")
18621880
cfg.stopAfter = PipelineStage::Genotype;
1881+
else if (stopStr == "consensus")
1882+
cfg.stopAfter = PipelineStage::Consensus;
18631883
else {
18641884
output::error("Invalid stage '{}'", stopStr);
18651885
return 1;
@@ -2064,13 +2084,29 @@ int main(int argc, char** argv) {
20642084
return 1;
20652085
}
20662086
if (signals::check_interrupted()) return 130;
2087+
if (cfg.stopAfter == PipelineStage::Genotype) {
2088+
output::done("Variants: " + cfg.output + ".vcf");
2089+
output::info("");
2090+
output::info("{}Next steps:{}", color::dim(), color::reset());
2091+
output::info(" {} View variants: bcftools view {}.vcf | head", color::dim(), cfg.output);
2092+
output::info(" {} Consensus: panmap {} {} --stop consensus", color::dim(), cfg.panman, cfg.reads1);
2093+
return 0;
2094+
}
2095+
2096+
// Stage 5: Consensus
2097+
if (runConsensus(cfg) != 0) {
2098+
output::error("Consensus generation failed");
2099+
return 1;
2100+
}
2101+
if (signals::check_interrupted()) return 130;
20672102

20682103
output::info("");
20692104
output::info("{}Pipeline complete.{}", color::green(), color::reset());
20702105
output::info(" Placement: {}.placement.tsv", cfg.output);
20712106
output::info(" Reference: {}.ref.fa", cfg.output);
20722107
output::info(" Alignment: {}.bam", cfg.output);
20732108
output::info(" Variants: {}.vcf", cfg.output);
2109+
output::info(" Consensus: {}.consensus.fa", cfg.output);
20742110
output::info("");
20752111
output::info("{}Next steps:{}", color::dim(), color::reset());
20762112
output::info(" {} View variants: bcftools view {}.vcf | head", color::dim(), cfg.output);

0 commit comments

Comments
 (0)