Open
Summary:
We need a way to generate benchmark test data so that we can check the consistency and accuracy of pyani output.
Description:
The initial plan is to take a single genome sequence as input (this may be random...) and an accompanying network representing the evolution of that sequence. Each edge in the network describes a process applied to the genome entering that edge, and can be any of several optional processes (with appropriate parameterisation):
- random substitution
- inversion
- gain/loss of sequence from outside the network
- HGT within the network
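As a rough illustration of what these edge processes might look like, here is a minimal Python sketch that treats a genome as a plain string over `ACGT`. None of these function names or parameters exist in pyani; they are hypothetical, and the choices made here (inversion as an in-place reverse complement, gained sequence as random bases, HGT as copying a donor segment) are assumptions rather than a settled design.

```python
import random

ALPHABET = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitute(seq, rate, rng):
    """Random substitution: change each base to a different one with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def invert(seq, start, end):
    """Inversion: replace seq[start:end] with its reverse complement."""
    segment = seq[start:end].translate(_COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


def gain(seq, position, length, rng):
    """Gain from outside the network: insert `length` random bases at `position`."""
    new = "".join(rng.choice(ALPHABET) for _ in range(length))
    return seq[:position] + new + seq[position:]


def lose(seq, start, end):
    """Loss: delete seq[start:end]."""
    return seq[:start] + seq[end:]


def transfer(recipient, donor, donor_start, donor_end, position):
    """HGT within the network: copy a donor segment into the recipient."""
    return recipient[:position] + donor[donor_start:donor_end] + recipient[position:]
```

Passing a seeded `random.Random` instance as `rng` keeps the generated benchmark data reproducible between runs.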
Starting from the input genome, these processes are applied as specified by the graph.
This will generate a set of input genomes for testing pyani, where we know the evolutionary history of every "leaf node" sequence and can interpret output accordingly. The data can then be used to benchmark ANI, k-mer and other genome analyses.
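The workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `evolve` function, the dictionary-based edge format and the node names are all hypothetical, and the network is assumed to be tree-shaped (each node has one parent). HGT edges, which need access to a donor genome elsewhere in the network, are not handled here.

```python
from collections import deque


def evolve(root_genome, root, edges):
    """Apply each edge's process along the network, starting from the root.

    `edges` maps a parent node to a list of (child, process) pairs, where
    `process` is a callable that takes the parent sequence and returns the
    child sequence. Returns (genomes for every node, genomes for leaves only).
    """
    genomes = {root: root_genome}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child, process in edges.get(parent, []):
            genomes[child] = process(genomes[parent])
            queue.append(child)
    # A leaf is any node with no outgoing edges.
    leaves = {node: seq for node, seq in genomes.items() if not edges.get(node)}
    return genomes, leaves


# Placeholder processes (hypothetical) standing in for real edge operations.
edges = {
    "root": [("A", lambda s: s[::-1]), ("B", lambda s: s + "TT")],
    "A": [("A1", lambda s: s[:2])],
}
genomes, leaves = evolve("ACGT", "root", edges)
```

Because every intermediate genome is kept in `genomes`, the full history of each leaf sequence stays available when interpreting pyani output for the leaves.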
pyani Version:
Planned for v0.3+