
Commit 66a4329

Merge pull request #318 from samhorsfield96/docs_update
Docs update
2 parents fa1280e + c8bfc81 commit 66a4329

2 files changed

Lines changed: 25 additions & 1 deletion

File tree

docs/query_assignment.rst

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -205,7 +205,11 @@ input folder will contain the updated database containing everything needed.
205205
.. note::
206206
This mode can take longer to run with large numbers of input query genomes,
207207
as it will calculate all :math:`Q^2` query-query distances, rather than
208-
just those found in novel query clusters.
208+
just those found in novel query clusters. Furthermore, query genomes that were assigned to novel clusters
209+
without ``--update-db`` may instead be assigned to existing clusters when using
210+
this option. This is expected behaviour and results from cluster merging: comparing queries to
211+
all database genomes, not just references, allows queries to be linked to existing clusters.
212+
See :doc:`troubleshooting` for more details.
209213

210214
Visualising results
211215
-------------------

docs/troubleshooting.rst

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,25 @@ as above.
3838

3939
See `issue 194 <https://github.com/bacpop/PopPUNK/issues/194>`__ for more discussion.
4040

41+
I get different cluster assignments for my queries with and without the --update-db flag
42+
----------------------------------------------------------------------------------------
43+
44+
When using ``poppunk_assign``, you may observe query genomes being assigned to novel clusters when running
45+
without the ``--update-db`` option. However, using ``--update-db`` with the same query genomes may result
46+
in assignment of queries to existing clusters, rather than to novel clusters, making cluster assignments seem inconsistent.
47+
48+
This is expected behaviour. When running ``poppunk_assign`` without ``--update-db``, query genomes are compared only to
49+
each other and to reference sequences in the database, which keeps this step efficient. In this instance, the query genomes may be too divergent
50+
from any existing references to cluster with them, and therefore will be assigned to a novel cluster.
51+
52+
When running ``poppunk_assign`` with ``--update-db``, query genomes are compared to all genomes in the database, not just
53+
references. The larger number of comparisons means that the same queries may cluster with non-reference genomes, which
54+
themselves cluster with a reference genome. Therefore, the queries are linked indirectly to a reference genome in the distance
55+
network, meaning they are assigned to an existing cluster, not a novel one as before.
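
The indirect linking described above can be illustrated with a toy network. This is a minimal sketch and not PopPUNK's implementation: the genome names, the edge list, and the ``assign`` helper are all hypothetical, and clusters are treated simply as connected components of the visible network.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Map each node to a component label using a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        # Edges touching genomes that were not compared are ignored.
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return {n: find(n) for n in nodes}


def assign(queries, visible_db, db_cluster, edges):
    """Assign each query to an existing cluster if linked to one, else 'novel'.

    visible_db: database genomes the queries are compared against
    (references only, or every genome in the database).
    """
    nodes = list(visible_db) + list(queries)
    comp = connected_components(nodes, edges)
    clusters_in_component = defaultdict(set)
    for genome in visible_db:
        clusters_in_component[comp[genome]].add(db_cluster[genome])
    result = {}
    for q in queries:
        found = sorted(clusters_in_component.get(comp[q], ()))
        result[q] = found[0] if found else "novel"
    return result


# Hypothetical toy data: query1 is within the distance threshold of nonref1 only.
db_cluster = {"ref1": "1", "nonref1": "1"}
edges = [("ref1", "nonref1"), ("nonref1", "query1")]

# Analogous to running without --update-db: only references are visible.
without_update = assign(["query1"], ["ref1"], db_cluster, edges)
# Analogous to running with --update-db: all database genomes are visible.
with_update = assign(["query1"], ["ref1", "nonref1"], db_cluster, edges)

print(without_update)
print(with_update)
```

In this toy case the query is reported as novel when only the reference is visible, and joins cluster 1 once the non-reference genome that bridges it to the reference is included.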
56+
57+
We recommend users who find 'novel' clusters in their datasets when running ``poppunk_assign`` without ``--update-db`` also check
58+
against results with ``--update-db`` to determine whether the clusters are truly novel, or form part of existing clusters in the full database.
59+
4160
Memory/run-time issues
4261
----------------------
4362
Here are some tips based on experiences analysing larger datasets:
@@ -156,3 +175,4 @@ If you want to change cluster names or assign queries to your own cluster defini
156175
you can use the ``--external-clustering`` argument instead.
157176

158177

178+
