@@ -38,6 +38,25 @@ as above.
3838
3939See `issue 194 <https://github.com/bacpop/PopPUNK/issues/194>`__ for more discussion.
4040
41+ I get different cluster assignments for my queries with and without the --update-db flag
42+ ----------------------------------------------------------------------------------------
43+
44+ When using ``poppunk_assign``, you may observe query genomes being assigned to novel clusters when running
45+ without the ``--update-db`` option. Using ``--update-db`` with the same query genomes may instead assign
46+ them to existing clusters, which makes the cluster assignments seem inconsistent.
47+
48+ However, this is expected behaviour. When running ``poppunk_assign`` without ``--update-db``, query genomes are compared only to
49+ each other and to reference sequences in the database to make this step efficient. In this instance, the query genomes may be too divergent
50+ from any existing references to cluster with them, and therefore will be assigned to a novel cluster.
51+
52+ When running ``poppunk_assign`` with ``--update-db``, query genomes are compared to all genomes in the database, not just
53+ references. The larger number of comparisons means that the same queries may cluster with non-reference genomes, which
54+ themselves cluster with a reference genome. Therefore, the queries are linked indirectly to a reference genome in the distance
55+ network, meaning they are assigned to an existing cluster, not a novel one as before.
56+
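The effect of the extra comparisons can be illustrated with a toy network. This is a minimal sketch rather than PopPUNK's implementation: the genome names, the links, and the ``connected_components`` helper are all hypothetical, and clusters are simply taken to be connected components of the network.

```python
from collections import defaultdict


def connected_components(edges, nodes):
    """Map each node to a component label (the first node visited in it)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    label = {}
    for start in sorted(nodes):
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in label:
                    label[neighbour] = start
                    stack.append(neighbour)
    return label


# Hypothetical toy data: "ref1" is a reference, "nonref1" is a non-reference
# database genome in the same cluster, and "query1" is a new query genome.
within_cluster_links = [
    ("ref1", "nonref1"),    # link already present in the full database
    ("nonref1", "query1"),  # the query is close to the non-reference only
]

# Toy version of running without --update-db: only references and queries
# are compared, so links through non-reference genomes are never seen.
reference_only_nodes = {"ref1", "query1"}
reference_only_links = [
    (a, b) for a, b in within_cluster_links
    if a in reference_only_nodes and b in reference_only_nodes
]
without_update = connected_components(reference_only_links, reference_only_nodes)

# Toy version of running with --update-db: every database genome takes part.
all_nodes = {"ref1", "nonref1", "query1"}
with_update = connected_components(within_cluster_links, all_nodes)

print(without_update["query1"] == without_update["ref1"])  # False: novel cluster
print(with_update["query1"] == with_update["ref1"])        # True: existing cluster
```

In this toy case the query only joins the reference's cluster once the non-reference genome that bridges them is included in the comparisons.
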
57+ We recommend that users who find 'novel' clusters in their datasets when running ``poppunk_assign`` without ``--update-db`` also check
58+ against results with ``--update-db`` to determine whether the clusters are truly novel, or form part of existing clusters in the full database.
59+
4160Memory/run-time issues
4261----------------------
4362Here are some tips based on experiences analysing larger datasets:
@@ -156,3 +175,4 @@ If you want to change cluster names or assign queries to your own cluster defini
156175you can use the ``--external-clustering`` argument instead.
157176
158177
178+