Question on choosing K and whether to run bp_extract_signatures_iter() for CN signatures (W method) #477

@SenLiBio

Hi Shixiang,

First of all, thank you very much for developing sigminer — it has been very helpful for my copy number signature analysis.

I am currently using copy number signatures with the W method in sigminer, and I would like to ask for your advice on two points:

  1. How should I choose K when the aggregated score and the raw metrics do not fully agree?
  2. Under what circumstances would you recommend running bp_extract_signatures_iter() after bp_extract_signatures()?

My current setup:
- Data type: copy number segments
- Signature method: sig_tally(..., method = "W")
- K range tested: 2:8
- I used the best-practice workflow with bp_extract_signatures()

From my current results, bp_res$suggested returned 7. However, after checking both the rank score and the raw statistics, I am leaning toward K = 6 as a more parsimonious solution.

K_selection_full_survey.pdf

K_selection_survey2_K6.pdf

My interpretation is:

  • K = 7 is suggested because its aggregated score is slightly higher than that of K = 6
  • but K = 6 seems more attractive because:
    • L2_error is slightly lower at K = 6 than at K = 7
    • silhouette is much better at K = 6 than at K = 7
    • signature_similarity_within_cluster is also higher at K = 6
    • sample_cosine_distance improves only marginally from K = 6 to K = 7
    • exposure_positive_correlation is lower at K = 6 than at K = 7

So my first question is:

Question 1

In a case like this, would you consider K = 6 a reasonable final choice, even though bp_res$suggested is 7?

More generally, when the aggregated score and the raw metrics point to slightly different conclusions, how would you recommend prioritizing them in practice?

About iteration

I also read the documentation for bp_extract_signatures_iter(), and my understanding is that it is mainly useful when some samples are not well reconstructed by the initial extraction.

However, in my current CN W-method workflow, get_sig_rec_similarity() returns the following warning:

Cannot calculate reconstructed profile without raw W and H for CN 'W'/'M' method.

So at the moment I am mainly relying on:

  • bp_res$rank_score
  • bp_get_stats(bp_res)
  • survey plots
  • signature/exposure visualizations

I also noticed that the pairwise correlations among the 6 signature profiles seem relatively high, which makes me wonder whether:

  • K = 6 may still contain partially redundant signatures, or
  • iteration would help identify a more stable consensus solution.

signature_similarity_heatmap.pdf
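
To make the redundancy concern concrete, the kind of check I have in mind is sketched below. It is written in plain Python purely as a language-neutral illustration and is not sigminer code. The function names, the 0.9 cutoff, and the toy profiles are all assumptions made up for this example; they are not my real signatures, and the cutoff is arbitrary rather than a recommended value.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def redundant_pairs(signatures, threshold=0.9):
    """Return (name_a, name_b, similarity) for every pair of signature
    profiles whose cosine similarity is at or above `threshold`.

    `signatures` maps a signature name to its profile (one value per
    component). The threshold is a hypothetical cutoff for illustration.
    """
    names = sorted(signatures)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine_similarity(signatures[a], signatures[b])
            if sim >= threshold:
                flagged.append((a, b, sim))
    return flagged


# Toy profiles (invented): Sig3 is deliberately almost a copy of Sig1.
toy_signatures = {
    "Sig1": [0.50, 0.50, 0.00, 0.00],
    "Sig2": [0.00, 0.00, 0.50, 0.50],
    "Sig3": [0.45, 0.45, 0.05, 0.05],
}

for a, b, sim in redundant_pairs(toy_signatures):
    print(f"{a} vs {b}: cosine similarity {sim:.3f}")
```

With the toy input, only the Sig1/Sig3 pair is flagged. On real data, a pair flagged this way would be a candidate for "one signature split in two", which is the situation I am unsure how to handle at K = 6.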

Question 2

Based on this kind of result, would you recommend running bp_extract_signatures_iter() for CN signatures with the W method?

More specifically, what would be the most practical criteria for deciding whether iteration is needed in this setting?

For example, would you recommend iteration mainly when:

  • a subset of samples has clearly poor reconstruction-related statistics,
  • the signatures appear partially redundant,
  • or there is evidence of heterogeneous subgroups not captured by the first extraction?

Also, for the CN W method specifically, is there a preferred way to assess whether iteration is needed, given that get_sig_rec_similarity() cannot compute the reconstructed profile in this case?

Thanks again for your time and for developing this package.
Sen
