Hi Shixiang,
First of all, thank you very much for developing sigminer — it has been very helpful for my copy number signature analysis.
I am currently using copy number signatures with the W method in sigminer, and I would like to ask for your advice on two points:
- How should I choose K when the aggregated score and the raw metrics do not fully agree?
- Under what circumstances would you recommend running bp_extract_signatures_iter() after bp_extract_signatures()?
My current setup:
- Data type: copy number segments
- Signature method: sig_tally(..., method = "W")
- K range tested: 2:8
- I used the best-practice workflow with bp_extract_signatures()
From my current results, bp_res$suggested returned 7. However, after checking both the rank score and the raw statistics, I am leaning toward K = 6 as a more parsimonious solution.
K_selection_full_survey.pdf
K_selection_survey2_K6.pdf
My interpretation is:
- K = 7 is suggested because its aggregated score is slightly higher than K = 6
- but K = 6 seems more attractive because:
  - L2_error is slightly lower at K = 6 than at K = 7
  - silhouette is much better at K = 6 than at K = 7
  - signature_similarity_within_cluster is also higher at K = 6
  - sample_cosine_distance improves only marginally from K = 6 to K = 7
  - exposure_positive_correlation is lower at K = 6 than at K = 7
So my first question is:
Question 1
In a case like this, would you consider K = 6 a reasonable final choice, even though bp_res$suggested is 7?
More generally, when the aggregated score and the raw metrics point to slightly different conclusions, how would you recommend prioritizing them in practice?
About iteration
I also read the documentation for bp_extract_signatures_iter(), and my understanding is that it is mainly useful when some samples are not well reconstructed by the initial extraction.
However, in my current CN W-method workflow, get_sig_rec_similarity() returns the following warning:
Cannot calculate reconstructed profile without raw W and H for CN 'W'/'M' method.
So at the moment I am mainly relying on:
- bp_res$rank_score
- bp_get_stats(bp_res)
- survey plots
- signature/exposure visualizations
I also noticed that the pairwise correlations among the 6 signature profiles seem relatively high, which makes me wonder whether:
- K = 6 may still contain partially redundant signatures, or
- iteration would help identify a more stable consensus solution.
signature_similarity_heatmap.pdf
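To make the redundancy concern concrete, here is a minimal base-R sketch of this kind of pairwise check. It runs on a toy matrix rather than my actual signatures, and it does not call any sigminer function: `sig_mat`, `cosine_sim_matrix()`, and the 0.9 cutoff are hypothetical names and values chosen only for illustration, and cosine similarity is used as one common choice of measure (a correlation coefficient could be swapped in).

```r
# Toy signature matrix (hypothetical values, not real data):
# rows = copy number components, columns = signatures.
sig_mat <- cbind(
  Sig1 = c(0.50, 0.30, 0.10, 0.10),
  Sig2 = c(0.48, 0.32, 0.10, 0.10),
  Sig3 = c(0.05, 0.05, 0.40, 0.50)
)

# Pairwise cosine similarity between the columns of a matrix.
# (Hypothetical helper, not part of sigminer.)
cosine_sim_matrix <- function(m) {
  norms <- sqrt(colSums(m^2))
  crossprod(m) / outer(norms, norms)
}

sim <- cosine_sim_matrix(sig_mat)

# Flag signature pairs whose similarity exceeds an arbitrary cutoff.
# upper.tri() keeps each pair once and skips the diagonal.
threshold <- 0.9
high <- which(sim > threshold & upper.tri(sim), arr.ind = TRUE)
redundant_pairs <- data.frame(
  sig_a = colnames(sim)[high[, "row"]],
  sig_b = colnames(sim)[high[, "col"]],
  cosine = sim[high]
)

print(round(sim, 3))
print(redundant_pairs)
```

With a real result, `sig_mat` would be replaced by the component-by-signature matrix from the chosen K. Any flagged pair would then be a candidate for partially redundant signatures, which is the situation I am unsure how to handle.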
Question 2
Based on this kind of result, would you recommend running bp_extract_signatures_iter() for CN signatures with the W method?
More specifically, what would be the most practical criteria for deciding whether iteration is needed in this setting?
For example, would you recommend iteration mainly when:
- a subset of samples has clearly poor reconstruction-related statistics,
- the signatures appear partially redundant,
- or when there is evidence of heterogeneous subgroups not captured by the first extraction?
Also, for the CN W method specifically, is there a preferred way to assess whether iteration is needed, given that get_sig_rec_similarity() is not available in this case?
Thanks again for your time and for developing this package.
Sen