Question on choosing K and whether to run bp_extract_signatures_iter() for CN signatures (W method)

Hi Shixiang, 


First of all, thank you very much for developing sigminer. It has been very helpful for my copy number signature analysis.

I am currently using copy number signatures with the `W` method in sigminer, and I would like to ask for your advice on two points:

1. How should I choose K when the aggregated score and the raw metrics do not fully agree?
2. Under what circumstances would you recommend running `bp_extract_signatures_iter()` after `bp_extract_signatures()`?

My current setup:

- Data type: copy number segments
- Signature method: `sig_tally(..., method = "W")`
- K range tested: `2:8`
- Extraction: the best-practice workflow with `bp_extract_signatures()`

From my current results, `bp_res$suggested` returned 7. However, after checking both the rank score and the raw statistics, I am leaning toward **K = 6** as a more parsimonious solution.

[K_selection_full_survey.pdf](https://github.com/user-attachments/files/26410926/K_selection_full_survey.pdf)

[K_selection_survey2_K6.pdf](https://github.com/user-attachments/files/26410928/K_selection_survey2_K6.pdf)

My interpretation is:

- **K = 7** is suggested because its **aggregated score** is slightly higher than K = 6
- but **K = 6** seems more attractive because:
  - `L2_error` is slightly lower at K = 6 than at K = 7
  - `silhouette` is much better at K = 6 than at K = 7
  - `signature_similarity_within_cluster` is also higher at K = 6
  - `sample_cosine_distance` improves only marginally from K = 6 to K = 7
  - `exposure_positive_correlation` is lower at K = 6 than at K = 7

So my first question is:

### Question 1
In a case like this, would you consider **K = 6** a reasonable final choice, even though `bp_res$suggested` is 7?

More generally, when the **aggregated score** and the **raw metrics** point to slightly different conclusions, how would you recommend prioritizing them in practice?

## About iteration

I also read the documentation for `bp_extract_signatures_iter()`, and my understanding is that it is mainly useful when some samples are not well reconstructed by the initial extraction.

However, in my current CN `W`-method workflow, `get_sig_rec_similarity()` returns the following warning:

`Cannot calculate reconstructed profile without raw W and H for CN 'W'/'M' method.`

So at the moment I am mainly relying on:

- `bp_res$rank_score`
- `bp_get_stats(bp_res)`
- survey plots
- signature/exposure visualizations

I also noticed that the **pairwise correlations among the 6 signature profiles seem relatively high**, which makes me wonder whether:
- K = 6 may still contain partially redundant signatures, or
- iteration would help identify a more stable consensus solution.

[signature_similarity_heatmap.pdf](https://github.com/user-attachments/files/26411100/signature_similarity_heatmap.pdf)
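
For context, here is a minimal base R sketch of the kind of pairwise check described above. It assumes the signature profiles are available as a numeric matrix with components in rows and signatures in columns. The matrix `sig_mat`, the helper `pairwise_cosine()`, and the 0.9 threshold are all hypothetical and for illustration only; they are not sigminer functions, sigminer defaults, or my actual results.

```r
# Toy signature matrix (hypothetical values, not real data):
# rows are components, columns are signatures.
sig_mat <- matrix(
  c(0.5, 0.3, 0.2, 0.0,
    0.4, 0.4, 0.2, 0.0,
    0.0, 0.1, 0.2, 0.7),
  nrow = 4,
  dimnames = list(paste0("comp", 1:4), paste0("Sig", 1:3))
)

# Cosine similarity between every pair of columns:
# dot products divided by the product of the column norms.
pairwise_cosine <- function(m) {
  norms <- sqrt(colSums(m^2))
  crossprod(m) / tcrossprod(norms)
}

sim <- pairwise_cosine(sig_mat)

# List signature pairs whose similarity exceeds an arbitrary,
# illustrative threshold (upper triangle only, so each pair appears once).
threshold <- 0.9
idx <- which(sim > threshold & upper.tri(sim), arr.ind = TRUE)
redundant <- data.frame(
  sig_a  = colnames(sim)[idx[, "row"]],
  sig_b  = colnames(sim)[idx[, "col"]],
  cosine = sim[idx]
)

print(round(sim, 3))
print(redundant)
```

Pairs flagged this way are only candidates for redundancy. Whether a given cutoff is meaningful for CN `W`-method signatures is part of what I am asking about.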

### Question 2
Based on this kind of result, would you recommend running `bp_extract_signatures_iter()` for **CN signatures with the `W` method**?

More specifically, what would be the most practical criteria for deciding whether iteration is needed in this setting?

For example, would you recommend iteration mainly when:
- a subset of samples has clearly poor reconstruction-related statistics,
- the signatures appear partially redundant, or
- there is evidence of heterogeneous subgroups not captured by the first extraction?

Also, for the CN `W` method specifically, is there a preferred way to assess whether iteration is needed, given that `get_sig_rec_similarity()` cannot calculate the reconstructed profile in this case?

Thanks again for your time and for developing this package.
Sen
