Revamp embeddings clustering

Currently, a fixed number of embedding clusters is identified per partition. We want to:

1. Make the number of clusters dynamic (use GATE's PCA method to determine the number of clusters per partition).
2. Generate a one-sentence summary for each cluster, for interpretability. We can probably use an LLM for this.
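
A rough sketch of both pieces, to make the proposal concrete. This is not GATE's actual method, which this issue does not spell out. It assumes the PCA approach amounts to counting the principal components needed to explain a chosen share of the variance. The function names, the `variance_threshold` default, and the prompt wording are all hypothetical, and the LLM call itself is left out.

```python
import numpy as np


def estimate_num_clusters(embeddings, variance_threshold=0.9,
                          min_clusters=1, max_clusters=None):
    """Assumed heuristic (not confirmed to match GATE): use the number of
    principal components needed to reach `variance_threshold` of the total
    variance as the cluster count for one partition."""
    X = np.asarray(embeddings, dtype=float)
    if X.ndim != 2 or X.shape[0] < 2:
        return min_clusters

    # Center the data; the squared singular values are then proportional
    # to the variance explained by each principal component.
    X = X - X.mean(axis=0)
    singular_values = np.linalg.svd(X, compute_uv=False)
    variances = singular_values ** 2
    total = variances.sum()
    if total == 0:
        # All embeddings are identical, so there is nothing to split.
        return min_clusters

    cumulative = np.cumsum(variances) / total
    # First component index at which the cumulative share reaches the threshold.
    k = int(np.searchsorted(cumulative, variance_threshold)) + 1

    upper = len(variances) if max_clusters is None else max_clusters
    return max(min_clusters, min(k, upper))


def build_summary_prompt(cluster_texts, max_examples=5):
    """Build a prompt asking an LLM for a one-sentence cluster summary.
    The wording is a placeholder; sending it to a model is out of scope."""
    examples = "\n".join(f"- {text}" for text in cluster_texts[:max_examples])
    return (
        "Summarize in one sentence what the following items have in common:\n"
        + examples
    )
```

The result of `estimate_num_clusters` would replace the fixed cluster count passed to whichever clustering step is used per partition, and `build_summary_prompt` would be called once per resulting cluster with a sample of its members.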