Improve embeddings comparison demo: better analogies, clustering, and visualizations
- Wrap single words in context sentences for sentence embedding models
- Add explicit analogy candidate mappings for common patterns
- Upgrade k-means to 50 iterations with k-means++ initialization
- Add silhouette score for categorization quality metric
- Zoom radar chart and scatter plot axes to highlight model differences
- Add explanatory text to sidebar panels (leaderboard, performance, tradeoff)
demos/embeddings-comparison/index.html
5 additions & 2 deletions
@@ -142,7 +142,7 @@ <h3>Semantic Similarity</h3>
 <!-- Analogy Task -->
 <div id="analogy-task" class="task-panel">
 <h3>Word Analogies</h3>
-<p>Test reasoning: "A is to B as C is to ?"</p>
+<p>Test reasoning: "A is to B as C is to ?" Note: These sentence embedding models are optimized for sentences, not individual words. Classic analogies (like Word2Vec's king-man+woman=queen) may not work as reliably. Try different examples to see which relationships these models capture best.</p>
 
 <div class="test-inputs">
 <div class="input-group">
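The note added to the analogy panel refers to the classic vector-arithmetic test (king - man + woman ≈ queen), and the commit message mentions wrapping single words in context sentences. A minimal sketch of both ideas follows, assuming embeddings are plain numeric arrays. The names `cosine`, `toContextSentence`, and `rankAnalogy`, the sentence template, and the toy vectors are all hypothetical and not taken from the demo's source.

```javascript
// Hypothetical helpers for illustration; none of these names come from the demo.

// Cosine similarity between two equal-length numeric vectors.
function cosine(u, v) {
  let dot = 0, nu = 0, nv = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    nu += u[i] * u[i];
    nv += v[i] * v[i];
  }
  // A zero vector yields similarity 0 instead of NaN.
  return dot / (Math.sqrt(nu) * Math.sqrt(nv) || 1);
}

// Wrap a bare word in a short sentence before embedding it.
// The template is an assumption chosen for illustration.
function toContextSentence(word) {
  return `This is about ${word}.`;
}

// Rank candidates for "a is to b as c is to ?" by similarity to b - a + c.
// `vectors` maps each word to an embedding computed elsewhere.
function rankAnalogy(vectors, a, b, c, candidates) {
  const target = vectors[b].map((x, i) => x - vectors[a][i] + vectors[c][i]);
  return candidates
    .map((word) => ({ word, score: cosine(target, vectors[word]) }))
    .sort((p, q) => q.score - p.score);
}

// Toy 2-D vectors (made up): axis 0 ~ "royalty", axis 1 ~ "female".
const toy = {
  man: [0, 0], woman: [0, 1], king: [1, 0], queen: [1, 1], apple: [-1, 0],
};
const ranked = rankAnalogy(toy, "man", "woman", "king", ["queen", "apple", "man"]);
console.log(ranked[0].word); // prints "queen"
```

With real sentence embeddings the top-ranked candidate is often less clear-cut than in this toy case, which is the caveat the panel text makes.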
@@ -189,7 +189,7 @@ <h3>Word Analogies</h3>
 <!-- Categorization Task -->
 <div id="categorization-task" class="task-panel">
 <h3>Topic Categorization</h3>
-<p>Test how well models group similar items using k-means clustering.</p>
+<p>Test how well models group similar items using k-means clustering. The algorithm uses k-means++ initialization for better results. Cluster quality is measured by silhouette score (higher = better separation).</p>
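The new panel text names k-means++ initialization and the silhouette score, and the commit message caps k-means at 50 iterations. The sketch below shows one way these pieces can fit together in plain JavaScript. It is not the demo's implementation: the function names, the optional `rng` parameter, and the sample points are assumptions made for illustration.

```javascript
// Hypothetical sketch; function names are illustrative, not the demo's.

// Squared Euclidean distance between two equal-length vectors.
function dist2(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return s;
}

// k-means++ seeding: the first centroid is picked uniformly at random, each
// later one with probability proportional to its squared distance from the
// nearest centroid chosen so far.
function seedPlusPlus(points, k, rng = Math.random) {
  const centroids = [points[Math.floor(rng() * points.length)]];
  while (centroids.length < k) {
    const d = points.map((p) => Math.min(...centroids.map((c) => dist2(p, c))));
    let r = rng() * d.reduce((s, x) => s + x, 0);
    let idx = d.length - 1; // fallback guards against floating-point drift
    for (let i = 0; i < d.length; i++) {
      r -= d[i];
      if (r < 0) { idx = i; break; }
    }
    centroids.push(points[idx]);
  }
  return centroids.map((c) => c.slice());
}

// Lloyd's algorithm, stopping at convergence or after maxIter iterations.
function kmeans(points, k, maxIter = 50, rng = Math.random) {
  let centroids = seedPlusPlus(points, k, rng);
  let labels = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIter; iter++) {
    const next = points.map((p) => {
      let best = 0;
      for (let j = 1; j < k; j++) {
        if (dist2(p, centroids[j]) < dist2(p, centroids[best])) best = j;
      }
      return best;
    });
    const changed = next.some((l, i) => l !== labels[i]);
    labels = next;
    if (!changed) break;
    centroids = centroids.map((c, j) => {
      const members = points.filter((_, i) => labels[i] === j);
      if (members.length === 0) return c; // keep an empty cluster's centroid
      return c.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return { labels, centroids };
}

// Mean silhouette: s(i) = (b - a) / max(a, b), where a is the mean distance
// to the point's own cluster and b the smallest mean distance to another one.
function silhouette(points, labels, k) {
  let sum = 0;
  points.forEach((p, i) => {
    const totals = new Array(k).fill(0);
    const counts = new Array(k).fill(0);
    points.forEach((q, j) => {
      if (i === j) return;
      totals[labels[j]] += Math.sqrt(dist2(p, q));
      counts[labels[j]] += 1;
    });
    const own = labels[i];
    if (counts[own] === 0) return; // singleton cluster contributes 0
    const a = totals[own] / counts[own];
    let b = Infinity;
    for (let c = 0; c < k; c++) {
      if (c !== own && counts[c] > 0) b = Math.min(b, totals[c] / counts[c]);
    }
    if (b === Infinity) return; // only one non-empty cluster: contributes 0
    const m = Math.max(a, b);
    sum += m === 0 ? 0 : (b - a) / m;
  });
  return sum / points.length;
}

// Two well-separated groups of made-up 2-D points.
const pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]];
const { labels } = kmeans(pts, 2);
console.log(labels, silhouette(pts, labels, 2).toFixed(3));
```

The silhouette score ranges from -1 to 1. Values near 1 mean points sit much closer to their own cluster than to the nearest other cluster, which matches the panel's "higher = better separation".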
@@ hunk header not captured; the first line below is new line 361 @@
+<p class="panel-desc">Rankings based on the most recent test. Higher scores indicate better semantic understanding for the given task.</p>
 <div id="leaderboard"></div>
 </div>
 
 <!-- Performance Radar Chart -->
 <div class="panel radar-panel">
 <h3>Performance Overview</h3>
+<p class="panel-desc">Compares models across three dimensions: Quality (similarity/accuracy), Speed (inference time), and Consistency. Axes zoom to highlight differences.</p>
 <div id="radar-chart"></div>
 </div>
 
 <!-- Speed vs Quality -->
 <div class="panel tradeoff-panel">
 <h3>Speed vs Quality</h3>
+<p class="panel-desc">The fundamental tradeoff: larger models (upper-right) offer better quality but slower inference. Choose based on your latency requirements.</p>
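The radar panel description says the axes zoom to highlight differences. One plausible way to do that, sketched here under stated assumptions, is to fit each axis to the observed values plus a small margin instead of drawing the full 0 to 1 scale. The helpers `zoomedRange` and `rescale`, the 10% padding default, and the sample scores are hypothetical and do not come from the demo.

```javascript
// Hypothetical helper: pick an axis range that hugs the data instead of
// spanning the full [0, 1] scale. `padFraction` is an assumed default.
function zoomedRange(values, padFraction = 0.1, bounds = [0, 1]) {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  // If all values are equal, fall back to a fixed pad so the range is not empty.
  const pad = (hi - lo) * padFraction || 0.05 * (bounds[1] - bounds[0]);
  return [Math.max(bounds[0], lo - pad), Math.min(bounds[1], hi + pad)];
}

// Map a value into [0, 1] relative to the zoomed range, for plotting.
function rescale(value, [min, max]) {
  return (value - min) / (max - min);
}

const quality = [0.82, 0.85, 0.88]; // made-up scores for three models
const range = zoomedRange(quality);
console.log(range, quality.map((q) => rescale(q, range)));
```

A zoomed axis makes small gaps between models visible, but it also exaggerates them visually, so the numeric scores remain the reference for how large a difference really is.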