first draft of silhouette_samples that works(according to my test) by Shabasovich · Pull Request #2199 · helmholtz-analytics/heat

Shabasovich · 2026-03-13T10:59:17Z

Description

tried to implement silhouette_samples from scikit-learn

Type of change

new function

Additional

I have a file with explanations containing the logic I tried to implement to calculate a(i) and b(i). I can provide that as well if needed.
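
For readers following the a(i)/b(i) logic mentioned above, here is a minimal single-process NumPy sketch of the usual silhouette definition, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from sample i to the other members of its own cluster and b(i) is the smallest mean distance from sample i to any other cluster. It is not the code from this PR: the name silhouette_samples_reference is hypothetical, and the sketch assumes Euclidean distances, 1-D labels, and at least two clusters.

```python
import numpy as np


def silhouette_samples_reference(X, labels):
    """Single-process NumPy reference for per-sample silhouette values (illustrative only)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    rows = np.arange(n)

    # Pairwise Euclidean distances, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    clusters, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()

    # sums[i, k] = sum of distances from sample i to all members of cluster k.
    sums = np.stack([dist[:, labels == c].sum(axis=1) for c in clusters], axis=1)

    # a(i): mean distance to the *other* members of the sample's own cluster.
    own_count = counts[inverse]
    a = sums[rows, inverse] / np.maximum(own_count - 1, 1)

    # b(i): smallest mean distance to any cluster other than the sample's own.
    means = sums / counts
    means[rows, inverse] = np.inf
    b = means.min(axis=1)

    with np.errstate(invalid="ignore", divide="ignore"):
        s = np.nan_to_num((b - a) / np.maximum(a, b))

    # Samples that are alone in their cluster get 0 by convention.
    s[own_count == 1] = 0.0
    return s


# Two tight clusters plus one single-sample cluster.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0], [50.0, 50.0]])
labels = np.array([0, 0, 1, 1, 2])
s = silhouette_samples_reference(X, labels)
```

The explicit (n, n) distance matrix keeps the sketch short but is only practical for small inputs; a distributed implementation would have to avoid materialising it on one process.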

brownbaerchen

This is already looking quite good. I had a brief look and highlighted some general things, mainly that the tests should use assert statements.

Since you mentioned you had trouble debugging, I want to advertise the debugger once more. Just write breakpoint() in the code whenever you want to stop and explore. There can be trouble with parallel debugging. So when running in parallel, I only call a breakpoint on rank 0:

if ht.comm.rank == 0:
    breakpoint()

If you're already familiar with this stuff, just ignore this. But I know many Python programmers, myself included, who were introduced to this way too late, so it doesn't hurt to mention it :D

brownbaerchen · 2026-03-13T14:20:24Z

heat/cluster/tests/test_metrics.py

We have moved the tests to a different directory in #2172, please move this file there.

brownbaerchen · 2026-03-13T14:22:37Z

heat/cluster/tests/test_metrics.py

+    ht_results_np = ht_results.numpy()
+
+    if np.allclose(sk_results, ht_results_np, atol=1e-5):
+        print("✅ Test Passed: HeAT matches Scikit-Learn results.")
+    else:
+        max_diff = np.max(np.abs(sk_results - ht_results_np))
+        print(f"❌ Test Failed: Results differ. Max diff: {max_diff}")
+        #print(f"sk_results are {np.abs(sk_results)}; ht_results are {np.abs(ht_results_np)}")


Suggested change

-    ht_results_np = ht_results.numpy()
-
-    if np.allclose(sk_results, ht_results_np, atol=1e-5):
-        print("✅ Test Passed: HeAT matches Scikit-Learn results.")
-    else:
-        max_diff = np.max(np.abs(sk_results - ht_results_np))
-        print(f"❌ Test Failed: Results differ. Max diff: {max_diff}")
-        #print(f"sk_results are {np.abs(sk_results)}; ht_results are {np.abs(ht_results_np)}")
+    ht_results_np = ht_results.resplit(None).numpy()
+    assert np.allclose(sk_results, ht_results_np, atol=1e-5), f"Max diff between Heat and scikit-learn: {np.max(np.abs(sk_results - ht_results_np))}"
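
As an alternative to building the failure message by hand, NumPy's own testing helper reports the maximum absolute and relative difference when the comparison fails. The following is a small sketch using plain NumPy arrays as stand-ins for sk_results and ht_results_np; the helper name assert_matches_reference is hypothetical. Note that it sets rtol=0, so only the absolute tolerance applies, which is slightly stricter than np.allclose with its default relative tolerance.

```python
import numpy as np


def assert_matches_reference(actual, expected, atol=1e-5):
    """Raise AssertionError with a mismatch report if the arrays differ by more than atol."""
    # rtol=0 means only the absolute tolerance is used for the comparison.
    np.testing.assert_allclose(actual, expected, rtol=0, atol=atol)


# Stand-in data: values that agree to well within the tolerance pass silently.
expected = np.array([0.90, 0.89, 0.0])
actual = expected + 1e-7
assert_matches_reference(actual, expected)
```

Because the helper raises AssertionError, it works unchanged with pytest and with unittest-style test methods.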

brownbaerchen · 2026-03-13T14:23:07Z

heat/cluster/tests/test_metrics.py

+    if res_edge[3] == 0:
+        print("✅ Edge Case Passed: Single-sample cluster correctly assigned 0.0")


Suggested change

-    if res_edge[3] == 0:
-        print("✅ Edge Case Passed: Single-sample cluster correctly assigned 0.0")
+    assert res_edge[3] == 0

brownbaerchen · 2026-03-13T14:23:50Z

heat/cluster/tests/test_metrics.py

+    if res_np[0] > 0.8:
+        print(f"✅ Success! Point 0 is {res_np[0]:.4f}")
+    else:
+        print(f"❌ Failure! Point 0 is {res_np[0]:.4f}")


Suggested change

-    if res_np[0] > 0.8:
-        print(f"✅ Success! Point 0 is {res_np[0]:.4f}")
-    else:
-        print(f"❌ Failure! Point 0 is {res_np[0]:.4f}")
+    assert res_np[0] > 0.8, f"Point 0 is {res_np[0]:.4f}"

brownbaerchen · 2026-03-13T14:26:48Z

heat/cluster/metrics.py

+
+
+def silhouette_samples(X, labels, *, metric="euclidean", **kwds):
+    X_distributed = ht.array(X, split=0)


The input should be a Heat array that is already split according to the user's needs. Have a look at heat.sanitation.sanitize_in to check that the input is a valid Heat DNDarray, and then don't re-split it.

brownbaerchen · 2026-03-13T14:27:10Z

heat/cluster/metrics.py

+
+def silhouette_samples(X, labels, *, metric="euclidean", **kwds):
+    X_distributed = ht.array(X, split=0)
+    labels_distributed = ht.array(labels, split=0)


Same comment as for X

brownbaerchen · 2026-03-13T14:29:04Z

heat/cluster/metrics.py

+        if X_distributed.dtype.kind == "f":
+            atol = ht.finfo(X_distributed.dtype).eps * 100  # tolerance based on machine accuracy
+
+            if ht.any(ht.abs(diag_elements) > atol):
+                raise error_msg
+        elif ht.any(diag_elements != 0):  # integral dtype
+            raise error_msg


I don't think single precision deserves special treatment here. You can check against the machine eps for any datatype.
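
One way to fold the two branches into a single comparison is to pick the tolerance first and then run one check. The sketch below uses NumPy as a stand-in for the Heat calls in the diff; the function name check_zero_diagonal and the error text are hypothetical. It assumes a tolerance of 100 machine epsilons for floating dtypes, as in the diff, and falls back to an exact comparison for other dtypes, because NumPy's finfo is only defined for floating types. Whether ht.finfo behaves the same way is not verified here.

```python
import numpy as np


def check_zero_diagonal(dist):
    """Raise ValueError if a precomputed distance matrix has a non-zero diagonal."""
    dist = np.asarray(dist)
    diag = np.diagonal(dist)

    # Floating dtypes tolerate round-off up to 100 machine epsilons;
    # every other dtype must be exactly zero on the diagonal.
    if np.issubdtype(dist.dtype, np.floating):
        atol = np.finfo(dist.dtype).eps * 100
    else:
        atol = 0

    # Single comparison for all dtypes.
    if np.any(np.abs(diag) > atol):
        raise ValueError("Precomputed distance matrix has non-zero diagonal entries.")


# A float32 matrix with tiny round-off on the diagonal is accepted.
ok = np.array([[1e-6, 2.0], [2.0, 0.0]], dtype=np.float32)
check_zero_diagonal(ok)
```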

first draft of silhouette_samples that works(according to my test)

b2f009f

github-project-automation bot added this to Roadmap Mar 13, 2026

github-project-automation bot moved this to Todo in Roadmap Mar 13, 2026

github-actions bot added cluster features labels Mar 13, 2026

Shabasovich requested a review from ClaudiaComito March 13, 2026 10:59

Shabasovich self-assigned this Mar 13, 2026

brownbaerchen reviewed Mar 13, 2026

Shabasovich and others added 2 commits March 13, 2026 17:20

cleaned up a little bit and added helper functions for silhouette_score

b6496d3

[pre-commit.ci] auto fixes from pre-commit.com hooks

8bcb247

for more information, see https://pre-commit.ci

Shabasovich wants to merge 3 commits into main from features/2190-Implement_silhouette_score

2 participants
