first draft of silhouette_samples that works (according to my test) #2199
Shabasovich wants to merge 3 commits into main
brownbaerchen left a comment
This is already looking quite good. I had a brief look and highlighted some general things, mainly using assert statements in the tests.
Since you mentioned you had trouble debugging, I want to advertise the debugger once more. Just write breakpoint() in the code wherever you want to stop and explore. Parallel debugging can cause trouble, so when running in parallel I only call breakpoint() on rank 0:

    if ht.comm.rank == 0:
        breakpoint()

If you're already familiar with this stuff, just ignore this. But I know many Python programmers, myself included, who were introduced to this way too late, so it doesn't hurt to mention it :D
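The rank-guarded breakpoint can also be wrapped in a small helper. This is a minimal sketch using only the standard library; `breakpoint_on_rank` is a hypothetical name, not part of Heat, and the rank is passed in explicitly (in a Heat program it would come from the communicator).

```python
import sys


def breakpoint_on_rank(rank, target=0):
    """Enter the debugger only on the process whose rank equals `target`.

    Hypothetical helper, not part of Heat. Returns True if the debugger
    was entered on this process, False otherwise.
    """
    if rank != target:
        return False
    # Say which process stopped, so interleaved parallel output stays readable
    print(f"[rank {rank}] entering debugger", file=sys.stderr)
    breakpoint()  # dispatches to sys.breakpointhook (pdb by default)
    return True
```

With the default hook the debugger opens inside the helper, so type `up` at the prompt to reach the calling frame. All other ranks keep running until they hit the next collective operation, where they wait for the paused rank.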

We have moved the tests to a different directory in #2172. Please move this file there.

    ht_results_np = ht_results.numpy()

    if np.allclose(sk_results, ht_results_np, atol=1e-5):
        print("✅ Test Passed: HeAT matches Scikit-Learn results.")
    else:
        max_diff = np.max(np.abs(sk_results - ht_results_np))
        print(f"❌ Test Failed: Results differ. Max diff: {max_diff}")
        #print(f"sk_results are {np.abs(sk_results)}; ht_results are {np.abs(ht_results_np)}")

Suggested change:

    ht_results_np = ht_results.resplit(None).numpy()
    assert np.allclose(sk_results, ht_results_np, atol=1e-5), f"Max diff between Heat and scikit-learn: {np.max(np.abs(sk_results - ht_results_np))}"

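As an alternative to a hand-written failure message, NumPy's `np.testing.assert_allclose` reports the maximum absolute and relative difference itself when the comparison fails. The sketch below uses small made-up arrays standing in for `sk_results` and `ht_results_np`, and assumes both are already plain NumPy arrays. `assert_allclose` defaults to `rtol=1e-7`, whereas `np.allclose` defaults to `rtol=1e-5`, so both tolerances are spelled out.

```python
import numpy as np

# Made-up stand-ins for the scikit-learn reference and the gathered Heat result
sk_results = np.array([0.91, 0.87, -0.12, 0.0])
ht_results_np = sk_results + 1e-7  # tiny deviation, well within tolerance

# Passes silently; on failure the AssertionError lists the max abs/rel difference
np.testing.assert_allclose(ht_results_np, sk_results, rtol=1e-5, atol=1e-5)

# A deviation larger than the tolerance raises AssertionError
try:
    np.testing.assert_allclose(ht_results_np + 1e-2, sk_results, rtol=1e-5, atol=1e-5)
    failed = False
except AssertionError:
    failed = True
```

Inside a test function only the `np.testing.assert_allclose(...)` line is needed; the `try`/`except` is there just to show the failing case without aborting the script.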

    if res_edge[3] == 0:
        print("✅ Edge Case Passed: Single-sample cluster correctly assigned 0.0")

Suggested change:

    assert res_edge[3] == 0

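For context, the edge case checked here is the convention that a sample in a single-sample cluster gets a silhouette value of 0. A minimal plain-Python reference can make that explicit. It is only a sketch: `silhouette_samples_ref` is a hypothetical name, the data is made up, the distance is Euclidean, and it assumes at least two clusters and not all points identical.

```python
import math


def silhouette_samples_ref(points, labels):
    """Plain-Python silhouette coefficients (Euclidean, O(n^2)).

    Reference sketch only. Samples in single-sample clusters get 0.0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-sample clusters
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return scores


# Three nearby points in cluster 0, one isolated point forming cluster 1
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
labels = [0, 0, 0, 1]
res_edge = silhouette_samples_ref(points, labels)
assert res_edge[3] == 0
```

A reference like this is handy for small hand-checkable inputs; the comparison against scikit-learn remains the main correctness test.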

    if res_np[0] > 0.8:
        print(f"✅ Success! Point 0 is {res_np[0]:.4f}")
    else:
        print(f"❌ Failure! Point 0 is {res_np[0]:.4f}")

Suggested change:

    assert res_np[0] > 0.8, f"Point 0 is {res_np[0]:.4f}"


    def silhouette_samples(X, labels, *, metric="euclidean", **kwds):
        X_distributed = ht.array(X, split=0)

You want the input to be a Heat array that is split according to the user's needs. Have a look at heat.sanitation.sanitize_in to make sure the input is a valid Heat DNDarray, and then don't split it.

    def silhouette_samples(X, labels, *, metric="euclidean", **kwds):
        X_distributed = ht.array(X, split=0)
        labels_distributed = ht.array(labels, split=0)

Same comment as for X.

    if X_distributed.dtype.kind == "f":
        atol = ht.finfo(X_distributed.dtype).eps * 100  # tolerance based on machine accuracy

        if ht.any(ht.abs(diag_elements) > atol):
            raise error_msg
    elif ht.any(diag_elements != 0):  # integral dtype
        raise error_msg

I don't think single precision deserves special treatment here. You can check against the machine eps for any datatype.
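One possible reading of this comment, sketched in NumPy rather than Heat: derive a tolerance from the dtype once, then run a single check for all dtypes. `diagonal_tolerance` and `has_nonzero_diagonal` are hypothetical names. `np.finfo` only accepts inexact types, so the sketch falls back to a tolerance of 0 for integers. That fallback is an assumption, not something stated in the review.

```python
import numpy as np


def diagonal_tolerance(dtype):
    """Return eps * 100 for floating dtypes and 0 for everything else."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.floating):
        return float(np.finfo(dtype).eps) * 100
    return 0  # integers are exact, so any non-zero entry counts


def has_nonzero_diagonal(dist):
    """Single check for all dtypes: is any diagonal entry beyond tolerance?"""
    diag = np.diagonal(dist)
    return bool(np.any(np.abs(diag) > diagonal_tolerance(dist.dtype)))
```

With this shape the two `raise error_msg` branches collapse into one `if has_nonzero_diagonal(...)`, and the tolerance scales with the precision of the input (about 1.2e-5 for float32, about 2.2e-14 for float64).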