#5: Add improved outlier detection #6
base: master
63c7091 to fef20cd
8f84394 to b48c308
7d178d2 to f1aa013
```python
self.expected_slow_ranks = set([
    1702, 1462, 1902, 1222, 1182, 1262, 1862, 1342, 1382, 1422, 1742, 1502, 1102, 1582,
    1142, 1782, 1662, 1022, 1062, 1542, 1622, 982, 1302, 1822, 1381, 1501, 1021, 1341,
    1541, 1301, 1701, 1821, 1261, 1621, 1901, 1012, 1461, 1861, 1181, 1221, 981, 1661,
    1061, 1781, 1421, 1741, 1101, 1581, 1141, 902, 1812, 1252, 1492, 1532, 1772, 1292,
    1332, 1852, 302, 382, 182, 702, 1652, 1212, 582, 1172, 1452, 1692, 972, 1572, 1732,
    62, 1412
])
```
We may want to brainstorm a different way to test this so we don't have to hard-code the expected values (that's likely outside the scope of this PR though).
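One possible direction, sketched under assumptions: build the test input synthetically with a known set of injected slow ranks, so the expected set comes from the data generator instead of a hard-coded list. `make_synthetic_times` and `detect_slow_ranks` are hypothetical names; the detector here is a simple median/MAD stand-in, not the clustering approach in `detect_slow_nodes.py`, and a real test would call the PR's own entry point instead. Bounded (uniform) noise keeps healthy ranks well below the threshold, so the test stays deterministic rather than flaky.

```python
import numpy as np


def make_synthetic_times(num_ranks, slow_ranks, base=1.0, noise=0.01, slowdown=0.5, seed=0):
    """Hypothetical helper: per-rank times where only `slow_ranks` are slowed down."""
    rng = np.random.default_rng(seed)
    # Bounded uniform noise, so healthy ranks never drift far from `base`.
    times = base + rng.uniform(-noise, noise, num_ranks)
    times[sorted(slow_ranks)] += slowdown
    return times


def detect_slow_ranks(times, num_std=3.0):
    """Stand-in detector (NOT the PR's clustering method): flag ranks above
    median + num_std * robust std, where robust std = 1.4826 * MAD."""
    median = np.median(times)
    mad = np.median(np.abs(times - median))
    return set(np.flatnonzero(times > median + num_std * 1.4826 * mad).tolist())


def test_detects_injected_slow_ranks():
    rng = np.random.default_rng(42)
    num_ranks = 2048
    # The expected answer is derived from how the data was generated, not hard-coded.
    slow_ranks = set(rng.choice(num_ranks, size=75, replace=False).tolist())
    times = make_synthetic_times(num_ranks, slow_ranks)
    assert detect_slow_ranks(times) == slow_ranks
```

Swapping `detect_slow_ranks` for the real detector would keep the same test shape while removing the need to update a literal list whenever the input data changes.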
detection/detect_slow_nodes.py (outdated)
```python
threshold = representative_center + 3 * np.std(cluster_to_times[representative_cluster])

problematic_clusters = [cluster_id for cluster_id, center in cluster_centers.items() if center > threshold]
return (data, clusters, cluster_to_times, cluster_to_ranks, cluster_centers,
        representative_cluster, representative_center, threshold, problematic_clusters)
```
I think that instead of returning a huge tuple like this, we should use a class, since this is very error-prone.
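A minimal sketch of what such a class could look like, using a frozen `dataclass`. `ClusterAnalysis`, `analyze_clusters`, and the `problematic_ranks` property are hypothetical names, and the cluster centers are assumed here to be plain per-cluster means (in the PR they come from the clustering step). The `data` and `clusters` members of the current tuple could be added as extra fields in the same way.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ClusterAnalysis:
    """Hypothetical container replacing the positional tuple; every field is named."""
    cluster_to_times: dict[int, list[float]]
    cluster_to_ranks: dict[int, list[int]]
    cluster_centers: dict[int, float]
    representative_cluster: int
    representative_center: float
    threshold: float
    problematic_clusters: list[int]

    @property
    def problematic_ranks(self) -> list[int]:
        """All ranks belonging to a cluster whose center exceeds the threshold."""
        return sorted(r for c in self.problematic_clusters for r in self.cluster_to_ranks[c])


def analyze_clusters(cluster_to_times, cluster_to_ranks, num_std=3.0):
    # Assumption: each center is the mean time of its cluster.
    cluster_centers = {c: float(np.mean(t)) for c, t in cluster_to_times.items()}
    # Same logic as the code under review: the largest cluster is "representative".
    representative_cluster = max(cluster_to_times.items(), key=lambda v: len(v[1]))[0]
    representative_center = cluster_centers[representative_cluster]
    threshold = representative_center + num_std * float(np.std(cluster_to_times[representative_cluster]))
    problematic_clusters = [c for c, center in cluster_centers.items() if center > threshold]
    return ClusterAnalysis(
        cluster_to_times=cluster_to_times,
        cluster_to_ranks=cluster_to_ranks,
        cluster_centers=cluster_centers,
        representative_cluster=representative_cluster,
        representative_center=representative_center,
        threshold=threshold,
        problematic_clusters=problematic_clusters,
    )
```

Callers would then read `result.threshold` or `result.problematic_ranks` instead of relying on the position of each value in a nine-element tuple.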
detection/detect_slow_nodes.py (outdated)
```python
representative_cluster = max(cluster_to_times.items(), key=lambda v: len(v[1]))[0]
representative_center = cluster_centers[representative_cluster]
threshold = representative_center + 3 * np.std(cluster_to_times[representative_cluster])
```
Should we make this `3` a parameter instead of a constant used directly in the code?
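A minimal sketch of exposing the multiplier, assuming the script is driven by `argparse`. `DEFAULT_NUM_STD`, `outlier_threshold`, `build_parser`, and the `--num-std` flag are hypothetical names; the default of 3 keeps the current behavior.

```python
import argparse

import numpy as np

# Hypothetical default that preserves the current hard-coded behavior.
DEFAULT_NUM_STD = 3.0


def outlier_threshold(representative_center, representative_times, num_std=DEFAULT_NUM_STD):
    """Same formula as the line under review, with the multiplier as a parameter."""
    if num_std < 0:
        raise ValueError("num_std must be non-negative")
    return representative_center + num_std * float(np.std(representative_times))


def build_parser():
    parser = argparse.ArgumentParser(description="Slow-node detection (sketch)")
    parser.add_argument(
        "--num-std",
        type=float,
        default=DEFAULT_NUM_STD,
        help="standard deviations above the representative center before a cluster "
             "is considered slow (default: %(default)s)",
    )
    return parser
```

For example, `args = build_parser().parse_args(["--num-std", "2.5"])` followed by `outlier_threshold(center, times, args.num_std)`. The same `num_std` would presumably also feed the `representative_center - 3 * np.std(...)` check, so both comparisons stay consistent.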
```python
if representative_cluster_is_slowest:
    if representative_center - 3 * np.std(cluster_to_times[representative_cluster]) > slowest_non_representative_center:
        print()
        print(f" WARNING: Clustering results found most times to be slower than others. No outliers will be detected.")
```
I'm thinking that instead of printing a warning here, we should stop the job from running by not handing back a node list that includes a bunch of slow nodes.
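A minimal sketch of that fail-fast behavior, assuming the caller (for example a batch script) checks the exit status before launching the job. `SlowNodeDetectionError`, `check_representative_cluster`, and `run` are hypothetical names; the condition follows the warning snippet above, treating the representative cluster as the slowest when its center exceeds every other cluster's center.

```python
import sys

import numpy as np


class SlowNodeDetectionError(RuntimeError):
    """Hypothetical error: the clustering cannot be trusted to yield a usable node list."""


def check_representative_cluster(cluster_centers, cluster_to_times, representative_cluster, num_std=3.0):
    """Raise instead of warning when the representative cluster is clearly the slowest."""
    other_centers = [center for cid, center in cluster_centers.items() if cid != representative_cluster]
    if not other_centers:
        return  # Only one cluster: nothing to compare against.
    representative_center = cluster_centers[representative_cluster]
    slowest_other = max(other_centers)
    if representative_center > slowest_other:  # representative cluster is the slowest
        lower_bound = representative_center - num_std * float(np.std(cluster_to_times[representative_cluster]))
        if lower_bound > slowest_other:
            raise SlowNodeDetectionError(
                "Most times were clustered as slower than the others; refusing to emit a node list."
            )


def run(cluster_centers, cluster_to_times, representative_cluster):
    """Return a process exit status: 0 if the node list can be emitted, 1 otherwise."""
    try:
        check_representative_cluster(cluster_centers, cluster_to_times, representative_cluster)
    except SlowNodeDetectionError as err:
        print(f"ERROR: {err}", file=sys.stderr)
        return 1
    # ...write out the node list here...
    return 0
```

A script entry point could end with `sys.exit(run(...))`, so a wrapper that stops on a nonzero status never receives a node list full of slow nodes.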
Fixes: #5