README.md
+26 -14 (26 additions & 14 deletions)
@@ -1,8 +1,8 @@
# xmeasures - Extrinsic Clustering Measures
-Extremely fast evaluation of the extrinsic clustering measures: *various F1 measures (including F1-Score) for overlapping multi-resolution clusterings with unequal node base (and optional node base synchronization)* and standard NMI for non-overlapping clustering on a single resolution.
+Extremely fast evaluation of the extrinsic clustering measures: *various F1 measures (including F1-Score) for overlapping multi-resolution clusterings with unequal node base (and optional node base synchronization)* using various *matching policies (micro, macro and combined weighting)* and standard NMI for non-overlapping clustering on a single resolution.
`xmeasures` evaluates F1 and NMI for collections of hundreds of thousands of clusters within a dozen seconds on an ordinary laptop using a single CPU core. The computational time is O(N) unlike O(N*C) of the existing state-of-the-art implementations, where N is the number of nodes in the network and C is the number of clusters.
`xmeasures` is one of the utilities designed for the [PyCaBeM](https://github.com/eXascaleInfolab/PyCABeM) clustering benchmark to evaluate clustering of large networks.
-A paper about the implemented F1 measures (F1p is much more discriminative than the standard [Average F1-Score](https://cs.stanford.edu/people/jure/pubs/bigclam-wsdm13.pdf)), [NMI measures](www.jmlr.org/papers/volume11/vinh10a/vinh10a.pdf) and their applicability is being written now and the reference will be specified soon...
+A paper about the implemented F1 measures (F1p is much more indicative and discriminative than the standard [Average F1-Score](https://cs.stanford.edu/people/jure/pubs/bigclam-wsdm13.pdf)), [NMI measures](www.jmlr.org/papers/volume11/vinh10a/vinh10a.pdf) and their applicability is being written now and the reference will be specified soon...
> Standard NMI is implemented considering overlapping and multi-resolution clustering only to demonstrate non-applicability of the standard NMI for such cases, where it yields unfair results. See [GenConvNMI](https://github.com/eXascaleInfolab/GenConvNMI) for the fair generalized NMI evaluation.
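
The `--f1` help text in this commit describes F1p, F1h and F1s as different ways of scoring a cluster pair and of combining the two per-clustering averages of the greatest matches. The following is a minimal C sketch of that reading, not the code of `xmeasures` itself. All function names are hypothetical, and the definition of the partial probability as precision times recall is an assumption that the help text does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* F1 of one cluster pair: harmonic mean of precision (common / size_a)
 * and recall (common / size_b), which simplifies to
 * 2 * common / (size_a + size_b). */
double pair_f1(size_t common, size_t size_a, size_t size_b)
{
    assert(size_a > 0 && size_b > 0 && common <= size_a && common <= size_b);
    return 2.0 * (double)common / (double)(size_a + size_b);
}

/* ASSUMPTION (not stated in the source): the partial probability of a
 * cluster pair is taken here as precision * recall. */
double pair_partprob(size_t common, size_t size_a, size_t size_b)
{
    assert(size_a > 0 && size_b > 0 && common <= size_a && common <= size_b);
    return ((double)common / (double)size_a) * ((double)common / (double)size_b);
}

/* Harmonic mean of two non-negative values; 0 when both are 0. */
double harmonic_mean(double x, double y)
{
    return (x + y > 0.0) ? 2.0 * x * y / (x + y) : 0.0;
}

/* Arithmetic mean of two values. */
double arithmetic_mean(double x, double y)
{
    return (x + y) / 2.0;
}
```

Under these assumptions, given the average greatest match of clustering 1 against clustering 2 and the same average in the opposite direction, F1s would be `arithmetic_mean` of the two `pair_f1`-based averages, F1h would be their `harmonic_mean`, and F1p would be the `harmonic_mean` of the two `pair_partprob`-based averages. Since precision times recall never exceeds the F1 of the same pair, and a harmonic mean never exceeds the arithmetic mean, this sketch is consistent with the ordering F1p <= F1h <= F1s noted in the help text.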
autogen/cmdline.c
+25 -16 (25 additions & 16 deletions)
@@ -27,11 +27,11 @@
const char *gengetopt_args_info_purpose = "Extrinsic measures evaluation: F1 (prob, harm and score) for overlapping\nmulti-resolution clusterings with possible unequal node base and standard NMI\nfor non-overlapping clustering on a single resolution.";
-const char *gengetopt_args_info_usage = "Usage: xmeasures [OPTIONS] clustering1 clustering2\n\n clustering - input file, collection of the clusters to be evaluated.";
+const char *gengetopt_args_info_usage = "Usage: xmeasures [OPTIONS] clustering1 clustering2\n\n clustering - input file, collection of the clusters to be evaluated.\n \nExample:\n $ ./xmeasures -fp -kc networks/5K25.cnl tests/5K25_l0.825/5K25_l0.825_796.cnl\n";
const char *gengetopt_args_info_versiontext = "";
-const char *gengetopt_args_info_description = "Extrinsic measures are evaluated, i.e. clustering (collection of clusters) is\ncompared to another clustering, which can be the ground-truth.\nEvaluating measures are:\n - F1 - various F1 measures of the Greatest (Max) Match including the Average\nF1-Score with optional weighting;\n - NMI - Normalized Mutual Information, normalized by either max or also\nsqrt, avg and min information content denominators.\nATTENTION: this is standard NMI, which should be used ONLY for the HARD\npartitioning evaluation (non-overlapping clustering on a single resolution).\nIt penalizes overlapping and multi-resolution structures.\nNOTE: Unequal node base in the clusterings is allowed, it penalizes the match.\nEach cluster should contain unique members, which is verified only in the debug\nmode.\nUse [OvpNMI](https://github.com/eXascaleInfolab/OvpNMI) or\n[GenConvNMI](https://github.com/eXascaleInfolab/GenConvNMI) for NMI evaluation\nin the arbitrary collections (still each cluster should contain unique\nmembers).\n";
+const char *gengetopt_args_info_description = "Extrinsic measures are evaluated, i.e. clustering (collection of clusters) is\ncompared to another clustering, which can be the ground-truth.\nNOTE: Each cluster should contain unique members, which is verified only in the\ndebug mode.\nEvaluating measures are:\n\n - F1 - various F1 measures of the Greatest (Max) Match including the Average\nF1-Score with optional weighting.\n NOTE: There are 3 matching policies available for each kind of F1. The most\nrepresentative evaluation is performed by the F1p with combined matching\npolicy (considers both micro and macro weightings). \n\n - NMI - Normalized Mutual Information, normalized by either max or also\nsqrt, avg and min information content denominators.\nATTENTION: This is standard NMI, which should be used ONLY for the HARD\npartitioning evaluation (non-overlapping clustering on a single resolution).\nIt penalizes overlapping and multi-resolution structures.\nNOTE: Unequal node base in the clusterings is allowed, it penalizes the\nmatch.Use [OvpNMI](https://github.com/eXascaleInfolab/OvpNMI) or\n[GenConvNMI](https://github.com/eXascaleInfolab/GenConvNMI) for NMI evaluation\nin the arbitrary collections (still each cluster should contain unique\nmembers).\n";
" -f, --f1[=ENUM] evaluate F1 of the [weighted] average of the greatest\n (maximal) match by F1 or partial probability.\n NOTE: F1p <= F1h <= F1s, where:\n - p (F1p) - Harmonic mean of the [weighted]\n average of Partial Probabilities, the most\n discriminative and satisfies the largest number of\n the Formal Constraints (homogeneity, completeness,\n rag bag, size/quantity, balance);\n - h (F1h) - Harmonic mean of the [weighted]\n average of F1s;\n - s (F1s) - Arithmetic mean (average) of the\n [weighted] average of F1s, Standard F1-Score, the\n least discriminative and satisfies the lowest\n number of the Formal Constraints.\n (possible values=\"partprob\", \"harmonic\",\n \"standard\" default=`partprob')",
-" -u, --unweighted evaluate simple average of the best matches instead\n of weighted by the cluster size (default=off)",
+" -k, --kind[=ENUM] kind of the matching policy:\n - w - weighted (default)\n - u - unweighed\n - c - combined: F1(w, u)\n (possible values=\"weighted\", \"unweighed\",\n \"combined\" default=`weighted')",
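
The `--kind` option added in this commit selects how the best matches of individual clusters are averaged: weighted by cluster size, unweighted, or combined as `F1(w, u)`. The sketch below illustrates one plausible reading of these three policies in C. The function names are hypothetical, and interpreting `F1(w, u)` as the harmonic mean of the weighted and unweighted averages is an assumption based only on the help text.

```c
#include <assert.h>
#include <stddef.h>

/* Average of the best-match scores of n clusters.
 * weighted != 0: each score is weighted by the size of its cluster;
 * weighted == 0: simple (unweighted) average. */
double policy_average(const double *best, const size_t *sizes, size_t n,
                      int weighted)
{
    double sum = 0.0;   /* weighted sum of the scores */
    double wsum = 0.0;  /* sum of the weights */
    size_t i;

    assert(best != NULL && (!weighted || sizes != NULL));
    for (i = 0; i < n; ++i) {
        double w = weighted ? (double)sizes[i] : 1.0;
        sum += w * best[i];
        wsum += w;
    }
    return (wsum > 0.0) ? sum / wsum : 0.0;
}

/* ASSUMPTION: the combined policy "F1(w, u)" is read as the harmonic
 * mean of the weighted (w) and unweighted (u) averages. */
double policy_combined(double w, double u)
{
    return (w + u > 0.0) ? 2.0 * w * u / (w + u) : 0.0;
}
```

For example, with best-match scores 1.0 and 0.5 for clusters of sizes 3 and 1, the weighted average is 0.875 and the unweighted one is 0.75. The combined value lies between them, slightly closer to the smaller one, so a clustering has to do well under both weightings to score highly.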