Commit 2324c86

Description and measures notation refined
1 parent f80fced commit 2324c86

5 files changed, +87 -80 lines changed

README.md

Lines changed: 33 additions & 29 deletions
````diff
@@ -47,7 +47,7 @@ Then `g++-5` should be installed and `Makefile` might need to be edited replacin
 Execution Options:
 ```
 $ ../xmeasures -h
-xmeasures 4.0.1
+xmeasures 4.0.2
 
 Extrinsic measures evaluation: Omega Index (a fuzzy version of the Adjusted
 Rand Index, identical to the Fuzzy Rand Index) and [mean] F1-score (prob, harm
````
````diff
@@ -89,8 +89,8 @@ Evaluating measures are:
 - OI - Omega Index (a fuzzy version of the Adjusted Rand Index, identical to
 the Fuzzy Rand Index), which yields the same value as Adjusted Rand Index when
 applied to the non-overlapping clusterings.
-- F1 - various [mean] F1 measures of the Greatest (Max) Match including the
-Average F1-Score (suggested by J. Leskovec) with optional weighting.
+- [M]F1 - various [mean] F1 measures of the Greatest (Max) Match including
+the Average F1-Score (suggested by J. Leskovec) with optional weighting.
 NOTE: There are 3 matching policies available for each kind of F1. The most
 representative evaluation is performed by the F1p with combined matching
 policy (considers both micro and macro weighting).
````
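The Omega Index described above compares, for every pair of nodes, how many clusters contain both nodes in each clustering, and then corrects the observed agreement for chance. The Python sketch below implements the standard (non-extended) Omega Index of Collins and Dent, assuming both clusterings share the same node base. The function names `pair_cooccurrence` and `omega_index` are hypothetical; this is not the xmeasures implementation and it does not reproduce the extended variant selected by `-x`.

```python
from collections import Counter
from itertools import combinations


def pair_cooccurrence(clusters, nodes):
    """Count, for every unordered node pair, how many clusters contain both."""
    counts = {pair: 0 for pair in combinations(sorted(nodes), 2)}
    for cluster in clusters:
        for pair in combinations(sorted(set(cluster)), 2):
            counts[pair] += 1
    return counts


def omega_index(clusters1, clusters2):
    """Standard Omega Index of two (possibly overlapping) clusterings.

    Assumption: both clusterings cover the same node base.
    """
    nodes = set().union(*clusters1) | set().union(*clusters2)
    t1 = pair_cooccurrence(clusters1, nodes)
    t2 = pair_cooccurrence(clusters2, nodes)
    num_pairs = len(t1)
    # Observed agreement: share of pairs co-occurring equally often in both.
    observed = sum(t1[p] == t2[p] for p in t1) / num_pairs
    # Expected agreement from the distributions of co-occurrence counts.
    n1, n2 = Counter(t1.values()), Counter(t2.values())
    expected = sum(n1[j] * n2[j] for j in n1) / num_pairs ** 2
    if expected == 1:
        return 1.0  # degenerate case: agreement is certain by chance alone
    return (observed - expected) / (1 - expected)
```

For the non-overlapping clusterings `[[0, 1, 2], [3, 4, 5]]` and `[[0, 1], [2, 3], [4, 5]]` this sketch yields 8/33 (about 0.2424), which is also their Adjusted Rand Index, in line with the note that OI coincides with ARI on non-overlapping clusterings.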
````diff
@@ -141,21 +141,23 @@ Omega Index:
 (default=off)
 
 Mean F1:
--f, --f1[=ENUM] evaluate F1 of the [weighted] average of the
-greatest (maximal) match by F1 or partial
+-f, --f1[=ENUM] evaluate mean F1 of the [weighted] average of
+the greatest (maximal) match by F1 or partial
 probability.
 NOTE: F1p <= F1h <= F1a, where:
-- p (F1p) - Harmonic mean of the [weighted]
-average of Partial Probabilities, the most
-indicative as satisfies the largest number of
-the Formal Constraints (homogeneity,
-completeness, rag bag, size/quantity,
-balance);
-- h (F1h) - Harmonic mean of the [weighted]
-average of F1a;
+- p (F1p) - Harmonic mean (F1) of two
+[weighted] averages of the Partial
+Probabilities, the most indicative as
+satisfies the largest number of the Formal
+Constraints (homogeneity, completeness, rag
+bag, size/quantity, balance);
+- h (F1h) - Harmonic mean (F1) of two
+[weighted] averages of all local F1 (harmonic
+means of the Precision and Recall of the best
+matches of the clusters);
 - a (F1a) - Arithmetic mean (average) of
-the [weighted] average of F1a, the least
-discriminative and satisfies the lowest
+two [weighted] averages of all local F1, the
+least discriminative and satisfies the lowest
 number of the Formal Constraints.
 (possible values="partprob",
 "harmonic", "average" default=`partprob')
````
````diff
@@ -171,19 +173,20 @@ Mean F1:
 "unweighed", "combined"
 default=`weighted')
 
-Clusters Labeling & F1 with Precision and Recall:
+Clusters Labeling & F1 evaluation with Precision and Recall:
 -l, --label=gt_filename label evaluating clusters with the specified
 ground-truth (gt) cluster indices and
 evaluate F1 (including Precision and Recall)
-of the MATCHED labeled clusters only (without
-the probable subclusters).
+of the (best) MATCHED labeled clusters only
+(without the probable subclusters).
 NOTE: If 'sync' option is specified then the
-clusters labels file name should be the same
-as the node base (if specified) and should be
-in the .cnl format. The file name can be
-either a separate or an evaluating CNL file,
-in the latter case this option should precede
-the evaluating filename not repeating it
+file name of the clusters labels should be
+the same as the node base (if specified) and
+should be in the .cnl format. The file name
+can be either a separate or an evaluating CNL
+file, in the latter case this option should
+precede the evaluating filename not repeating
+it.
 -p, --policy[=ENUM] Labels matching policy:
 - p - Partial Probabilities (maximizes
 gain)
````
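The revised F1h and F1a descriptions combine two averages of local F1 values: one over the clusters of the first clustering, each matched to its best (maximal F1) counterpart in the second, and one in the opposite direction. Below is a minimal Python sketch of that idea, assuming clusters are node sets and that the optional weighting scales each local F1 by the cluster size (the actual xmeasures `weighted`/`combined` policies may differ). The helper names are hypothetical, and F1p (Partial Probabilities) is not covered.

```python
def local_f1(cluster, other):
    """F1 of two node sets: harmonic mean of precision and recall."""
    common = len(cluster & other)
    if common == 0:
        return 0.0
    precision = common / len(other)
    recall = common / len(cluster)
    return 2 * precision * recall / (precision + recall)


def avg_best_f1(clusters, others, weighted=False):
    """[Weighted] average over `clusters` of the F1 of each cluster's best match.

    Assumption: the weight of a cluster is its size when `weighted` is True.
    """
    total = weight_sum = 0.0
    for cluster in clusters:
        best = max(local_f1(cluster, other) for other in others)  # greatest match
        weight = len(cluster) if weighted else 1
        total += weight * best
        weight_sum += weight
    return total / weight_sum


def mean_f1(clusters1, clusters2, kind="harmonic", weighted=False):
    """F1h (kind="harmonic") or F1a (kind="average") of two clusterings."""
    a = avg_best_f1(clusters1, clusters2, weighted)  # clusters1 -> clusters2
    b = avg_best_f1(clusters2, clusters1, weighted)  # clusters2 -> clusters1
    if kind == "harmonic":
        return 2 * a * b / (a + b) if a + b else 0.0
    if kind == "average":
        return (a + b) / 2
    raise ValueError(f"unsupported kind: {kind}")
```

For `[{0, 1, 2}, {3, 4, 5}]` against `[{0, 1}, {2, 3}, {4, 5}]` the two averages are 0.8 and 2/3, so F1h = 8/11 (about 0.727) and F1a = 11/15 (about 0.733). This illustrates F1h <= F1a, since a harmonic mean never exceeds the arithmetic one.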
````diff
@@ -207,8 +210,9 @@ Clusters Labeling & F1 with Precision and Recall:
 
 
 NMI:
--n, --nmi evaluate NMI (Normalized Mutual Information)
-(default=off)
+-n, --nmi evaluate NMI (Normalized Mutual Information),
+applicable only to the non-overlapping
+clusters (default=off)
 -a, --all evaluate all NMIs using sqrt, avg and min
 denominators besides the max one
 (default=off)
````
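Standard NMI divides the mutual information of two partitions by a normalizing entropy term, and the `-a` option above additionally reports the sqrt, avg and min denominators. The Python sketch below applies these textbook formulas to non-overlapping clusterings given as per-node labels. The helper names are hypothetical, and this sketch is not the xmeasures code.

```python
import math
from collections import Counter


def entropy(counts, total, log):
    """Entropy of a partition given its cluster sizes."""
    return -sum(c / total * log(c / total) for c in counts.values())


def nmi_all(labels1, labels2, log=math.log2):
    """NMI of two non-overlapping clusterings given as per-node cluster labels.

    Returns the NMI normalized by the max, sqrt, avg and min of the entropies.
    """
    if len(labels1) != len(labels2):
        raise ValueError("both clusterings must label the same nodes")
    n = len(labels1)
    sizes1, sizes2 = Counter(labels1), Counter(labels2)
    joint = Counter(zip(labels1, labels2))  # contingency table
    h1 = entropy(sizes1, n, log)
    h2 = entropy(sizes2, n, log)
    mi = sum(
        nij / n * log(nij * n / (sizes1[i] * sizes2[j]))
        for (i, j), nij in joint.items()
    )
    denominators = {
        "max": max(h1, h2),
        "sqrt": math.sqrt(h1 * h2),
        "avg": (h1 + h2) / 2,
        "min": min(h1, h2),
    }
    # A zero denominator leaves NMI undefined here (conventions differ).
    return {k: (mi / d if d > 0 else math.nan) for k, d in denominators.items()}
```

Because min <= sqrt <= avg <= max for the denominators, the max-normalized NMI is the smallest of the four values. In this sketch the ratio does not depend on the logarithm base, so passing `log=math.log` (cf. `-e`) changes the entropies but not the NMI values.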
````diff
@@ -217,7 +221,7 @@ NMI:
 (default=off)
 ```
 
-> Empty lines and comments (lines starting with #) in the input file (cnl format) are skipped.
+> Empty lines and comments (lines starting with #) in the input file (cnl format) are omitted.
 
 **Examples**
 Evaluate harmonic mean of the weighted average of the greatest (maximal) match by partial probabilities (the most discriminative F1-measure) using macro weighting (default as the most frequently used, thought combined weighting is the most indicative one):
````
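The note in this hunk states that empty lines and `#` comments in CNL input are omitted. Below is a minimal reader with that behavior. It assumes one cluster per line with whitespace-separated node ids, each optionally suffixed with `:share` (the share is discarded). Other CNL details, such as header fields, are not interpreted, and `read_cnl` is a hypothetical helper rather than part of xmeasures.

```python
import io


def read_cnl(lines):
    """Parse CNL text (one cluster per line) into lists of node ids.

    Empty lines and comment lines starting with '#' are skipped after
    stripping surrounding whitespace.
    Assumption: tokens look like 'id' or 'id:share'; only the id is kept.
    """
    clusters = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and comments
        clusters.append([token.split(":", 1)[0] for token in line.split()])
    return clusters


sample = io.StringIO("# Clusters: 2, Nodes: 5\n\n1 2 3\n# a comment\n3:0.5 4 5\n")
print(read_cnl(sample))  # [['1', '2', '3'], ['3', '4', '5']]
```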
````diff
@@ -245,9 +249,9 @@ Evaluate combined weighed and unweighted F1h (harmonic mean of the average F1s),
 $ ./xmeasures -fh -kc -i clslbs.cll -l labels.cnl clusters.cnl
 ```
 
-Evaluate extended omega index:
+Evaluate extended Omega Index and mean F1h (harmonic mean of the weighted average of the greatest (maximal) match by F1):
 ```
-$ ./xmeasures -ox omega_c4.3-1.cnl omega_c4.3-2.cnl
+$ ./xmeasures -ox -fh omega_c4.3-1.cnl omega_c4.3-2.cnl
 ```
 
 **Note:** Please, [star this project](https://github.com/eXascaleInfolab/xmeasures) if you use it.
````

args.ggo

Lines changed: 19 additions & 16 deletions
````diff
@@ -1,14 +1,14 @@
 # Configuration file for the automatic generation of the input options parsing
 
 package "xmeasures"
-version "4.0.1"
+version "4.0.2"
 versiontext "Author: (c) Artem Lutov <[email protected]>
 Sources: https://github.com/eXascaleInfolab/xmeasures
 "
 
 purpose "Extrinsic measures evaluation: Omega Index (a fuzzy version of the\
-Adjusted Rand Index, identical to the Fuzzy Rand Index) and [mean] F1-score (prob, harm and avg)\
-for the overlapping multi-resolution clusterings,\
+Adjusted Rand Index, identical to the Fuzzy Rand Index) and [mean] F1-score\
+(prob, harm and avg) for the overlapping multi-resolution clusterings,\
 and standard NMI for the non-overlapping clustering on a single resolution.\
 Unequal node base is allowed in the evaluating clusterings and optionally can\
 be synchronized removing nodes from the clusters missed in one of the clusterings (collections)."
````
````diff
@@ -41,8 +41,8 @@ Evaluating measures are:
 - OI - Omega Index (a fuzzy version of the Adjusted Rand Index, identical to\
 the Fuzzy Rand Index), which yields the same value as Adjusted Rand Index when\
 applied to the non-overlapping clusterings.
-- F1 - various [mean] F1 measures of the Greatest (Max) Match including the Average\
-F1-Score (suggested by J. Leskovec) with optional weighting.
+- [M]F1 - various [mean] F1 measures of the Greatest (Max) Match including\
+the Average F1-Score (suggested by J. Leskovec) with optional weighting.
 NOTE: There are 3 matching policies available for each kind of F1. The most\
 representative evaluation is performed by the F1p with combined matching\
 policy (considers both micro and macro weighting).
````
````diff
@@ -84,14 +84,15 @@ option "extended" x "evaluate extended Omega Index, which does not excessively
 penalize distinctly shared nodes." flag off dependon="omega"
 
 section "Mean F1"
-option "f1" f "evaluate F1 of the [weighted] average of the greatest (maximal)\
+option "f1" f "evaluate mean F1 of the [weighted] average of the greatest (maximal)\
 match by F1 or partial probability.
 NOTE: F1p <= F1h <= F1a, where:
-- p (F1p) - Harmonic mean of the [weighted] average of Partial Probabilities, the\
-most indicative as satisfies the largest number of the Formal Constraints\
+- p (F1p) - Harmonic mean (F1) of two [weighted] averages of the Partial Probabilities,\
+the most indicative as satisfies the largest number of the Formal Constraints\
 (homogeneity, completeness, rag bag, size/quantity, balance);
-- h (F1h) - Harmonic mean of the [weighted] average of F1a;
-- a (F1a) - Arithmetic mean (average) of the [weighted] average of F1a,\
+- h (F1h) - Harmonic mean (F1) of two [weighted] averages of all local F1\
+(harmonic means of the Precision and Recall of the best matches of the clusters);
+- a (F1a) - Arithmetic mean (average) of two [weighted] averages of all local F1,\
 the least discriminative and satisfies the lowest number of the Formal Constraints.
 "
 values="partprob","harmonic","average" enum default="partprob" argoptional
````
````diff
@@ -103,14 +104,14 @@ option "kind" k "kind of the matching policy:
 values ="weighted","unweighed","combined" enum default="weighted" argoptional
 dependon="f1"
 
-section "Clusters Labeling & F1 with Precision and Recall"
+section "Clusters Labeling & F1 evaluation with Precision and Recall"
 option "label" l "label evaluating clusters with the specified ground-truth (gt)\
-cluster indices and evaluate F1 (including Precision and Recall) of the MATCHED\
+cluster indices and evaluate F1 (including Precision and Recall) of the (best) MATCHED\
 labeled clusters only (without the probable subclusters).
-NOTE: If 'sync' option is specified then the clusters labels file name should be\
-the same as the node base (if specified) and should be in the .cnl format.\
+NOTE: If 'sync' option is specified then the file name of the clusters labels\
+should be the same as the node base (if specified) and should be in the .cnl format.\
 The file name can be either a separate or an evaluating CNL file, in the\
-latter case this option should precede the evaluating filename not repeating it"
+latter case this option should precede the evaluating filename not repeating it."
 string typestr="gt_filename"
 option "policy" p "Labels matching policy:
 - p - Partial Probabilities (maximizes gain)
````
````diff
@@ -128,7 +129,8 @@ NOTE: If 'sync' option is specified then the reduced collection is outputted to
 " string typestr="labels_filename" dependon="label"
 
 section "NMI"
-option "nmi" n "evaluate NMI (Normalized Mutual Information)" flag off
+option "nmi" n "evaluate NMI (Normalized Mutual Information), applicable only\
+to the non-overlapping clusters" flag off
 option "all" a "evaluate all NMIs using sqrt, avg and min denominators besides\
 the max one" flag off dependon="nmi"
 option "ln" e "use ln (exp base) instead of log2 (Shannon entropy, bits)\
````
````diff
@@ -141,6 +143,7 @@ args "--default-optional --unamed-opts=clusterings"
 
 
 # = Changelog =
+# v4.0.2 - Description and output measures notations refined
 # v4.0.1 - Aggregated output for multiple measures added
 # v4.0.0 - Omega index added and bound to the "-o" argument
 # - the former "-o" argument (overlaps) renamed to "-O"
````
