Reference documentation for the metrics that have been added to SynthEval but are not discussed in the arXiv paper. The metrics from the paper will be added here in the future.
Below we list the metrics included in the SynthEval library, each with a brief description and a reference to the source where we found it.
The metrics 'exp_var_diff' and 'comp_angle_diff' are two global utility metrics distinct from the other metrics in the library. They are derived from a principal component analysis (PCA) of the real and synthetic datasets and are based on the idea that, if two datasets are similar, their projections onto the principal components should also be similar. The metrics are calculated as follows:
where the first normalisation factor
Reference:
Rajabinasab, M., Lautrup, A.D. & Zimek, A. (2025). Metrics for Inter-Dataset Similarity with Example Applications in Synthetic Data and Feature Selection Evaluation. In Proceedings of the 2025 SIAM International Conference on Data Mining (SDM) (pp. 527–537). Society for Industrial and Applied Mathematics.
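The idea behind the two PCA-based metrics can be illustrated with the minimal NumPy sketch below. It is not the SynthEval implementation and does not reproduce the exact definitions or normalisation factors from the paper: the function names are hypothetical, `explained_variance_diff` is assumed to be the unnormalised sum of absolute differences between the explained-variance ratios, and `component_angle_diff` is assumed to be the angle between the leading principal axes only. Both inputs are assumed to be numeric arrays on a common scale (for example, standardised with the real data's mean and standard deviation).

```python
import numpy as np


def _pca(data):
    """Return explained-variance ratios and unit-length principal axes (as columns), largest first."""
    cov = np.cov(data, rowvar=False)  # np.cov centres the columns itself
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs


def explained_variance_diff(real, synth):
    """Hypothetical, unnormalised: sum of absolute differences of the explained-variance ratios."""
    ratio_real, _ = _pca(real)
    ratio_synth, _ = _pca(synth)
    return float(np.abs(ratio_real - ratio_synth).sum())


def component_angle_diff(real, synth):
    """Hypothetical: angle in radians between the leading principal axes, ignoring their sign."""
    _, axes_real = _pca(real)
    _, axes_synth = _pca(synth)
    cos = abs(float(axes_real[:, 0] @ axes_synth[:, 0]))  # eigenvector signs are arbitrary
    return float(np.arccos(min(cos, 1.0)))  # guard against round-off slightly above 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(1000, 2)) * [5.0, 1.0]   # spread mostly along the first axis
    synth = rng.normal(size=(1000, 2)) * [1.0, 5.0]  # same variance spectrum, axes swapped
    print(explained_variance_diff(real, synth))  # small: the spectra agree
    print(component_angle_diff(real, synth))     # close to pi / 2: the axes do not
```

The example shows why the two metrics complement each other: two datasets can have matching variance spectra while their principal directions point in different directions.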
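Quantile MSE measures the mean squared error between the fraction of synthetic samples falling into each 10% quantile bin of the real data and the fraction expected if both datasets followed the same distribution. The metric is used to evaluate how well the synthetic data reproduces the distribution of the real data. The metric is calculated as follows: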
where
Reference:
Butter, A., Diefenbacher, S., Kasieczka, G., Nachman, B., & Plehn, T. (2021). GANplifying event samples. SciPost Physics, 10(6), 139. https://doi.org/10.21468/SciPostPhys.10.6.139
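A minimal NumPy sketch of the idea follows, under stated assumptions: the function names are hypothetical, the columns are assumed to be numeric and continuous, the bins are the deciles of each real column, each bin's synthetic fraction is compared with the expected 1/10, and the per-column scores are simply averaged. SynthEval's own aggregation and its handling of ties or categorical columns may differ.

```python
import numpy as np


def quantile_mse_1d(real_col, synth_col, n_quantiles=10):
    """MSE between the fraction of synthetic values in each real-data quantile bin
    and the fraction 1 / n_quantiles expected if both follow the same distribution."""
    # Inner bin edges at the 10%, 20%, ..., 90% quantiles of the real column.
    inner_edges = np.quantile(real_col, np.linspace(0.0, 1.0, n_quantiles + 1)[1:-1])
    # Bin index 0..n_quantiles-1; values outside the real range land in the outer bins.
    bins = np.searchsorted(inner_edges, synth_col, side="right")
    fractions = np.bincount(bins, minlength=n_quantiles) / len(synth_col)
    return float(np.mean((fractions - 1.0 / n_quantiles) ** 2))


def quantile_mse(real, synth, n_quantiles=10):
    """Hypothetical aggregation: plain average of the per-column scores."""
    scores = [quantile_mse_1d(real[:, j], synth[:, j], n_quantiles) for j in range(real.shape[1])]
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(2000, 3))
    print(quantile_mse(real, rng.normal(size=(2000, 3))))            # same distribution
    print(quantile_mse(real, rng.normal(loc=0.5, size=(2000, 3))))  # shifted: larger score
```

With ten bins, this version of the score ranges from 0 (every bin holds exactly 10% of the synthetic values) to 0.09 (all synthetic values fall into a single bin).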
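MMD (maximum mean discrepancy) is a kernel-based distance measure between two distributions (in our case, those of the real and synthetic datasets). It comes in two flavours: the biased V-statistic and the unbiased U-statistic. The biased V-statistic is calculated as follows:
$$\mathrm{MMD}_b^2 = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} k(x_i, x_j) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} k(y_i, y_j) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i, y_j),$$
where $x_1, \dots, x_m$ are the real samples, $y_1, \dots, y_n$ are the synthetic samples, and $k$ is the kernel function. The unbiased U-statistic is calculated as
$$\mathrm{MMD}_u^2 = \frac{1}{m(m-1)}\sum_{i \neq j} k(x_i, x_j) + \frac{1}{n(n-1)}\sum_{i \neq j} k(y_i, y_j) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i, y_j),$$
where the within-sample sums are taken over all pairs of samples, excluding the diagonal terms ($i = j$). The unbiased estimate can become negative at finite sample sizes due to sampling variance, so it is clipped at zero.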
As a test statistic, MMD can be used in a two-sample test of whether the two distributions differ significantly. In this case, a value above a threshold (determined by the distribution of MMD under the null hypothesis that both samples come from the same distribution) indicates that the two distributions are significantly different. In the context of synthetic data evaluation, a low MMD value indicates that the synthetic data distribution is close to that of the real data.
Reference:
Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., & Smola, A. (2012). A Kernel Two-Sample Test. Journal of Machine Learning Research, 13(25), 723–773. http://jmlr.org/papers/v13/gretton12a.html
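The two MMD estimators and a two-sample test can be sketched in NumPy as follows. This is an illustration, not SynthEval's implementation: the Gaussian (RBF) kernel and its `gamma` bandwidth are assumptions, since the kernel used by the library is not specified here, and the null distribution is approximated with a label-permutation test, which is one common way to obtain the threshold for an MMD two-sample test. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Assumed kernel: k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2) for all pairs of rows."""
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clamp tiny negative round-off


def mmd2(x, y, gamma=1.0, unbiased=False):
    """Squared MMD between samples x (m x d, real) and y (n x d, synthetic)."""
    m, n = len(x), len(y)
    k_xx, k_yy, k_xy = rbf_kernel(x, x, gamma), rbf_kernel(y, y, gamma), rbf_kernel(x, y, gamma)
    if unbiased:
        # U-statistic: drop the diagonal (i == j) terms of the within-sample sums.
        term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
        term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    else:
        # V-statistic: average over all pairs, diagonal included.
        term_xx = k_xx.sum() / m**2
        term_yy = k_yy.sum() / n**2
    value = term_xx + term_yy - 2.0 * k_xy.sum() / (m * n)
    return max(float(value), 0.0)  # clip finite-sample (U) or round-off (V) negatives at zero


def mmd_permutation_test(x, y, gamma=1.0, n_permutations=200, seed=0):
    """Observed unbiased MMD^2 and a permutation p-value for H0: same distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2(x, y, gamma, unbiased=True)
    pooled = np.vstack([x, y])
    exceed = 0
    for _ in range(n_permutations):
        idx = rng.permutation(len(pooled))  # randomly reassign the real/synthetic labels
        if mmd2(pooled[idx[: len(x)]], pooled[idx[len(x):]], gamma, unbiased=True) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)
```

A small p-value means the observed MMD exceeds almost all values obtained under random relabelling, i.e. the test finds a significant difference between the real and synthetic samples.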
FIO is a generic metric that measures the overlap between the top-k selected features of a model trained on the real data and those of a model trained on the synthetic data, both predicting the target analysis variable. The metric is calculated as follows:
In the actual implementation, we select the top 5%, 10%, 25%, and 50% of the features (where the number of features allows) and return all successful results. A higher FIO value indicates that the synthetic data is a better representation of the real data in terms of feature selection. Note that for the smallest top-k selections (e.g., 5% or 10%), FIO should be very close to 1 for a good synthetic dataset, whereas for larger top-k selections (e.g., 25% or 50%) it can diverge more and still indicate a good synthetic dataset.
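The overlap computation can be sketched as follows, assuming that FIO at a given k is the number of features shared by the two top-k sets divided by k. The function names are hypothetical, k is taken as the rounded-down fraction of the number of features (fractions giving k < 1 are skipped, mirroring "where possible"), and, to keep the example dependency-free, the absolute Pearson correlation with the target stands in for the feature importances of a trained model.

```python
import numpy as np


def top_k(importances, k):
    """Indices of the k largest importance scores."""
    return set(np.argsort(importances)[::-1][:k].tolist())


def feature_importance_overlap(imp_real, imp_synth, fractions=(0.05, 0.10, 0.25, 0.50)):
    """Hypothetical FIO: size of the intersection of the two top-k sets divided by k."""
    n_features = len(imp_real)
    results = {}
    for frac in fractions:
        k = int(frac * n_features)
        if k < 1:
            continue  # too few features for this fraction; skip it
        shared = top_k(imp_real, k) & top_k(imp_synth, k)
        results[frac] = len(shared) / k
    return results


def abs_corr_importance(X, y):
    """Stand-in importance: |Pearson correlation| between each feature column and the target."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_real = rng.normal(size=(500, 20))
    y_real = X_real[:, :4] @ np.array([4.0, 3.0, 2.0, 1.0]) + rng.normal(size=500)
    X_synth = X_real + rng.normal(scale=0.5, size=X_real.shape)  # noisy copy as mock synthetic data
    y_synth = y_real
    print(feature_importance_overlap(abs_corr_importance(X_real, y_real),
                                     abs_corr_importance(X_synth, y_synth)))
```

Returning a dictionary keyed by fraction mirrors the description above, where all successful top-k selections are reported rather than a single number.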