SynthEval v1.7.0

@notna07 notna07 released this 22 Apr 12:44
· 4 commits to main since this release
3b2b0a4

The 1.7.0 update introduces the new "rich" console mode, based on the Rich library, and refines the metrics output interface so that modes can be switched dynamically without excess technical debt. The analysis target variable pattern has been completely overhauled to use a configurable object that handles all labels relevant to downstream task analyses, along with confounders, with finer-grained control. In addition, we introduce a timeout feature that allows long-running metrics to be interrupted and skipped, improve error handling and configurability, and add Maximum Mean Discrepancy and Feature Importance Overlap to the metric library.

Refactor:

  • Changed the format_output method in all metric classes to return a list of tuples suitable for rich console output, replacing previous string-based formatting. This affects the metric template, core metric class, and all metric implementations.
  • Changed the analysis_target_var to an analysis_target object: parsing is handled automatically to retain the simple interface, but an object can also be passed directly for finer-grained control. See the new section in guides/syntheval_guide.ipynb for a tutorial on how this object can be used.

New Features:

  • Added a console argument for the main class, which selects rich (new) or ascii (legacy) formatting for console output, or turns console output off entirely. We added a check that automatically switches from rich to ascii in a notebook environment to prevent crashing the terminal.
  • Added a timeout feature, based on asyncio, to the main evaluation loop, allowing metrics that take too long to complete to be interrupted and skipped. The timeout is disabled by default.
  • Added a plot_figures attribute to metric classes, allowing users to control figure plotting separately from verbosity.
  • Added a corresponding argument enable_plots to the SynthEval class, to control plotting in the main evaluation loop.
  • Added a missing_directive argument to the SynthEval class, so that the user can control if SynthEval should raise a warning, drop rows with missingness, or ignore that there is missingness and carry on. We added a small discussion in the guides/preprocessing.md guide on missingness, for users interested in other solutions.
  • Added the Maximum Mean Discrepancy (MMD) metric, recording both the biased and unbiased versions of the statistic. A detailed explanation and reference are added in guides/metrics_references.md.
  • Added the Feature Importance Overlap (FIO) metric, which checks two properties related to feature selection/ranking tasks: that the importance values assigned by a predictive model match (mean absolute error and weighted mean absolute error), and that the features recovered in a ranking at 5%, 10%, 25% and 50% of features are the same. The metric can also plot the top feature importance scores. A description of this metric is added in guides/metrics_references.md.
  • With the new analysis_target object, the increased flexibility allowed some important changes in the classification accuracy, auroc difference, attribute disclosure risk, and statistical parity metrics, now accounting for multiple potential labels, and dynamically removing confounders in prediction tasks (this is also considered in the new FIO metric).
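
To illustrate what the biased and unbiased versions of the MMD statistic mentioned above refer to, here is a minimal NumPy sketch of the two standard estimators of squared MMD. It is not SynthEval's implementation: the Gaussian RBF kernel, the gamma bandwidth parameter, and the function names rbf_kernel and mmd_squared are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    # Squared Euclidean distance between every row of a and every row of b.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd_squared(x, y, gamma=1.0):
    """Return (biased, unbiased) estimates of squared MMD between x and y.

    x and y are 2-D arrays with at least two rows each.
    The kernel and gamma are illustrative assumptions.
    """
    m, n = len(x), len(y)
    k_xx = rbf_kernel(x, x, gamma)
    k_yy = rbf_kernel(y, y, gamma)
    k_xy = rbf_kernel(x, y, gamma)

    # Biased estimator: plain means, including the diagonal self-similarities.
    biased = k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

    # Unbiased estimator: drop the diagonal terms of the within-sample kernels.
    sum_xx = k_xx.sum() - np.trace(k_xx)
    sum_yy = k_yy.sum() - np.trace(k_yy)
    unbiased = (
        sum_xx / (m * (m - 1))
        + sum_yy / (n * (n - 1))
        - 2.0 * k_xy.mean()
    )
    return biased, unbiased
```

Calling mmd_squared(real, synthetic) on two numeric arrays with the same number of columns returns both estimates. The biased estimate is non-negative up to round-off, while the unbiased one can dip below zero when the two samples are very similar.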

Documentation:

  • Minor details were added to the README.md.
  • Major reformatting of the metrics overview in README.md into tables with metric keywords and links to the method documentation and, in the few instances where applicable, to guides/metrics_references.md.
  • Guide codebooks were refreshed with the latest features, new metrics, and nicer printing. guides/syntheval_guide.ipynb now includes a new part on the analysis_target object.
  • New guides/preprocessing.md added to document preprocessing steps.
  • Adjusted a number of the metric docstrings.

Changes:

  • Improved error handling in several metric scripts by raising ValueError instead of printing warnings or passing on failed assertions, ensuring clearer feedback for users and that errors are raised in the active console.
  • Changed default ranking system for the benchmark method, from linear (min-max sum) to summation (flat sum).
  • Changed the default preprocessing for PCA metric from "mean" to "std" for consistency.
  • Changed the default F1 setting in the classification metric from micro to weighted, for more saturation-aware behaviour on imbalanced classification problems.
  • Changed the default confidence interval unit for CIO and DWM metrics from sem to std. The new option ci can be used to switch back to the old behaviour if needed.
  • Attribute disclosure metric had the sensitive argument removed; the sensitive attributes are now passed through the analysis_target object.
  • Statistical parity metric similarly had the protected_attribute argument removed, and now also uses the sensitive_vars attribute in the analysis_target object. In addition, the statistical parity metric can now evaluate multiple protected attributes.
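
The switch of the default F1 averaging from micro to weighted is easiest to see on a small imbalanced example. The sketch below computes both averages by hand in plain Python, following their usual definitions for single-label classification. The helper names are made up for this example and are not part of SynthEval.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never present nor predicted gets F1 = 0 by convention.
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, tp + fn)  # support = number of true instances
    return scores


def micro_f1(y_true, y_pred):
    """Micro-averaged F1; for single-label problems this equals accuracy."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total


# Nine negatives, one positive, and a classifier that always predicts 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(micro_f1(y_true, y_pred), weighted_f1(y_true, y_pred))
```

In this example the micro average simply reports the accuracy of the majority-class predictor, whereas the weighted average is pulled down by the zero F1 score of the minority class.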

Bug fixes:

  • Hellinger distance metric had a division-by-zero error when determining the bin width if the interquartile range was 0; this error is now caught, and the binning in the non-monotonic case is handled using variable bin widths.
  • Adding the timeout feature caused a number of warnings about plotting outside of the main thread on certain versions of matplotlib with tkinter; this is fixed by using the 'Agg' backend for plotting. In addition, asyncio caused a notebook crash on some versions, so we added a catch that replaces direct event-loop calls with a loop-self-helper using coroutines if a loop is already running.
  • Improved type casting to comply with NumPy 2.x scalar handling (tests broke because of previously lazy type handling).
  • Attribute disclosure risk, membership inference attack, and Kolmogorov-Smirnov test metrics would return an undefined standard deviation when it was computed on lists of length 1 (not a major problem, but it raised warnings); we added guards to avoid this.
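
The asyncio-based timeout and the running-loop issue described above can be illustrated with a short sketch. This is a general pattern under stated assumptions, not SynthEval's actual code: run_metric and its helpers are hypothetical names, and the approach (asyncio.wait_for around asyncio.to_thread, with a helper thread when a loop is already running) is one possible way to get this behaviour. It requires Python 3.9 or later.

```python
import asyncio
import concurrent.futures


async def _call_with_timeout(func, timeout, args):
    # asyncio.to_thread runs the blocking function in a worker thread, so the
    # event loop stays free to enforce the deadline via asyncio.wait_for.
    return await asyncio.wait_for(asyncio.to_thread(func, *args), timeout)


def _run_in_fresh_loop(func, timeout, args):
    # A private event loop whose close() does not wait for worker threads,
    # so a timed-out call returns control promptly.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(_call_with_timeout(func, timeout, args))
    finally:
        loop.close()


def run_metric(func, *args, timeout=None):
    """Run func(*args); return None if it exceeds `timeout` seconds.

    timeout=None (the default) waits indefinitely.
    Hypothetical helper for illustration only.
    """
    try:
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No loop is running (plain script): drive a loop in this thread.
            return _run_in_fresh_loop(func, timeout, args)
        # A loop is already running (for example in a notebook): starting a
        # second loop in this thread would fail, so use a helper thread.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(_run_in_fresh_loop, func, timeout, args).result()
    except asyncio.TimeoutError:
        # The metric took too long: report it as skipped.
        return None
```

Note that a Python thread cannot be forcibly stopped, so in this sketch a timed-out function keeps running in the background until it finishes; only its result is discarded.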