This library requires at least Python 3.12. Install it from PyPI:

    pip install fair-forge

or from GitHub:

    pip install git+https://github.com/wearepal/fair-forge.git

If you want to use the neural-network-based methods, you need to add the nn extras:

    pip install 'fair-forge[nn]'

or

    pip install 'fair-forge[nn] @ git+https://github.com/wearepal/fair-forge.git'

fair-forge provides two main components: metrics and methods. Besides these, there are various utility functions to help with common tasks and also a few example datasets.
The core data type used in fair-forge is the numpy array: all the methods and metrics expect numpy arrays as input. If you have data in a different form, it is usually easy to convert it to numpy arrays:
- Pandas: to_numpy()
- Polars: to_numpy()
- PyTorch: numpy()
- TensorFlow: make_ndarray()
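The signatures shown later in this document annotate features as float32 arrays and labels and group labels as int32 arrays. The following is a minimal sketch of bringing data into that form with plain numpy. The variable names and the explicit casts are illustrative assumptions, not a requirement stated by the library.

```python
import numpy as np

# Stand-in for data from another source (plain lists here; an array
# obtained from to_numpy() or numpy() can be passed in the same way).
raw_features = [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]]
raw_labels = [0, 1, 1]
raw_groups = [0, 0, 1]

# np.asarray only copies when a conversion is actually needed.
X = np.asarray(raw_features, dtype=np.float32)
y = np.asarray(raw_labels, dtype=np.int32)
groups = np.asarray(raw_groups, dtype=np.int32)
```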
There are group-aware metrics and non-group-aware metrics. The non-group-aware metrics are callables with this function signature:

    import numpy as np
    from numpy.typing import NDArray

    type Float = float | np.float16 | np.float32 | np.float64

    def tpr(y_true: NDArray[np.int32], y_pred: NDArray[np.int32]) -> Float: ...

In other words, a non-group-aware metric accepts two numpy arrays (one with the true labels and one with the predicted labels) and returns a single Float. This API is chosen such that any scikit-learn metric that is called as metric(y_true, y_pred) and returns a single number can be used, for example accuracy_score.
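To make the signature concrete, here is one possible body for a true-positive-rate metric. It is a sketch, not the library's implementation: it assumes binary labels with 1 as the positive class, and returning NaN when there are no positive labels is a choice made for this example.

```python
import numpy as np
from numpy.typing import NDArray


def tpr(y_true: NDArray[np.int32], y_pred: NDArray[np.int32]) -> float:
    """Fraction of the actual positives that were predicted as positive."""
    positives = y_true == 1
    if not positives.any():
        # The rate is undefined without positive labels (assumed convention).
        return float("nan")
    return float((y_pred[positives] == 1).mean())


y_true = np.array([1, 1, 1, 0, 0], dtype=np.int32)
y_pred = np.array([1, 0, 1, 0, 1], dtype=np.int32)
rate = tpr(y_true, y_pred)  # two of the three positives are recovered
```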
Group-aware metrics take an additional parameter, the group labels:

    def cv(
        y_true: NDArray[np.int32],
        y_pred: NDArray[np.int32],
        *,
        groups: NDArray[np.int32],
    ) -> Float: ...

A very important function is fair_forge.as_group_metric(). It takes a non-group-aware metric and turns it into one or more group-aware metrics. It does this by first computing the metric value for each group separately and then aggregating these per-group values in different ways, for example by taking their minimum or their ratio. Here is how one would construct a robust accuracy metric (minimum accuracy across all groups):

    import fair_forge as ff
    from sklearn.metrics import accuracy_score

    # Construct a metric for the minimum accuracy over all groups
    (robust_accuracy,) = ff.as_group_metric(
        (accuracy_score,), agg=ff.MetricAgg.MIN
    )

    # Use it as a group-aware metric
    robust_accuracy(y_true=y_true, y_pred=y_pred, groups=groups)

The group-aware vs non-group-aware distinction also exists for the methods provided in this library. The non-group-aware methods simply follow the scikit-learn API for an estimator (inheriting from BaseEstimator adds some mixin methods which are needed):

    from typing import Self

    from sklearn.base import BaseEstimator

    class Method(BaseEstimator):
        def fit(self, X: NDArray[np.float32], y: NDArray[np.int32]) -> Self:
            pass

        def predict(self, X: NDArray[np.float32]) -> NDArray[np.int32]:
            pass

The methods can be used like normal scikit-learn estimators.
On the other hand, we have the group-based methods, which take an additional parameter, the group labels:

    from typing import Self

    from sklearn.base import BaseEstimator

    class GroupMethod(BaseEstimator):
        def fit(
            self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
        ) -> Self:
            pass

        def predict(self, X: NDArray[np.float32]) -> NDArray[np.int32]:
            pass

These methods can use the group information at training time to produce fairer models.
Besides methods which output a machine learning model, there are also methods which transform the data. These then have a transform method instead of a predict method:

    from typing import Self

    from sklearn.base import BaseEstimator

    class GroupBasedTransform(BaseEstimator):
        def fit(
            self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
        ) -> Self:
            pass

        def transform(self, X: NDArray[np.float32]) -> NDArray[np.float32]:
            pass

        def fit_transform(
            self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
        ) -> NDArray[np.float32]:
            pass

(Unfortunately, you have to implement fit_transform manually, because otherwise it will not have the groups parameter.)
Such transformation methods can then be combined with non-group-aware methods with scikit-learn’s Pipeline:

    from sklearn import config_context
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    # Pipeline will only forward the `groups` argument if we
    # set `enable_metadata_routing` to `True`.
    with config_context(enable_metadata_routing=True):
        estimator = LinearSVC(random_state=42, max_iter=100)
        transform = GroupBasedTransform(random_state=42)
        # We need to explicitly request here that the transformation's
        # `fit` function gets the `groups` argument.
        transform.set_fit_request(groups=True)

        pipeline = Pipeline([("transform", transform), ("estimator", estimator)])
        # This will call `fit_transform` on the transformation
        pipeline.fit(train_x, train_y, groups=train_groups)
        preds = pipeline.predict(test_x)

fair-forge provides many useful components for running experiments and collecting results:
- example datasets (like Adult)
- train-test splitting
- facilities for running multiple methods and evaluating them with multiple metrics
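To illustrate what such an experiment involves, the sketch below runs two methods against two metrics using only numpy. It does not use fair-forge's own helpers, whose names are not shown in this document; the Constant baseline, the toy data, the 80/20 split and the hand-written metrics are all assumptions made for the example. The group-aware metric mirrors the "minimum accuracy across all groups" aggregation described earlier.

```python
import numpy as np


def accuracy(y_true, y_pred):
    return float((y_true == y_pred).mean())


def min_group_accuracy(y_true, y_pred, *, groups):
    # Accuracy computed per group, then aggregated with the minimum.
    return min(
        accuracy(y_true[groups == g], y_pred[groups == g]) for g in np.unique(groups)
    )


class Constant:
    """Hypothetical baseline that always predicts one fixed label."""

    def __init__(self, label):
        self.label = label

    def fit(self, X, y):
        return self

    def predict(self, X):
        return np.full(len(X), self.label, dtype=np.int32)


# Toy data with binary labels and two groups.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2)).astype(np.float32)
y = (X[:, 0] > 0).astype(np.int32)
groups = (np.arange(n) % 2).astype(np.int32)

# Random 80/20 train-test split.
perm = rng.permutation(n)
train, test = perm[:80], perm[80:]

methods = {"always_0": Constant(0), "always_1": Constant(1)}
results = {}
for name, method in methods.items():
    preds = method.fit(X[train], y[train]).predict(X[test])
    results[name] = {
        "accuracy": accuracy(y[test], preds),
        "min_group_accuracy": min_group_accuracy(y[test], preds, groups=groups[test]),
    }
```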
For more information on this, see the documentation.