Predictive State Smoothing (PRESS) is a semi-parametric statistical machine learning algorithm
for regression and classification problems. pypress uses TensorFlow Keras to implement
the predictive learning algorithms proposed in:
- Goerg (2017). Predictive State Smoothing (PRESS): Scalable non-parametric regression for high-dimensional data with variable selection.
See below for details on how PRESS works in a nutshell.
It can be installed directly from GitHub using:

```shell
pip install git+https://github.com/gmgeorg/pypress.git
```
PRESS is available as two layers that must be added one after the other; alternatively,
the `PRESS()` wrapper is a feed-forward layer that applies both layers at once.
```python
import sklearn.preprocessing
import tensorflow as tf
from sklearn.datasets import load_breast_cancer

from pypress.keras import layers
from pypress.keras import regularizers

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_s = sklearn.preprocessing.robust_scale(X)  # See demo.ipynb to properly scale X with a train/test split.

mod = tf.keras.Sequential()
# See layers.PRESS() for the single-layer wrapper.
mod.add(layers.PredictiveStateSimplex(
    n_states=6,
    activity_regularizer=regularizers.Uniform(0.01),
    input_dim=X.shape[1]))
mod.add(layers.PredictiveStateMeans(units=1, activation="sigmoid"))

mod.compile(loss="binary_crossentropy",
            optimizer=tf.keras.optimizers.Nadam(learning_rate=0.01),
            metrics=[tf.keras.metrics.AUC(curve="PR", name="auc_pr")])
mod.summary()
mod.fit(X_s, y, epochs=10, validation_split=0.2)
```

```
Model: "sequential_12"
_________________________________________________________________
 Layer (type)                 Output Shape              Param #
=================================================================
 predictive_state_simplex_11  (None, 6)                 186
 (PredictiveStateSimplex)

 predictive_state_means_11 (  (None, 1)                 6
 PredictiveStateMeans)

=================================================================
Total params: 192
Trainable params: 192
Non-trainable params: 0
_________________________________________________________________
```
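The computation the two layers perform can be sketched in plain NumPy. This is an illustrative sketch, assuming the simplex layer computes a softmax over per-state linear scores and the means layer mixes one trainable parameter per state through the output activation; see the pypress layer source for the exact implementation. Note that the parameter counts match the model summary above (30 × 6 + 6 = 186 for the simplex layer, 6 for the means layer):

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_states = 30, 6          # breast-cancer data has 30 features
X = rng.normal(size=(4, n_features))  # four toy observations

# PredictiveStateSimplex (sketch): per-state linear scores pushed through
# a softmax, so each row of `w` lies on the (n_states - 1)-simplex.
W = rng.normal(size=(n_features, n_states))  # 180 weights
b = np.zeros(n_states)                       # 6 biases -> 186 params total
scores = X @ W + b
w = np.exp(scores - scores.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)
assert np.allclose(w.sum(axis=1), 1.0)  # rows are state probabilities

# PredictiveStateMeans (sketch): one trainable parameter per state, mixed
# by the simplex weights and passed through the activation ("sigmoid" in
# the binary-classification example above).
beta = rng.normal(size=(n_states, 1))  # 6 params
logits = w @ beta
p_hat = 1.0 / (1.0 + np.exp(-logits))
print(p_hat.shape)  # (4, 1) -- one predicted probability per observation
```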
See also notebook/demo.ipynb for end-to-end examples of PRESS regression and classification models.
The figure below, adapted from Goerg (2018), contrasts the architecture of a standard feed-forward Deep Neural Network (DNN) with the Predictive State Smoothing (PRESS) approach.
In typical prediction problems, our goal is to model the conditional distribution $p(y \mid X)$.
In contrast, PRESS decomposes the predictive distribution into a mixture distribution over predictive states ($s_1, \ldots, s_J$).
Mathematically, this is expressed as:

$$
p(y \mid X) = \sum_{j=1}^{J} p(y \mid X, s_j)\, p(s_j \mid X) = \sum_{j=1}^{J} p(y \mid s_j)\, p(s_j \mid X).
$$

The second equality holds because the state renders $y$ conditionally independent of the features: given its predictive state, $X$ carries no additional information about $y$.
The primary strength of this decomposition is that predictive states serve as minimal sufficient statistics for predicting $y$ from $X$.
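To make the mixture decomposition concrete, here is a tiny NumPy sketch with illustrative (non-fitted) numbers, where each row of `w` holds the state membership probabilities $p(s_j \mid X_i)$ and `mu` holds the per-state conditional means:

```python
import numpy as np

# Toy sketch of the PRESS decomposition with J = 3 predictive states
# (illustrative numbers only, not fitted values).
# w[i, j] = p(s_j | X_i): each row lies on the probability simplex.
w = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
])
assert np.allclose(w.sum(axis=1), 1.0)

# mu[j] = E[y | s_j]: one conditional mean per predictive state.
mu = np.array([0.9, 0.5, 0.1])

# E[y | X_i] = sum_j p(s_j | X_i) * E[y | s_j]
y_hat = w @ mu
print(y_hat)  # approximately [0.78, 0.54, 0.26]
```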
An important byproduct of this framework is the ability to perform predictive clustering:

- Once the mapping from features ($X$) to the predictive state simplex is learned, observations can be clustered within the state space.
- Observations sharing similar predictive states are guaranteed to have similar predictive distributions for $y$, providing a principled way to group data based on future outcomes rather than raw input similarity.
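A minimal sketch of predictive clustering, assuming the simplex weights $p(s_j \mid X_i)$ have already been extracted from a fitted model (the numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical simplex weights p(s_j | X_i) for five observations; in a
# fitted pypress model these would be the PredictiveStateSimplex outputs.
w = np.array([
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
    [0.10, 0.10, 0.80],
])

# Predictive clustering: assign each observation to its dominant state.
clusters = w.argmax(axis=1)
print(clusters)  # [0 0 1 2 2]

# Observations in the same cluster have nearly identical state weights and
# hence nearly identical predictive distributions for y.
mu = np.array([0.9, 0.5, 0.1])  # hypothetical state means E[y | s_j]
print(w @ mu)                   # similar within a cluster
```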
While PRESS shares similarities with Mixture Density Networks (MDN), there is a fundamental distinction. In an MDN, the output parameters are often direct functions of the features. In PRESS, the conditional independence of $y$ and $X$ given the predictive state means the features affect the prediction only through the state membership weights $p(s_j \mid X)$, while the state-level output parameters are fixed per state.
This project is licensed under the terms of the MIT license. See LICENSE for additional details.

