with $M$ outputs, each training iteration of Adam has a cost of $\mathcal{O}(M^3N^3)$. Adam permits using $n < N$ random training points during each iteration (mini-batches), which has a cost of $\mathcal{O}(M^3n^3) \ll \mathcal{O}(M^3N^3)$.

## Inequality Constraints

Three complementary mechanisms are available to enforce inequality constraints on
GP predictions. They can be combined freely.

### Link Functions (Output Warping)

A link function $g$ maps physical outputs to an unconstrained latent space.
The GP trains in latent space and predictions are inverted back to physical space,
guaranteeing constraint satisfaction by construction.

| `link_function` | Constraint | Required parameters |
| - | - | - |
| `identity` (default) | none | — |
| `log` | $y > \ell$ | `link_lower_bound` |
| `logit` | $\ell < y < u$ | `link_lower_bound`, `link_upper_bound` |

See [GPLinkFunction.md] for the mathematical details.

+Example — positivity constraint:+

!listing stochastic_tools/test/tests/surrogates/gaussian_process/GP_log_link.i
block=Trainers

### Derivative Observations (Monotonicity Constraints)

Virtual derivative observations augment the covariance matrix with cross-covariance
blocks between function values and directional derivatives at specified points.
This enforces monotonicity (or zero-derivative) constraints structurally via
the GP prior.

The augmented covariance matrix takes the block form:

!equation
\mathbf{K}_{\text{aug}} = \begin{bmatrix}
\mathbf{K}_{ff} + \sigma_n^2 \mathbf{I} & \mathbf{K}_{fd} \\
\mathbf{K}_{df} & \mathbf{K}_{dd} + \sigma_d^2 \mathbf{I}
\end{bmatrix}

where $\mathbf{K}_{fd}$ contains cross-covariances between function values and
partial derivatives, and $\mathbf{K}_{dd}$ contains covariances between pairs of
partial derivatives. Only the [SquaredExponentialCovariance.md] kernel supports
derivative covariances, and only single-output GPs are supported.

The virtual observations are placed at user-specified locations
(`monotone_constraint_points`) and target either zero derivative, positive
derivative (`increasing`), or negative derivative (`decreasing`) at those points.
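For intuition, the following is a minimal, self-contained sketch of how such an
augmented matrix can be assembled from the SE formulas documented in
[SquaredExponentialCovariance.md]. It is not the MOOSE implementation: for
brevity it assumes all virtual observations share a single derivative dimension
`dim`, whereas the trainer stores a per-point derivative dimension.

```cpp
#include <Eigen/Dense>
#include <cmath>

using Mat = Eigen::MatrixXd;
using Vec = Eigen::VectorXd;

// Squared-exponential kernel value between row i of A and row j of B.
double se(const Mat & A, int i, const Mat & B, int j, const Vec & ell, double sf2)
{
  double r2 = 0.0;
  for (int k = 0; k < A.cols(); ++k)
  {
    const double d = (A(i, k) - B(j, k)) / ell(k);
    r2 += d * d;
  }
  return sf2 * std::exp(-0.5 * r2);
}

// Assemble K_aug = [K_ff + sn2*I, K_fd; K_df, K_dd + sd2*I] for N function
// observations X and V virtual derivative observations Xd, all taken along
// the single input dimension `dim`.
Mat assembleAugmented(const Mat & X, const Mat & Xd, const Vec & ell,
                      double sf2, double sn2, double sd2, int dim)
{
  const int N = X.rows(), V = Xd.rows();
  const double l2 = ell(dim) * ell(dim);
  Mat K(N + V, N + V);
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      K(i, j) = se(X, i, X, j, ell, sf2) + (i == j ? sn2 : 0.0);
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < V; ++j)
    {
      // K_fd[i,j] = K(x_i, xd_j) * (x_{i,dim} - xd_{j,dim}) / ell_dim^2
      K(i, N + j) = se(X, i, Xd, j, ell, sf2) * (X(i, dim) - Xd(j, dim)) / l2;
      K(N + j, i) = K(i, N + j); // K_df = K_fd^T
    }
  for (int i = 0; i < V; ++i)
    for (int j = 0; j < V; ++j)
    {
      // Equal derivative dims: K_dd = K * (1/ell^2 - dx^2/ell^4)
      const double dx = Xd(i, dim) - Xd(j, dim);
      K(N + i, N + j) =
          se(Xd, i, Xd, j, ell, sf2) * (1.0 / l2 - dx * dx / (l2 * l2)) +
          (i == j ? sd2 : 0.0);
    }
  return K;
}
```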

+Example — increasing in second parameter:+

!listing stochastic_tools/test/tests/surrogates/gaussian_process/GP_derivative_obs.i
block=Trainers

### Penalty Constraints (Soft Bounds During Tuning)

Penalty terms are added to the negative log-marginal likelihood (NLML) and its
gradient during Adam hyperparameter optimization. This steers hyperparameter
selection toward configurations that respect lower/upper bounds on predicted means
at specified evaluation points.

The augmented loss is:

!equation
\mathcal{L}_{\text{aug}} = \mathcal{L}_{\text{NLML}} + \lambda \sum_c \left[
\max(0,\, \ell_c - \hat{y}_c)^2 + \max(0,\, \hat{y}_c - u_c)^2
\right]

where $\hat{y}_c$ is the predicted mean at constraint point $c$, and $\lambda$
is `penalty_weight`.

The bounds are specified in physical space and are internally transformed through
the link function and data standardizer. Penalty constraints require the
`tune_parameters` option to be active.
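A minimal sketch of the penalty term itself (a hypothetical standalone function,
not the MOOSE API; in the actual code this logic is folded into `getLoss` and
`getGradient`, and the bounds have already been mapped through the link function
and standardizer):

```cpp
#include <algorithm>
#include <vector>

// Hinge-squared penalty added to the NLML during Adam tuning. yhat[c] is the
// predicted mean at constraint point c; lower/upper are the (transformed)
// bounds and penalty_weight is the lambda in the equation above.
double penaltyTerm(const std::vector<double> & yhat,
                   const std::vector<double> & lower,
                   const std::vector<double> & upper,
                   double penalty_weight)
{
  double p = 0.0;
  for (std::size_t c = 0; c < yhat.size(); ++c)
  {
    const double lo = std::max(0.0, lower[c] - yhat[c]); // lower-bound violation
    const double hi = std::max(0.0, yhat[c] - upper[c]); // upper-bound violation
    p += lo * lo + hi * hi;
  }
  return penalty_weight * p; // added to the NLML loss
}
```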

+Example — lower-bound penalty at two points:+

!listing stochastic_tools/test/tests/surrogates/gaussian_process/GP_penalty_constraint.i
block=Trainers

!syntax parameters /Trainers/GaussianProcessTrainer

!syntax inputs /Trainers/GaussianProcessTrainer
# GPLinkFunction

## Overview

`GPLinkFunction` provides output warping (link functions) for [GaussianProcessTrainer.md]
that enforce inequality constraints on GP predictions. A link function $g$ maps
the physical output $y$ to a latent (unconstrained) variable $z = g(y)$. The GP
is trained in latent $z$-space, and predictions are mapped back to physical space
via the inverse $y = g^{-1}(z)$.

This approach is sometimes called *warped Gaussian processes*
[!cite](snelson2004warped) and provides an exact constraint: every prediction
satisfies the bound by construction.

## Available Link Functions

### Identity (default)

No transformation is applied. The GP trains and predicts directly in physical space.

!equation
g(y) = y, \quad g^{-1}(z) = z

Use this (or omit `link_function`) for unconstrained problems.

### Log Link

Enforces a strict lower bound: $y > \ell$.

!equation
g(y) = \ln(y - \ell), \quad g^{-1}(z) = e^{z} + \ell

where $\ell$ is `link_lower_bound` (default 0, which yields a positivity constraint).

The Jacobian correction $\ln |g'(y)| = -\ln(y - \ell)$ is included in the
negative log-marginal likelihood when hyperparameter tuning is active.

### Logit Link

Enforces both a lower and upper bound: $\ell < y < u$.

!equation
g(y) = \ln\!\left(\frac{y - \ell}{u - y}\right), \quad
g^{-1}(z) = \ell + \frac{u - \ell}{1 + e^{-z}}

where $\ell$ is `link_lower_bound` and $u$ is `link_upper_bound`. Requires
$u > \ell$.
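
For concreteness, a minimal standalone sketch of both warps (simplified free
functions with assumed signatures; the actual implementation is the
`GPLinkFunction` object):

```cpp
#include <cmath>

// Each forward/inverse pair satisfies inverse(forward(y)) = y on the
// constrained domain.

// Log link: y > lower.
double logForward(double y, double lower) { return std::log(y - lower); }
double logInverse(double z, double lower) { return std::exp(z) + lower; }

// Logit link: lower < y < upper (requires upper > lower).
double logitForward(double y, double lower, double upper)
{
  return std::log((y - lower) / (upper - y));
}
double logitInverse(double z, double lower, double upper)
{
  return lower + (upper - lower) / (1.0 + std::exp(-z));
}
```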

## Uncertainty Propagation

When a link function is active, the GP posterior standard deviation $\sigma_z$ in
latent space is mapped to physical space via the delta method:

!equation
\sigma_y \approx \left| (g^{-1})'(\mu_z) \right| \, \sigma_z

This approximation is accurate when $g^{-1}$ is approximately linear over the
interval $\mu_z \pm \sigma_z$, i.e. when $\sigma_z$ is small relative to the
scale over which $g^{-1}$ curves.
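
A minimal sketch using the analytic derivatives of the inverse links
(hypothetical helper names):

```cpp
#include <cmath>

// Delta-method standard deviations in physical space. The inverse-link
// derivatives are: d/dz (e^z + l) = e^z and
// d/dz (l + (u - l) / (1 + e^-z)) = (u - l) e^-z / (1 + e^-z)^2.
double deltaSigmaLog(double mu_z, double sigma_z)
{
  return std::exp(mu_z) * sigma_z;
}

double deltaSigmaLogit(double mu_z, double sigma_z, double lower, double upper)
{
  const double e = std::exp(-mu_z);
  const double dinv = (upper - lower) * e / ((1.0 + e) * (1.0 + e));
  return dinv * sigma_z; // dinv > 0, so no absolute value is needed
}
```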

## Interaction with Standardization

The link transform is applied to training data *before* standardization.
The training pipeline in [GaussianProcessTrainer.md] proceeds as:

1. Collect physical responses $y_i$.
2. Apply link: $z_i = g(y_i)$.
3. Standardize (center/scale) the $z_i$.
4. Train GP in standardized $z$-space.

At prediction time, the GP mean and standard deviation are de-standardized to
$z$-space and then the inverse link is applied.
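
A minimal sketch of the training-side ordering for a log link (a hypothetical
helper; the step numbers in the comments match the list above):

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Returns standardized latent targets and records the standardizer parameters.
std::vector<double> prepareTargets(const std::vector<double> & y, double lower,
                                   double & mean_z, double & std_z)
{
  std::vector<double> z(y.size());
  for (std::size_t i = 0; i < y.size(); ++i)
    z[i] = std::log(y[i] - lower);              // 2. apply link first
  mean_z = std::accumulate(z.begin(), z.end(), 0.0) / z.size();
  double var = 0.0;
  for (const double zi : z)
    var += (zi - mean_z) * (zi - mean_z);
  std_z = std::sqrt(var / z.size());
  for (double & zi : z)
    zi = (zi - mean_z) / std_z;                 // 3. then standardize
  return z;                                     // 4. the GP trains on these
}
```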

## Usage

Link functions are configured via parameters on [GaussianProcessTrainer.md]:

!listing stochastic_tools/test/tests/surrogates/gaussian_process/GP_log_link.i
block=Trainers

!listing stochastic_tools/test/tests/surrogates/gaussian_process/GP_logit_link.i
block=Trainers
Or by simply calling the optimizer with Adam (stochastic algorithm):

!listing stochastic_tools/src/utils/GaussianProcess.C start=tuneHyperParamsAdam(training_params end=);

### Applying a link function

Before setting up the covariance matrix, training data can be transformed through
a [GPLinkFunction.md] to enforce inequality constraints. The link type and bounds
are configured through [GaussianProcessTrainer.md]:

!listing stochastic_tools/src/trainers/GaussianProcessTrainer.C
start=// ---- Link function ----
end=// Apply link function to training data

The transformed values are standardized and the GP trains entirely in latent space.
At prediction time the inverse link is applied to recover physical outputs.

### Derivative observations

Monotonicity constraints are encoded via virtual derivative observations appended
to the training set. The `GaussianProcess` object stores the virtual parameter
locations and their associated derivative dimensions, and assembles an augmented
covariance matrix at setup time:

!listing stochastic_tools/src/trainers/GaussianProcessTrainer.C
start=// ---- Derivative constraint (virtual observations) ----
end=// ---- Penalty constraints ----

### Penalty constraints

Soft bounds on predicted means at specified evaluation points are enforced by
adding a penalty term to the NLML loss and gradient during Adam optimization.
The penalty is computed inside `getLoss` and `getGradient` in `GaussianProcess.C`.

class CovarianceFunctionBase : public MooseObject, public CovarianceInterface
const std::string & hyper_param_name,
unsigned int ind) const;

/**
* Compute the cross-covariance between function values and derivatives:
* K_fd[i,j] = Cov[f(x_i), df(xd_j)/dx'_{dim}] = dK(x_i, xd_j)/dx'_{dim}
* For SE kernel: K_fd[i,j] = K(x_i,xd_j) * (x_{i,dim} - xd_{j,dim}) / ell_dim^2
* Default implementation calls mooseError; override in kernels that support
* derivative observations.
* @param K_fd Output matrix of size (x.rows() x xd.rows())
* @param x Function-value observation locations
* @param xd Derivative observation locations
* @param dim Input dimension for the derivative
*/
virtual void computeCovarianceFD(RealEigenMatrix & K_fd,
const RealEigenMatrix & x,
const RealEigenMatrix & xd,
unsigned int dim) const;

/**
* Compute the cross-covariance between derivatives and function values:
* K_df[i,j] = Cov[df(xd_i)/dx_{dim}, f(xp_j)] = dK(xd_i, xp_j)/dx_{d,dim}
* For SE kernel: K_df[i,j] = K(xd_i,xp_j) * (xp_{j,dim} - xd_{i,dim}) / ell_dim^2
* Note: K_df = K_fd^T when xd and xp are swapped.
* Default implementation calls mooseError.
* @param K_df Output matrix of size (xd.rows() x xp.rows())
* @param xd Derivative observation locations
* @param xp Function-value locations (e.g. test points)
* @param dim Input dimension for the derivative
*/
virtual void computeCovarianceDf(RealEigenMatrix & K_df,
const RealEigenMatrix & xd,
const RealEigenMatrix & xp,
unsigned int dim) const;

/**
* Compute the covariance between two sets of derivative observations:
* K_dd[i,j] = Cov[df(xd_i)/dx_{dim_i}, df(xdp_j)/dx'_{dim_j}]
* = d^2 K(xd_i, xdp_j) / (dx_{dim_i} dx'_{dim_j})
* For SE kernel when dim_i == dim_j:
* = K(xd_i,xdp_j) * [1/ell_k^2 - (xd_{i,k}-xdp_{j,k})^2/ell_k^4]
* For SE kernel when dim_i != dim_j:
* = K(xd_i,xdp_j) * [-(xd_{i,ki}-xdp_{j,ki})*(xd_{i,kj}-xdp_{j,kj})/(ell_ki^2*ell_kj^2)]
* Default implementation calls mooseError.
* @param K_dd Output matrix of size (xd.rows() x xdp.rows())
* @param xd First set of derivative observation locations
* @param xdp Second set of derivative observation locations
* @param dim_i Derivative dimension for the first set
* @param dim_j Derivative dimension for the second set
*/
virtual void computeCovarianceDD(RealEigenMatrix & K_dd,
const RealEigenMatrix & xd,
const RealEigenMatrix & xdp,
unsigned int dim_i,
unsigned int dim_j) const;

/**
* Compute dK_cross/dhp: the derivative of the cross-covariance K(x, xc) with
* respect to a hyperparameter. Used in the penalty constraint gradient.
* For kernels that do not implement this, the default returns a zero matrix
* (penalty loss is computed but no gradient contribution is added).
* @param dKdhp Output matrix of size (x.rows() x xc.rows())
* @param x Training point locations (batch)
* @param xc Constraint point location (1 x d)
* @param hp_name Hyperparameter name (prefixed with covariance object name)
* @param ind Index within vector hyperparameter (0 for scalars)
*/
virtual void computedKdhyper_cross(RealEigenMatrix & dKdhp,
const RealEigenMatrix & x,
const RealEigenMatrix & xc,
const std::string & hp_name,
unsigned int ind) const;

/// Check if a given parameter is tunable
/// @param The name of the hyperparameter
virtual bool isTunable(const std::string & name) const;
class SquaredExponentialCovariance : public CovarianceFunctionBase
const Real sigma_f_squared,
const int ind);

/// Cross-covariance K_fd[i,j] = Cov[f(x_i), df(xd_j)/dx'_{dim}] = dK(x_i,xd_j)/dx'_{dim}
void computeCovarianceFD(RealEigenMatrix & K_fd,
const RealEigenMatrix & x,
const RealEigenMatrix & xd,
unsigned int dim) const override;

/// Cross-covariance K_df[i,j] = Cov[df(xd_i)/dx_{dim}, f(xp_j)] = dK(xd_i,xp_j)/dx_{d,dim}
void computeCovarianceDf(RealEigenMatrix & K_df,
const RealEigenMatrix & xd,
const RealEigenMatrix & xp,
unsigned int dim) const override;

/// Second-derivative covariance K_dd[i,j] = d^2K/(dx_{dim_i} dx'_{dim_j})
void computeCovarianceDD(RealEigenMatrix & K_dd,
const RealEigenMatrix & xd,
const RealEigenMatrix & xdp,
unsigned int dim_i,
unsigned int dim_j) const override;

/// dK_cross(x, xc)/dhp: derivative of cross-covariance w.r.t. hyperparameters for penalty gradient
void computedKdhyper_cross(RealEigenMatrix & dKdhp,
const RealEigenMatrix & x,
const RealEigenMatrix & xc,
const std::string & hp_name,
unsigned int ind) const override;

protected:
/// length factor (\ell) for the kernel, in vector form for multiple parameters
const std::vector<Real> & _length_factor;
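
To make the penalty-gradient hook concrete, here is a minimal sketch of the
`computedKdhyper_cross` computation for the SE kernel (a simplified free
function with an assumed signature and bare hyperparameter names; the real
override receives names prefixed with the covariance object name):

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <string>

using Mat = Eigen::MatrixXd;
using Vec = Eigen::VectorXd;

// dK_cross/dhp for the SE kernel, used by the penalty-constraint gradient:
//   d k / d signal_variance = k / sigma_f^2
//   d k / d length_factor_k = k * (x_k - xc_k)^2 / ell_k^3
void dKcrossDhyper(Mat & dKdhp, const Mat & x, const Mat & xc,
                   const Vec & ell, double sigma_f2,
                   const std::string & hp_name, unsigned int ind)
{
  const int N = x.rows();
  dKdhp.resize(N, 1); // xc is a single constraint point (1 x d)
  for (int i = 0; i < N; ++i)
  {
    double r2 = 0.0;
    for (int k = 0; k < x.cols(); ++k)
    {
      const double d = (x(i, k) - xc(0, k)) / ell(k);
      r2 += d * d;
    }
    const double kval = sigma_f2 * std::exp(-0.5 * r2);
    if (hp_name == "signal_variance")
      dKdhp(i, 0) = kval / sigma_f2;
    else // "length_factor", component `ind`
    {
      const double dk = x(i, ind) - xc(0, ind);
      dKdhp(i, 0) = kval * dk * dk / (ell(ind) * ell(ind) * ell(ind));
    }
  }
}
```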
#include "CovarianceInterface.h"

#include "GaussianProcess.h"
#include "GPLinkFunction.h"

class GaussianProcessTrainer : public SurrogateTrainer, public CovarianceInterface
{