with $M$ outputs, each training iteration of Adam has a cost of $\mathcal{O}(M^3N^3)$. Adam permits using $n < N$ random training points during each iteration (mini-batches), which has a cost of $\mathcal{O}(M^3n^3) \ll \mathcal{O}(M^3N^3)$.

## Inequality Constraints

Three complementary mechanisms are available to enforce inequality constraints on
GP predictions. They can be combined freely.

### Link Functions (Output Warping)

A link function $g$ maps physical outputs to an unconstrained latent space.
The GP trains in latent space and predictions are inverted back to physical space,
guaranteeing constraint satisfaction by construction.

| `link_function` | Constraint | Required parameters |
| - | - | - |
| `identity` (default) | none | — |
| `log` | $y > \ell$ | `link_lower_bound` |
| `logit` | $\ell < y < u$ | `link_lower_bound`, `link_upper_bound` |

See [GPLinkFunction.md] for the mathematical details.

+Example — positivity constraint:+

!listing stochastic_tools/test/tests/surrogates/gaussian_process/GP_log_link.i
block=Trainers

### Derivative Observations (Monotonicity Constraints)

Virtual derivative observations augment the covariance matrix with cross-covariance
blocks between function values and directional derivatives at specified points.
This enforces monotonicity (or zero-derivative) constraints structurally via
the GP prior.

The augmented covariance matrix takes the block form:

!equation
\mathbf{K}_{\text{aug}} = \begin{bmatrix}
\mathbf{K}_{ff} + \sigma_n^2 \mathbf{I} & \mathbf{K}_{fd} \\
\mathbf{K}_{df} & \mathbf{K}_{dd} + \sigma_d^2 \mathbf{I}
\end{bmatrix}

where $\mathbf{K}_{fd}$ contains cross-covariances between function values and
partial derivatives, and $\mathbf{K}_{dd}$ contains covariances between pairs of
partial derivatives. Only the [SquaredExponentialCovariance.md] kernel supports
derivative covariances, and only single-output GPs are supported.

The virtual observations are placed at user-specified locations
(`monotone_constraint_points`) and target either zero derivative, positive
derivative (`increasing`), or negative derivative (`decreasing`) at those points.
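For intuition, the following is a minimal, self-contained sketch of how such an
augmented matrix can be assembled from the SE formulas documented in
[SquaredExponentialCovariance.md]. It is not the MOOSE implementation: for
brevity it assumes all virtual observations share a single derivative dimension
`dim`, whereas the trainer stores a per-point derivative dimension.

```cpp
#include <Eigen/Dense>
#include <cmath>

using Mat = Eigen::MatrixXd;
using Vec = Eigen::VectorXd;

// Squared-exponential kernel value between row i of A and row j of B.
double se(const Mat & A, int i, const Mat & B, int j, const Vec & ell, double sf2)
{
  double r2 = 0.0;
  for (int k = 0; k < A.cols(); ++k)
  {
    const double d = (A(i, k) - B(j, k)) / ell(k);
    r2 += d * d;
  }
  return sf2 * std::exp(-0.5 * r2);
}

// Assemble K_aug = [K_ff + sn2*I, K_fd; K_df, K_dd + sd2*I] for N function
// observations X and V virtual derivative observations Xd, all taken along
// the single input dimension `dim`.
Mat assembleAugmented(const Mat & X, const Mat & Xd, const Vec & ell,
                      double sf2, double sn2, double sd2, int dim)
{
  const int N = X.rows(), V = Xd.rows();
  const double l2 = ell(dim) * ell(dim);
  Mat K(N + V, N + V);
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      K(i, j) = se(X, i, X, j, ell, sf2) + (i == j ? sn2 : 0.0);
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < V; ++j)
    {
      // K_fd[i,j] = K(x_i, xd_j) * (x_{i,dim} - xd_{j,dim}) / ell_dim^2
      K(i, N + j) = se(X, i, Xd, j, ell, sf2) * (X(i, dim) - Xd(j, dim)) / l2;
      K(N + j, i) = K(i, N + j); // K_df = K_fd^T
    }
  for (int i = 0; i < V; ++i)
    for (int j = 0; j < V; ++j)
    {
      // Equal derivative dims: K_dd = K * (1/ell^2 - dx^2/ell^4)
      const double dx = Xd(i, dim) - Xd(j, dim);
      K(N + i, N + j) =
          se(Xd, i, Xd, j, ell, sf2) * (1.0 / l2 - dx * dx / (l2 * l2)) +
          (i == j ? sd2 : 0.0);
    }
  return K;
}
```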

+Example — increasing in second parameter:+

!listing stochastic_tools/test/tests/surrogates/gaussian_process/GP_derivative_obs.i
block=Trainers

### Penalty Constraints (Soft Bounds During Tuning)

Penalty terms are added to the negative log-marginal likelihood (NLML) and its
gradient during Adam hyperparameter optimization. This steers hyperparameter
selection toward configurations that respect lower/upper bounds on predicted means
at specified evaluation points.

The augmented loss is:

!equation
\mathcal{L}_{\text{aug}} = \mathcal{L}_{\text{NLML}} + \lambda \sum_c \left[
\max(0,\, \ell_c - \hat{y}_c)^2 + \max(0,\, \hat{y}_c - u_c)^2
\right]

where $\hat{y}_c$ is the predicted mean at constraint point $c$, and $\lambda$
is `penalty_weight`.

The bounds are specified in physical space and are internally transformed through
the link function and data standardizer. Penalty constraints require the
`tune_parameters` option to be active.
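A minimal sketch of the penalty term itself (a hypothetical standalone function,
not the MOOSE API; in the actual code this logic is folded into `getLoss` and
`getGradient`, and the bounds have already been mapped through the link function
and standardizer):

```cpp
#include <algorithm>
#include <vector>

// Hinge-squared penalty added to the NLML during Adam tuning. yhat[c] is the
// predicted mean at constraint point c; lower/upper are the (transformed)
// bounds and penalty_weight is the lambda in the equation above.
double penaltyTerm(const std::vector<double> & yhat,
                   const std::vector<double> & lower,
                   const std::vector<double> & upper,
                   double penalty_weight)
{
  double p = 0.0;
  for (std::size_t c = 0; c < yhat.size(); ++c)
  {
    const double lo = std::max(0.0, lower[c] - yhat[c]); // lower-bound violation
    const double hi = std::max(0.0, yhat[c] - upper[c]); // upper-bound violation
    p += lo * lo + hi * hi;
  }
  return penalty_weight * p; // added to the NLML loss
}
```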

+Example — lower-bound penalty at two points:+

!listing stochastic_tools/test/tests/surrogates/gaussian_process/GP_penalty_constraint.i
block=Trainers

!syntax parameters /Trainers/GaussianProcessTrainer

!syntax inputs /Trainers/GaussianProcessTrainer
# GPLinkFunction

## Overview

`GPLinkFunction` provides output warping (link functions) for [GaussianProcessTrainer.md]
that enforce inequality constraints on GP predictions. A link function $g$ maps
the physical output $y$ to a latent (unconstrained) variable $z = g(y)$. The GP
is trained in latent $z$-space, and predictions are mapped back to physical space
via the inverse $y = g^{-1}(z)$.

This approach is sometimes called *warped Gaussian processes*
[!cite](snelson2004warped) and provides an exact constraint: every prediction
satisfies the bound by construction.

## Available Link Functions

### Identity (default)

No transformation is applied. The GP trains and predicts directly in physical space.

!equation
g(y) = y, \quad g^{-1}(z) = z

Use this (or omit `link_function`) for unconstrained problems.

### Log Link

Enforces a strict lower bound: $y > \ell$.

!equation
g(y) = \ln(y - \ell), \quad g^{-1}(z) = e^{z} + \ell

where $\ell$ is `link_lower_bound` (default 0, which yields a positivity constraint).

The Jacobian correction $\ln |g'(y)| = -\ln(y - \ell)$ is included in the
negative log-marginal likelihood when hyperparameter tuning is active.

### Logit Link

Enforces both a lower and upper bound: $\ell < y < u$.

!equation
g(y) = \ln\!\left(\frac{y - \ell}{u - y}\right), \quad
g^{-1}(z) = \ell + \frac{u - \ell}{1 + e^{-z}}

where $\ell$ is `link_lower_bound` and $u$ is `link_upper_bound`. Requires
$u > \ell$.
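
For concreteness, a minimal standalone sketch of both warps (simplified free
functions with assumed signatures; the actual implementation is the
`GPLinkFunction` object):

```cpp
#include <cmath>

// Each forward/inverse pair satisfies inverse(forward(y)) = y on the
// constrained domain.

// Log link: y > lower.
double logForward(double y, double lower) { return std::log(y - lower); }
double logInverse(double z, double lower) { return std::exp(z) + lower; }

// Logit link: lower < y < upper (requires upper > lower).
double logitForward(double y, double lower, double upper)
{
  return std::log((y - lower) / (upper - y));
}
double logitInverse(double z, double lower, double upper)
{
  return lower + (upper - lower) / (1.0 + std::exp(-z));
}
```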

## Uncertainty Propagation

When a link function is active, the GP posterior standard deviation $\sigma_z$ in
latent space is mapped to physical space via the delta method:

!equation
\sigma_y \approx \left| (g^{-1})'(\mu_z) \right| \, \sigma_z

This approximation is accurate when $g^{-1}$ is approximately linear over the
interval $\mu_z \pm \sigma_z$, i.e. when $\sigma_z$ is small relative to the
scale over which $g^{-1}$ curves.
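
A minimal sketch using the analytic derivatives of the inverse links
(hypothetical helper names):

```cpp
#include <cmath>

// Delta-method standard deviations in physical space. The inverse-link
// derivatives are: d/dz (e^z + l) = e^z and
// d/dz (l + (u - l) / (1 + e^-z)) = (u - l) e^-z / (1 + e^-z)^2.
double deltaSigmaLog(double mu_z, double sigma_z)
{
  return std::exp(mu_z) * sigma_z;
}

double deltaSigmaLogit(double mu_z, double sigma_z, double lower, double upper)
{
  const double e = std::exp(-mu_z);
  const double dinv = (upper - lower) * e / ((1.0 + e) * (1.0 + e));
  return dinv * sigma_z; // dinv > 0, so no absolute value is needed
}
```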

## Interaction with Standardization

The link transform is applied to training data *before* standardization.
The training pipeline in [GaussianProcessTrainer.md] proceeds as:

1. Collect physical responses $y_i$.
2. Apply link: $z_i = g(y_i)$.
3. Standardize (center/scale) the $z_i$.
4. Train GP in standardized $z$-space.

At prediction time, the GP mean and standard deviation are de-standardized to
$z$-space and then the inverse link is applied.
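
A minimal sketch of the training-side ordering for a log link (a hypothetical
helper; the step numbers in the comments match the list above):

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Returns standardized latent targets and records the standardizer parameters.
std::vector<double> prepareTargets(const std::vector<double> & y, double lower,
                                   double & mean_z, double & std_z)
{
  std::vector<double> z(y.size());
  for (std::size_t i = 0; i < y.size(); ++i)
    z[i] = std::log(y[i] - lower);              // 2. apply link first
  mean_z = std::accumulate(z.begin(), z.end(), 0.0) / z.size();
  double var = 0.0;
  for (const double zi : z)
    var += (zi - mean_z) * (zi - mean_z);
  std_z = std::sqrt(var / z.size());
  for (double & zi : z)
    zi = (zi - mean_z) / std_z;                 // 3. then standardize
  return z;                                     // 4. the GP trains on these
}
```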

## Usage

Link functions are configured via parameters on [GaussianProcessTrainer.md]:

!listing stochastic_tools/test/tests/surrogates/gaussian_process/GP_log_link.i
block=Trainers

!listing stochastic_tools/test/tests/surrogates/gaussian_process/GP_logit_link.i
block=Trainers
Or by simply calling the optimizer with Adam (stochastic algorithm):

!listing stochastic_tools/src/utils/GaussianProcess.C start=tuneHyperParamsAdam(training_params end=);

### Applying a link function

Before setting up the covariance matrix, training data can be transformed through
a [GPLinkFunction.md] to enforce inequality constraints. The link type and bounds
are configured through [GaussianProcessTrainer.md]:

!listing stochastic_tools/src/trainers/GaussianProcessTrainer.C
start=// ---- Link function ----
end=// Apply link function to training data

The transformed values are standardized and the GP trains entirely in latent space.
At prediction time the inverse link is applied to recover physical outputs.

### Derivative observations

Monotonicity constraints are encoded via virtual derivative observations appended
to the training set. The `GaussianProcess` object stores the virtual parameter
locations and their associated derivative dimensions, and assembles an augmented
covariance matrix at setup time:

!listing stochastic_tools/src/trainers/GaussianProcessTrainer.C
start=// ---- Derivative constraint (virtual observations) ----
end=// ---- Penalty constraints ----

### Penalty constraints

Soft bounds on predicted means at specified evaluation points are enforced by
adding a penalty term to the NLML loss and gradient during Adam optimization.
The penalty is computed inside `getLoss` and `getGradient` in `GaussianProcess.C`.

class CovarianceFunctionBase : public MooseObject, public CovarianceInterface
const std::string & hyper_param_name,
unsigned int ind) const;

/**
* Compute the cross-covariance between function values and derivatives:
* K_fd[i,j] = Cov[f(x_i), df(xd_j)/dx'_{dim}] = dK(x_i, xd_j)/dx'_{dim}
* For SE kernel: K_fd[i,j] = K(x_i,xd_j) * (x_{i,dim} - xd_{j,dim}) / ell_dim^2
* Default implementation calls mooseError; override in kernels that support
* derivative observations.
* @param K_fd Output matrix of size (x.rows() x xd.rows())
* @param x Function-value observation locations
* @param xd Derivative observation locations
* @param dim Input dimension for the derivative
*/
virtual void computeCovarianceFD(RealEigenMatrix & K_fd,
const RealEigenMatrix & x,
const RealEigenMatrix & xd,
unsigned int dim) const;

/**
* Compute the cross-covariance between derivatives and function values:
* K_df[i,j] = Cov[df(xd_i)/dx_{dim}, f(xp_j)] = dK(xd_i, xp_j)/dx_{d,dim}
* For SE kernel: K_df[i,j] = K(xd_i,xp_j) * (xp_{j,dim} - xd_{i,dim}) / ell_dim^2
* Note: K_df = K_fd^T when xd and xp are swapped.
* Default implementation calls mooseError.
* @param K_df Output matrix of size (xd.rows() x xp.rows())
* @param xd Derivative observation locations
* @param xp Function-value locations (e.g. test points)
* @param dim Input dimension for the derivative
*/
virtual void computeCovarianceDf(RealEigenMatrix & K_df,
const RealEigenMatrix & xd,
const RealEigenMatrix & xp,
unsigned int dim) const;

/**
* Compute the covariance between two sets of derivative observations:
* K_dd[i,j] = Cov[df(xd_i)/dx_{dim_i}, df(xdp_j)/dx'_{dim_j}]
* = d^2 K(xd_i, xdp_j) / (dx_{dim_i} dx'_{dim_j})
* For SE kernel when dim_i == dim_j:
* = K(xd_i,xdp_j) * [1/ell_k^2 - (xd_{i,k}-xdp_{j,k})^2/ell_k^4]
* For SE kernel when dim_i != dim_j:
* = K(xd_i,xdp_j) * [-(xd_{i,ki}-xdp_{j,ki})*(xd_{i,kj}-xdp_{j,kj})/(ell_ki^2*ell_kj^2)]
* Default implementation calls mooseError.
* @param K_dd Output matrix of size (xd.rows() x xdp.rows())
* @param xd First set of derivative observation locations
* @param xdp Second set of derivative observation locations
* @param dim_i Derivative dimension for the first set
* @param dim_j Derivative dimension for the second set
*/
virtual void computeCovarianceDD(RealEigenMatrix & K_dd,
const RealEigenMatrix & xd,
const RealEigenMatrix & xdp,
unsigned int dim_i,
unsigned int dim_j) const;

/**
* Compute dK_cross/dhp: the derivative of the cross-covariance K(x, xc) with
* respect to a hyperparameter. Used in the penalty constraint gradient.
* For kernels that do not implement this, the default returns a zero matrix
* (penalty loss is computed but no gradient contribution is added).
* @param dKdhp Output matrix of size (x.rows() x xc.rows())
* @param x Training point locations (batch)
* @param xc Constraint point location (1 x d)
* @param hp_name Hyperparameter name (prefixed with covariance object name)
* @param ind Index within vector hyperparameter (0 for scalars)
*/
virtual void computedKdhyper_cross(RealEigenMatrix & dKdhp,
const RealEigenMatrix & x,
const RealEigenMatrix & xc,
const std::string & hp_name,
unsigned int ind) const;

/// Check if a given parameter is tunable
/// @param The name of the hyperparameter
virtual bool isTunable(const std::string & name) const;
class SquaredExponentialCovariance : public CovarianceFunctionBase
const Real sigma_f_squared,
const int ind);

/// Cross-covariance K_fd[i,j] = Cov[f(x_i), df(xd_j)/dx'_{dim}] = dK(x_i,xd_j)/dx'_{dim}
void computeCovarianceFD(RealEigenMatrix & K_fd,
const RealEigenMatrix & x,
const RealEigenMatrix & xd,
unsigned int dim) const override;

/// Cross-covariance K_df[i,j] = Cov[df(xd_i)/dx_{dim}, f(xp_j)] = dK(xd_i,xp_j)/dx_{d,dim}
void computeCovarianceDf(RealEigenMatrix & K_df,
const RealEigenMatrix & xd,
const RealEigenMatrix & xp,
unsigned int dim) const override;

/// Second-derivative covariance K_dd[i,j] = d^2K/(dx_{dim_i} dx'_{dim_j})
void computeCovarianceDD(RealEigenMatrix & K_dd,
const RealEigenMatrix & xd,
const RealEigenMatrix & xdp,
unsigned int dim_i,
unsigned int dim_j) const override;

/// dK_cross(x, xc)/dhp: derivative of cross-covariance w.r.t. hyperparameters for penalty gradient
void computedKdhyper_cross(RealEigenMatrix & dKdhp,
const RealEigenMatrix & x,
const RealEigenMatrix & xc,
const std::string & hp_name,
unsigned int ind) const override;

protected:
/// length factor (\ell) for the kernel, in vector form for multiple parameters
const std::vector<Real> & _length_factor;
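
To make the penalty-gradient hook concrete, here is a minimal sketch of the
`computedKdhyper_cross` computation for the SE kernel (a simplified free
function with an assumed signature and bare hyperparameter names; the real
override receives names prefixed with the covariance object name):

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <string>

using Mat = Eigen::MatrixXd;
using Vec = Eigen::VectorXd;

// dK_cross/dhp for the SE kernel, used by the penalty-constraint gradient:
//   d k / d signal_variance = k / sigma_f^2
//   d k / d length_factor_k = k * (x_k - xc_k)^2 / ell_k^3
void dKcrossDhyper(Mat & dKdhp, const Mat & x, const Mat & xc,
                   const Vec & ell, double sigma_f2,
                   const std::string & hp_name, unsigned int ind)
{
  const int N = x.rows();
  dKdhp.resize(N, 1); // xc is a single constraint point (1 x d)
  for (int i = 0; i < N; ++i)
  {
    double r2 = 0.0;
    for (int k = 0; k < x.cols(); ++k)
    {
      const double d = (x(i, k) - xc(0, k)) / ell(k);
      r2 += d * d;
    }
    const double kval = sigma_f2 * std::exp(-0.5 * r2);
    if (hp_name == "signal_variance")
      dKdhp(i, 0) = kval / sigma_f2;
    else // "length_factor", component `ind`
    {
      const double dk = x(i, ind) - xc(0, ind);
      dKdhp(i, 0) = kval * dk * dk / (ell(ind) * ell(ind) * ell(ind));
    }
  }
}
```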
#include "CovarianceInterface.h"

#include "GaussianProcess.h"
#include "GPLinkFunction.h"

class GaussianProcessTrainer : public SurrogateTrainer, public CovarianceInterface
{