Skip to content

A package to generate fast exact multivariate Gaussian fields with flexible covariance structures

License

Notifications You must be signed in to change notification settings

jtouyz/py_mvgrf

Repository files navigation

Multivariate Gaussian Field Generation using Circulant Embedding

This project provides tools for fast and exact sumulation of multivariate Gaussian Random Fields (GRFs) and performing spatial prediction using Simple Cokriging in N-dimensional space. It includes both an analytical solution for cokriging and a simulation-based approach (conditional simulation).

Table of Contents

  1. Goal
  2. Installation
  3. Quick Start
  4. Structure
  5. Key Concepts
  6. Mathematical Details
  7. Testing
  8. Key Parameters
  9. Comparing Analytical and Simulation Approaches

Goal

The primary goals of this codebase are:

  1. Simulation: Generate realizations of multivariate Gaussian Random Fields based on specified marginal covariance structures (Matérn, Gaussian, etc.) and cross-covariance relationships.
  2. Prediction (Cokriging): Estimate the values of the multivariate field at unobserved locations using known values at observation points. This is achieved through:
    • Analytical Simple Cokriging: Calculating the exact Best Linear Unbiased Estimator (BLUE) and associated kriging variance.
    • Conditional Simulation: Generating multiple realizations of the field that honor the observed data, allowing for uncertainty quantification beyond just the kriging variance.

Installation

Currently, the library is not packaged. To use it, clone the repository and add the src directory to your Python path:

import sys
sys.path.insert(0, 'path/to/multivariate_gaussian/src')

Or work within the project directory.

Quick Start

from src import (
    generate_multivariate_field_nd,
    generate_matern_covariance_functions,
    solve_cokriging_system,
    solve_cokriging_fft,
    visualize_fields,
)
import numpy as np

# 1. Define covariance structure (range scaled for grid size)
dims = (64, 64)
p = 2  # number of variables
cov_funcs = generate_matern_covariance_functions(
    p=p,
    nu_list=[1.5, 2.5],
    rho_list=[5.0, 6.0],       # Range scaled for 64x64 grid
    sigma_list=[1.0, 1.0],
    cross_corr_list=[[1.0, 0.7], [0.7, 1.0]]
)

# 2. Generate random field
Z_real, Z_imag = generate_multivariate_field_nd(dims, p, cov_funcs)

# 3. Visualize
visualize_fields(Z_real, Z_imag)

# 4. Sample ON-GRID observations (integer indices)
n_obs = 20
obs_x = np.random.randint(0, dims[0], size=n_obs)
obs_y = np.random.randint(0, dims[1], size=n_obs)
s_obs_indices = np.column_stack([obs_x, obs_y])
s_obs = s_obs_indices.astype(float)

# Extract observation values
Z_obs = np.array([[Z_real[j][obs_x[i], obs_y[i]] for j in range(p)] for i in range(n_obs)])

# 5a. Analytical kriging (works with any locations)
s_target = np.array([[25.0, 25.0], [30.0, 30.0]])
Z_pred, var_pred = solve_cokriging_system(s_target, s_obs, Z_obs, cov_funcs, p)

# 5b. FFT-accelerated kriging (requires on-grid observations)
s_target_indices = np.array([[25, 25], [30, 30]])  # Integer indices
Z_pred_fft, var_fft, info = solve_cokriging_fft(
    dims, s_obs_indices, Z_obs, s_target_indices, cov_funcs, p
)

Structure

src/
├── __init__.py              # Main package exports
├── core/                    # Field generation
│   ├── field_generation.py  # Main generation functions
│   └── embedding.py         # Circulant embedding helpers
├── covariance/              # Covariance models
│   ├── base.py             # Base classes
│   ├── model.py            # Covariance functions
│   └── cov_utils.py        # Builders and utilities
├── kriging/                 # Kriging methods
│   ├── analytical.py       # Analytical cokriging (Cholesky)
│   ├── simulation.py       # Simulation-based cokriging
│   └── fft_solver.py       # FFT-accelerated cokriging (CG)
├── visualization/           # Plotting
│   ├── plots.py            # Main plotting functions
│   └── utils.py            # Validation and formatting
└── utils.py                # General utilities

tests/
├── test_covariance.py      # Covariance model tests
├── test_grf.py            # Field generation tests
└── test_kriging.py        # Kriging method tests

Key Concepts

Multivariate Gaussian Random Fields

A multivariate GRF $\mathbf{Z}(\mathbf{s})=[Z_1(\mathbf{s}),Z_2(\mathbf{s}),\ldots,Z_p(\mathbf{s})]^T$ is a collection of $p$ random variables at each spatial location $\mathbf{s}$ in a domain $D\subset\mathbb{R}^d$, such that for any finite set of locations $\{\mathbf{s}_1,\ldots,\mathbf{s}_N\}$, the vector $[\mathbf{Z}(\mathbf{s}_1)^T,\ldots,\mathbf{Z}(\mathbf{s}_N)^T]^T$ follows a multivariate normal distribution. The field is characterized by its mean vector (often assumed to be zero for simplicity) and its covariance structure.

Covariance and Cross-Covariance

The spatial structure is defined by covariance functions. For a multivariate field, we need:

  • Marginal Covariance: $C_{ii}(\mathbf{h})=\text{Cov}[Z_i(\mathbf{s}),Z_i(\mathbf{s}+\mathbf{h})]$, describing the spatial correlation of variable $i$ with itself over a separation distance $\mathbf{h}$.
  • Cross-Covariance: $C_{ij}(\mathbf{h})=\text{Cov}[Z_i(\mathbf{s}),Z_j(\mathbf{s}+\mathbf{h})]$, describing the spatial correlation between variable $i$ and variable $j$ over a separation distance $\mathbf{h}$. Note that $C_{ij}(\mathbf{h})=C_{ji}(-\mathbf{h})$. For stationary fields, $C_{ij}(\mathbf{h})=C_{ji}(\mathbf{h})$ if the cross-covariance is symmetric.

Matérn Covariance

The Matérn family is a flexible and widely used covariance function:

$$C(h;\nu,\rho,\sigma^2)=\sigma^2\frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}h}{\rho}\right)^\nu K_\nu\left(\frac{\sqrt{2\nu}h}{\rho}\right)$$

where:

  • $h=|\mathbf{h}|$ is the Euclidean distance.
  • $\sigma^2$ is the marginal variance (sill).
  • $\rho$ is the spatial range parameter (effective range is often $\approx 3\rho$).
  • $\nu$ is the smoothness parameter ($\nu\to\infty$ gives Gaussian, $\nu=0.5$ gives Exponential).
  • $\Gamma(\cdot)$ is the gamma function.
  • $K_\nu(\cdot)$ is the modified Bessel function of the second kind of order $\nu$.

This code uses the Matérn function for marginal covariances and allows scaling factors for cross-covariances based on a linear model of coregionalization (though simplified here).

⚠️ Note on Covariance Validity

This library does not explicitly enforce all theoretical validity conditions for the multivariate covariance structures (e.g., ensuring the cross-covariance parameters result in a positive semi-definite matrix for all possible spatial lags).

Some parameter combinations are valid, while others are not. For the Parsimonious Multivariate Matérn Model, specific conditions must be met (e.g., constraints on $\rho_{ij}$ relative to smoothness parameters $\nu$).

  • Diagnostics: Please use the included diagnostics notebook notebooks/05_model_validation.ipynb to numerically check if your generated covariance matrices are positive semi-definite (PSD).
  • Reference: For detailed validity conditions, please refer to:

    Gneiting, T., Kleiber, W., & Schlather, M. (2010). Matérn Cross-Covariance Functions for Multivariate Random Fields. Journal of the American Statistical Association, 105(491), 1167–1177. PDF | DOI

Simple Cokriging (BLUE)

Simple Cokriging provides the Best Linear Unbiased Estimator (BLUE) for $\mathbf{Z}(\mathbf{s}_0)$ at a target location $\mathbf{s}0$, assuming a known mean (here, zero) and a known covariance structure. It finds the linear combination of observed data $\mathbf{Z}{obs}$ that minimizes the estimation variance.

Conditional Simulation

Conditional simulation generates realizations (maps) of the random field $\mathbf{Z}(\mathbf{s})$ that are consistent with both the assumed covariance structure and the observed data values $\mathbf{Z}_{obs}$. Unlike kriging which gives a single best estimate and variance, conditional simulation provides multiple possible scenarios, allowing for a richer assessment of spatial uncertainty (e.g., calculating probabilities of exceeding thresholds).

Mathematical Details

For complete mathematical formulations, derivations, and algorithm steps, please refer to the generated PDF documentation:

Overview of Methods

Simple Cokriging (BLUE)

Simple Cokriging provides the Best Linear Unbiased Estimator (BLUE) for $\mathbf{Z}(\mathbf{s}_0)$ at a target location $\mathbf{s}_0$. It solves a linear system involving the covariance between observations and the target to minimize estimation variance.

Conditional Simulation

Conditional simulation generates realizations of the random field that honor both the covariance structure and the observed data. It uses the "kriging of residuals" approach:

  1. Generate an unconditional simulation.
  2. Calculate the kriging error (residual) at observation locations.
  3. Add the interpolated residuals to the unconditional simulation.

This ensures the final field passes through the data points (honoring observations) while maintaining the correct spatial correlation structure (honoring covariance).

Testing

Run tests to verify the library works correctly:

# Run individual test modules
python tests/test_covariance.py
python tests/test_grf.py
python tests/test_kriging.py

# Or use pytest (if installed)
pytest tests/

Tests cover:

  • Covariance function correctness
  • Field generation shapes and basic statistical properties
  • Kriging interpolation and variance behavior

Dependencies

You need Python 3.13+ with the following core libraries:

  • numpy - Array operations
  • scipy - Special functions and spatial distance calculations
  • matplotlib - Plotting (basic visualizations)
  • plotly - Interactive 3D visualizations (optional)
  • joblib - Parallel processing for simulation-based kriging

Install dependencies:

pip install numpy scipy matplotlib plotly joblib

Running Examples

See the notebooks/ directory for example Jupyter notebooks:

  • 01_quickstart_field_generation.ipynb - Quick start with field generation
  • 02_covariance_models.ipynb - Understand covariance models
  • 03_kriging_spatial.ipynb - Spatial kriging (analytical + FFT methods)
  • 03_kriging_spatial_simulation.ipynb - Simulation-based cokriging
  • 04_kriging_time_series.ipynb - Time series kriging
  • 05_model_validation.ipynb - Validate generated fields

Key Parameters

  • dims: Tuple defining the grid dimensions (e.g., (50, 50) for 2D).
  • p: Number of variables in the multivariate field.
  • cov_funcs: Generated by generate_matern_covariance_functions, controlled by:
    • nu_list: Smoothness parameters for each variable.
    • rho_list: Range parameters for each variable.
    • sigma_list: Marginal standard deviations for each variable.
    • cross_corr_list: Defines the cross-correlation structure.
  • s_obs: NumPy array of observation coordinates (n_obs, d).
  • Z_obs: NumPy array of observed values (n_obs, p).
  • target_indices: Indices specifying the target locations for prediction (e.g., np.indices(dims) for the whole grid).
  • N_sim: Number of conditional simulations to run. Higher values give more stable estimates but take longer.

Comparing Analytical and Simulation Approaches

The sample_kriging.ipynb notebook includes cells specifically designed to compare the results from solve_cokriging_system (Analytical BLUE) and kriging_via_simulation (Simulation Mean/Variance):

  • Difference Plots: Visualizes the spatial difference between the analytical prediction and the simulation mean (and similarly for variance).
  • Summary Statistics: Calculates MAE, RMSE, and bias between the two approaches.
  • Scatter Plots: Plots analytical results vs. simulation results for each target point. Points should ideally fall on the y=x line.

These comparisons help verify the conditional simulation implementation and illustrate how the simulation mean/variance converge towards the analytical BLUE results as N_sim increases.

Citation & Background

This library implements and extends methodologies originally developed in the author's PhD dissertation, specifically focusing on multivariate circulant embedding (Chapter 4). It provides production-ready implementations that expand the original academic scope with:

  • Multiple optimized kriging methods (Analytical, FFT-based, Simulation-based)
  • Comprehensive uncertainty quantification
  • Performance optimizations for large-scale fields
  • Integrated plotting and visualization tools

If you use this software, please cite it using the metadata in CITATION.cff or as:

Touyz, J. (2025). Multivariate Gaussian Field Generation using Circulant Embedding (Version 1.0.0) [Computer software]. https://github.com/joshuahtouyz/py_mvgrf

You may also cite the original dissertation:

Touyz, J. (2016). Novel Methodologies in Multivariate Spatial Statistics (Doctoral dissertation). The George Washington University. Available at ScholarSpace.

About

A package to generate fast exact multivariate Gaussian fields with flexible covariance structures

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published