crvftw_eiv

Error-in-Variables polynomial regression for gas chromatography calibration

Overview

crvftw_eiv is an R package implementing Error-in-Variables (EIV) polynomial regression for the calibration of analytical instruments where both the independent and dependent variables carry measurement uncertainty. It is designed specifically for gas chromatographs used in natural gas analysis, following the framework of ISO 6143 [1].

The package provides three functions of increasing generality:

Function	Components	Uncertainty model	X ↔ Y correlation	Use case
`CRVFTW`	Single	Scalar $u(x_i), u(y_i)$	No	Standard calibration, uncorrelated errors
`CRVFTW_COVXY`	Single	Full covariance matrices $\boldsymbol{\Sigma}_X, \boldsymbol{\Sigma}_Y$	No	Correlated errors within one component
`CRVFTW_COV_MULTICOMP`	Multiple	Joint covariance $\boldsymbol{\Sigma}_Z$	Yes	Joint calibration with full correlation structure

Motivation

In gas chromatography, reference gas mixtures with certified concentrations $x_i$ are injected into the instrument, which produces analytical responses $y_i$. The goal of calibration is to determine the function $y = f(x, \boldsymbol{b})$ that relates concentration to response — typically a polynomial:

$$y = b_1 + b_2 x + b_3 x^2 + \cdots + b_m x^{m-1}$$

Ordinary Least Squares (OLS) regression assumes the independent variable $x$ is known without error. This assumption is increasingly violated in modern gas analysis:

The certified concentrations $x_i$ carry uncertainty from the gravimetric preparation of the reference mixtures.
The detector responses $y_i$ carry measurement uncertainty from the instrument itself.
Modern detectors are precise enough that the uncertainty in $x$ is comparable to that in $y$.

An EIV regression correctly accounts for uncertainty in both variables by simultaneously adjusting both $\hat{x}_i$ and $\hat{y}_i$ such that $\hat{y}_i = f(\hat{x}_i, \boldsymbol{b})$, while minimizing a weighted measure of the residuals. In the diagonal case this reduces to the classical Deming regression [2]; the package generalizes it to full covariance matrices and to multi-component problems.

Theoretical background

The joint probability density function

Assuming Gaussian measurement errors, each pair $(x_i, y_i)$ is drawn from a bivariate normal distribution centered on the unknown true values $(\hat{x}_i, \hat{y}_i)$. The joint probability density of the residual vector $\boldsymbol{z}_i = (x_i - \hat{x}_i,\ y_i - \hat{y}_i)^{\top}$ is:

$$p(\boldsymbol{z}_i) = \frac{1}{2\pi\sqrt{\det \boldsymbol{\Sigma}_i}}\, e^{-\frac{1}{2}\, \boldsymbol{z}_i^{\top} \boldsymbol{\Sigma}_i^{-1} \boldsymbol{z}_i}$$

Maximum-likelihood estimation of the calibration coefficients $\boldsymbol{b}$ is equivalent to minimizing the total negative log-likelihood subject to the calibration constraint:

$$\min_{\hat{\boldsymbol{x}},\, \boldsymbol{b}}\ \sum_{i=1}^{n} \boldsymbol{z}_i^{\top} \boldsymbol{\Sigma}_i^{-1} \boldsymbol{z}_i \quad\text{subject to}\quad \hat{y}_i = f(\hat{x}_i, \boldsymbol{b})$$

This constrained minimization is solved iteratively using a Gauss-Newton scheme with Lagrange multipliers. The three functions in this package differ in how $\boldsymbol{\Sigma}_i$ — or, more generally, the joint covariance of all observations — is modelled.

1. `CRVFTW` — scalar uncertainties

Each data point has independent scalar uncertainties $u(x_i)$ and $u(y_i)$, and different points are uncorrelated. The per-point covariance reduces to:

$$\boldsymbol{\Sigma}_i = \begin{pmatrix} u^2(x_i) & 0 \\ 0 & u^2(y_i) \end{pmatrix}$$

and the objective collapses to the classical Deming form:

$$\min_{\hat{\boldsymbol{x}},\, \boldsymbol{b}}\ \sum_{i=1}^{n}\left[ \frac{(x_i - \hat{x}_i)^{2}}{u^{2}(x_i)} + \frac{\bigl(y_i - f(\hat{x}_i, \boldsymbol{b})\bigr)^{2}}{u^{2}(y_i)} \right]$$

This is appropriate when the calibration points are produced and measured independently, and all measurement errors are uncorrelated.

2. `CRVFTW_COVXY` — full covariance per variable

When calibration points share a common source of uncertainty — repeated analyses of the same batch, inherited uncertainty from a common reference standard, or any systematic effect — scalar uncertainties are insufficient. The within-$x$ and within-$y$ covariances are then represented by full $n \times n$ matrices:

$$\boldsymbol{\Sigma}_X = \mathrm{Cov}(\boldsymbol{x}), \qquad \boldsymbol{\Sigma}_Y = \mathrm{Cov}(\boldsymbol{y})$$

The objective becomes:

$$\min_{\hat{\boldsymbol{x}},\, \boldsymbol{b}}\\ (\boldsymbol{x} - \hat{\boldsymbol{x}})^{\top} \boldsymbol{\Sigma}_X^{-1} (\boldsymbol{x} - \hat{\boldsymbol{x}}) + (\boldsymbol{y} - \hat{\boldsymbol{y}})^{\top} \boldsymbol{\Sigma}_Y^{-1} (\boldsymbol{y} - \hat{\boldsymbol{y}})$$

subject to $\hat{\boldsymbol{y}} = f(\hat{\boldsymbol{x}}, \boldsymbol{b})$. This still assumes no correlation between $\boldsymbol{x}$ and $\boldsymbol{y}$ — the two sets of measurements are modelled as independent random vectors.

3. `CRVFTW_COV_MULTICOMP` — multi-component with joint covariance

The most general case arises in multi-component gas analysis, where all components are calibrated simultaneously and the covariance structure spans three physical dimensions:

Points — repeated or shared sources of uncertainty between calibration levels,
Components — concentrations of different components in a gravimetrically prepared mixture are correlated (more of A implies less of B); detector responses of different components are correlated through shared matrix effects and normalization,
X ↔ Y — the concentration of component $k$ and the response of component $l$ can share uncertainty contributions through the physical preparation and measurement chain.

Let $\boldsymbol{X}$ and $\boldsymbol{Y}$ be $n \times K$ matrices (rows = calibration points, columns = components), and define the stacked vectors

$$\boldsymbol{x}_{\text{all}} = \mathrm{vec}(\boldsymbol{X}),\qquad \boldsymbol{y}_{\text{all}} = \mathrm{vec}(\boldsymbol{Y}),\qquad \boldsymbol{z} = \begin{pmatrix} \boldsymbol{x}_{\text{all}} \\ \boldsymbol{y}_{\text{all}} \end{pmatrix} \in \mathbb{R}^{2nK}$$

The full joint covariance matrix has dimension $2nK \times 2nK$ with a block structure:

$$\boldsymbol{\Sigma}_Z = \begin{pmatrix} \boldsymbol{\Sigma}_{XX} & \boldsymbol{\Sigma}_{XY} \\\ \boldsymbol{\Sigma}_{YX} & \boldsymbol{\Sigma}_{YY} \end{pmatrix}$$

where

$\boldsymbol{\Sigma}_{XX}$ ($nK \times nK$) contains within- and between-component covariances of the concentrations,
$\boldsymbol{\Sigma}_{YY}$ ($nK \times nK$) contains within- and between-component covariances of the detector responses,
$\boldsymbol{\Sigma}_{XY}$ ($nK \times nK$) is the cross-covariance between concentrations and responses.

The lower-left block follows by symmetry:

$$\boldsymbol{\Sigma}_{YX} = \boldsymbol{\Sigma}_{XY}^{\top}$$

The objective becomes:

$$\min_{\hat{\boldsymbol{z}},\, \boldsymbol{b}_1, \ldots, \boldsymbol{b}_K}\\ (\boldsymbol{z} - \hat{\boldsymbol{z}})^{\top}\, \boldsymbol{\Sigma}_Z^{-1}\, (\boldsymbol{z} - \hat{\boldsymbol{z}})$$

subject to the per-component calibration constraints

$$\hat{y}_{i,k} = f_k(\hat{x}_{i,k}, \boldsymbol{b}_k), \qquad i = 1, \ldots, n, \quad k = 1, \ldots, K$$

Although each component has its own polynomial model $f_k$ and its own coefficient vector $\boldsymbol{b}_k$, the estimation is performed jointly: the off-diagonal blocks of $\boldsymbol{\Sigma}_Z$ couple all components during the optimization. Only when $\boldsymbol{\Sigma}_Z$ is block-diagonal do the problems decouple into independent per-component regressions.

The update equations for the adjusted values, derived from the KKT conditions, gain additional cross-covariance terms:

$$\hat{\boldsymbol{x}} = \boldsymbol{x} + \boldsymbol{\Sigma}_{XX}(\boldsymbol{D}_x \boldsymbol{\lambda}) + \boldsymbol{\Sigma}_{XY}(\boldsymbol{D}_y \boldsymbol{\lambda})$$

$$\hat{\boldsymbol{y}} = \boldsymbol{y} + \boldsymbol{\Sigma}_{YX}(\boldsymbol{D}_x \boldsymbol{\lambda}) + \boldsymbol{\Sigma}_{YY}(\boldsymbol{D}_y \boldsymbol{\lambda})$$

where $\boldsymbol{D}_x = \partial f / \partial \boldsymbol{x}$, $\boldsymbol{D}_y = \partial f / \partial \boldsymbol{y} = \boldsymbol{1}$, and $\boldsymbol{\lambda}$ are the Lagrange multipliers.

The Gauss-Newton iteration

All three functions solve the constrained optimization with the same scheme:

Initialize $\hat{\boldsymbol{x}}$, $\hat{\boldsymbol{y}}$ and $\boldsymbol{b}$ via a weighted GLM fit.
Linearize the constraint $\hat{\boldsymbol{y}} - f(\hat{\boldsymbol{x}}, \boldsymbol{b}) = 0$ around the current estimate.
Assemble the weight matrix $\boldsymbol{W} = (\boldsymbol{J}_Z \boldsymbol{\Sigma}_Z \boldsymbol{J}_Z^{\top})^{-1}$, the information matrix $\boldsymbol{\alpha} = \boldsymbol{A}^{\top} \boldsymbol{W} \boldsymbol{A}$ with $\boldsymbol{A} = \partial f / \partial \boldsymbol{b}$, and the gradient $\boldsymbol{\beta} = \boldsymbol{\lambda}^{\top} \boldsymbol{A}$.
Solve for the coefficient update $\Delta \boldsymbol{b} = \boldsymbol{\alpha}^{-1} \boldsymbol{\beta}^{\top}$.
Update $\boldsymbol{b}$, $\hat{\boldsymbol{x}}$, and $\hat{\boldsymbol{y}}$.
Test convergence on the relative norm of $\Delta \boldsymbol{b}$; test divergence on the Lagrangian $L$; roll back to the previous iteration if $L$ increased.
Repeat until converged or MAXITER reached.

The inverse of the information matrix at convergence, $\boldsymbol{\alpha}^{-1}$, is the asymptotic covariance matrix of the estimated coefficients.

Installation

Install R

This package requires R 4.0 or later. Download and install R from the R Project website.

Windows

Go to the CRAN Windows page.
Click Download R for Windows → base → Download R x.y.z for Windows.
Run the installer and follow the prompts.

Linux

Go to the CRAN Linux page and follow the instructions for your distribution (debian / fedora / redhat / suse / ubuntu).
Verify the installation:
```
R --version
```

macOS

Go to the CRAN macOS page.
Download and install the .pkg file for your macOS version.

Install RStudio (optional)

RStudio is a popular IDE for R. Download it from posit.co/downloads if you want a graphical interface.

Install the package

# Install devtools if not already installed
install.packages("devtools")

# Install crvftw_eiv from GitHub
devtools::install_github("ErikvanderWerff/crvftw_eiv")

Usage

CRVFTW — single component, scalar uncertainties

source("R/CRVFTW.R")

# Linear calibration (NCEF = 2: intercept + slope)
X_INP <- c(0.0, 1.2, 2.5, 3.7, 5.0)   # analytical response
Y_INP <- c(0.0, 1.0, 2.1, 3.0, 4.2)   # certified concentration
UX    <- c(0.05, 0.05, 0.05, 0.05, 0.05)
UY    <- c(0.02, 0.02, 0.02, 0.02, 0.02)
NDATA <- length(X_INP)
NCEF  <- 2

result <- CRVFTW(X_INP, Y_INP, UX, UY, NDATA, NCEF)

result$coefficients   # b_1, b_2, ...
result$covariance     # covariance of b
result$gof            # sqrt(TSSD / df)
result$iter           # iterations to convergence

For a slope-only fit without intercept, set NCEF = 1.

The included test datasets can be loaded directly:

load("data/dataset_TABLE3_matrix.RData")

CRVFTW_COVXY — single component, full covariance matrices

source("R/CRVFTW_COVXY.R")

X_INP <- c(0.0, 1.2, 2.5, 3.7, 5.0)
Y_INP <- c(0.0, 1.0, 2.1, 3.0, 4.2)
NDATA <- length(X_INP)
NCEF  <- 2

# Full covariance matrices (here shown diagonal; off-diagonals capture
# correlations between calibration points)
X_COV <- diag(c(0.05, 0.05, 0.05, 0.05, 0.05)^2)
Y_COV <- diag(c(0.02, 0.02, 0.02, 0.02, 0.02)^2)

# Example: introduce correlation between neighbouring points
# X_COV[1,2] <- X_COV[2,1] <- 0.3 * sqrt(X_COV[1,1] * X_COV[2,2])

result <- CRVFTW_COVXY(X_INP, Y_INP, X_COV, Y_COV, NDATA, NCEF)

When X_COV and Y_COV are purely diagonal, CRVFTW_COVXY produces results numerically identical to CRVFTW with the corresponding scalar uncertainties. This forms the basis of the validation test.

CRVFTW_COV_MULTICOMP — multi-component, joint covariance

source("R/CRVFTW_COV_MULTICOMP.R")

NDATA  <- 5
NCOMP  <- 3
NCEF   <- c(2, 2, 3)   # component 1 & 2 linear, component 3 quadratic
NTOTAL <- NDATA * NCOMP

# X_INP: NDATA x NCOMP — certified concentrations per component
X_INP <- matrix(c(
    0.10, 0.20, 0.30, 0.40, 0.50,   # component 1
    0.05, 0.10, 0.15, 0.20, 0.25,   # component 2
    0.01, 0.02, 0.03, 0.04, 0.05),  # component 3
  nrow = NDATA, ncol = NCOMP)

# Y_INP: NDATA x NCOMP — detector responses per component
Y_INP <- matrix(...)  # fill with measured responses

# X_COV: (NTOTAL x NTOTAL) block covariance
#   diagonal blocks  = within-component covariances of concentrations
#   off-diagonal blocks = between-component covariances from gravimetric preparation
X_COV <- build_concentration_covariance(...)

# Y_COV: (NTOTAL x NTOTAL) block covariance
#   detector-side covariances including shared matrix effects / normalization
Y_COV <- build_response_covariance(...)

# XY_COV (optional): (NTOTAL x NTOTAL) cross-covariance between X and Y
#   Set to NULL (default) or a zero matrix if concentrations and responses
#   are treated as independent.
XY_COV <- matrix(0, NTOTAL, NTOTAL)

result <- CRVFTW_COV_MULTICOMP(X_INP, Y_INP, X_COV, Y_COV,
                               NDATA, NCOMP, NCEF,
                               XY_COV = XY_COV)

# Per-component results
result$components[[1]]$coefficients   # component 1 coefficients
result$components[[2]]$gof            # component 2 goodness of fit
result$components[[3]]$significance   # component 3 coefficient significance

# Joint results
result$joint.covariance               # full (sum(NCEF) x sum(NCEF)) covariance
result$joint.gof                      # overall goodness of fit
result$iter                           # iterations to convergence
result$converged                      # TRUE / FALSE

Output fields

`CRVFTW` and `CRVFTW_COVXY`

Field	Description
`fitted.values.x` / `.y`	Adjusted values $\hat{x}_i$, $\hat{y}_i$
`absolute.residuals.x` / `.y`	$x_i - \hat{x}_i$, $y_i - \hat{y}_i$
`relative.residuals.x` / `.y`	Residuals relative to the input values
`coefficients`	Polynomial coefficients $b_1, b_2, \ldots$
`covariance`	Asymptotic covariance matrix of coefficients
`standarderror`	Standard errors of coefficients
`tssd`	Total weighted sum of squared deviations
`df`	Degrees of freedom ($n - \text{NCEF}$)
`gof`	Goodness of fit $\sqrt{\text{TSSD}/\text{df}}$
`gof_max`	Maximum individual weighted deviation
`t_value`, `p_value`, `significance`	Student-$t$ statistics per coefficient
`no.cooks.outliers`, `cook.dist.datapoints`	Cook's distance outlier flags
`no.leverage.outliers`, `leverage.datapoints`	Leverage outlier flags
`iter`	Number of Gauss-Newton iterations

`CRVFTW_COV_MULTICOMP`

The output has two levels:

Per-component (result$components[[k]] for $k = 1, \ldots, K$): a list containing the same fields as CRVFTW above, computed for component $k$ alone — fitted values, residuals, coefficients, covariance, GOF statistics, t/p values, and outlier flags.

Note: the per-component gof and tssd are local diagnostics that use only the within-component covariance block ($\boldsymbol{\Sigma}{XX}$ and $\boldsymbol{\Sigma}{YY}$ restricted to component $k$). They do not represent the true marginal fit quality, which would require the Schur complement of the full joint covariance. For an overall measure that accounts for the cross-component and cross-X/Y correlations, use the top-level joint.gof.

Joint (top level):

Field	Description
`components`	List of length $K$ with per-component results
`joint.covariance`	Full covariance matrix of all coefficients across components; off-diagonal blocks capture correlations between coefficients of different components
`joint.tssd`	Overall weighted sum of squared deviations using $\boldsymbol{\Sigma}_Z^{-1}$
`joint.df`	$nK - \sum_k \text{NCEF}_k$
`joint.gof`	Overall goodness of fit
`iter`	Iterations to convergence
`converged`	Convergence flag

References

ISO 6143:2001. Gas analysis — Comparison methods for determining and checking the composition of calibration gas mixtures. International Organization for Standardization.
Deming, W.E. (1964). Statistical Adjustment of Data. Wiley.
Bremser, W. & Hässelbarth, W. (1997). Controlling uncertainty in calibration: applications of a general Gaussian model to the problem of uncertainty propagation. Analytica Chimica Acta, 348, 61–69.
Chinellato, O. & Achermann, E. (2004). Including covariances in calibration to obtain better measurement uncertainty estimates. SIAM Journal on Scientific Computing, 26(2), 523–536.
ISO Guide to the Expression of Uncertainty in Measurement (GUM), JCGM 100:2008.

License

See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
R		R
data		data
docs		docs
python		python
tests		tests
.RData		.RData
.Rhistory		.Rhistory
.gitattributes		.gitattributes
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
crvftw_eiv.Rproj		crvftw_eiv.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

crvftw_eiv

Overview

Motivation

Theoretical background

The joint probability density function

1. `CRVFTW` — scalar uncertainties

2. `CRVFTW_COVXY` — full covariance per variable

3. `CRVFTW_COV_MULTICOMP` — multi-component with joint covariance

The Gauss-Newton iteration

Installation

Install R

Install RStudio (optional)

Install the package

Usage

CRVFTW — single component, scalar uncertainties

CRVFTW_COVXY — single component, full covariance matrices

CRVFTW_COV_MULTICOMP — multi-component, joint covariance

Output fields

`CRVFTW` and `CRVFTW_COVXY`

`CRVFTW_COV_MULTICOMP`

References

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

crvftw_eiv

Overview

Motivation

Theoretical background

The joint probability density function

1. CRVFTW — scalar uncertainties

2. CRVFTW_COVXY — full covariance per variable

3. CRVFTW_COV_MULTICOMP — multi-component with joint covariance

The Gauss-Newton iteration

Installation

Install R

Install RStudio (optional)

Install the package

Usage

CRVFTW — single component, scalar uncertainties

CRVFTW_COVXY — single component, full covariance matrices

CRVFTW_COV_MULTICOMP — multi-component, joint covariance

Output fields

CRVFTW and CRVFTW_COVXY

CRVFTW_COV_MULTICOMP

References

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

1. `CRVFTW` — scalar uncertainties

2. `CRVFTW_COVXY` — full covariance per variable

3. `CRVFTW_COV_MULTICOMP` — multi-component with joint covariance

`CRVFTW` and `CRVFTW_COVXY`

`CRVFTW_COV_MULTICOMP`

Packages