Error-in-Variables polynomial regression for gas chromatography calibration
crvftw_eiv is an R package implementing Error-in-Variables (EIV) polynomial
regression for the calibration of analytical instruments where both the
independent and dependent variables carry measurement uncertainty. It is
designed specifically for gas chromatographs used in natural gas analysis,
following the framework of ISO 6143 [1].
The package provides three functions of increasing generality:
| Function | Components | Uncertainty model | X ↔ Y correlation | Use case |
|---|---|---|---|---|
| `CRVFTW` | Single | Scalar | No | Standard calibration, uncorrelated errors |
| `CRVFTW_COVXY` | Single | Full covariance matrices | No | Correlated errors within one component |
| `CRVFTW_COV_MULTICOMP` | Multiple | Joint covariance | Yes | Joint calibration with full correlation structure |
In gas chromatography, reference gas mixtures with certified concentrations $x_i$ are measured to obtain detector responses $y_i$.
Ordinary Least Squares (OLS) regression assumes the independent variable is known exactly. That assumption does not hold here:
- The certified concentrations $x_i$ carry uncertainty from the gravimetric preparation of the reference mixtures.
- The detector responses $y_i$ carry measurement uncertainty from the instrument itself.
- Modern detectors are precise enough that the uncertainty in $x$ is comparable to that in $y$.
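An EIV regression accounts for uncertainty in both variables by
simultaneously adjusting both the concentrations and the responses, so that the adjusted values $\hat{x}_i$ and $\hat{y}_i$ lie on the fitted calibration curve.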
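Assuming Gaussian measurement errors, each pair $(x_i, y_i)$ is a noisy observation of an adjusted point $(\hat{x}_i, \hat{y}_i)$ that lies on the calibration curve $\hat{y}_i = f(\hat{x}_i, \boldsymbol{b})$.
Maximum-likelihood estimation of the calibration coefficients $\boldsymbol{b}$ then reduces to minimizing the weighted sum of squared deviations

$$
(\boldsymbol{z} - \hat{\boldsymbol{z}})^{\top} \boldsymbol{\Sigma}_Z^{-1} (\boldsymbol{z} - \hat{\boldsymbol{z}}), \qquad \boldsymbol{z} = (\boldsymbol{x}, \boldsymbol{y}),
$$

subject to $\hat{\boldsymbol{y}} - f(\hat{\boldsymbol{x}}, \boldsymbol{b}) = 0$, where $\boldsymbol{\Sigma}_Z$ is the covariance matrix of the measurements.
This constrained minimization is solved iteratively using a Gauss-Newton scheme
with Lagrange multipliers. The three functions in this package differ in how $\boldsymbol{\Sigma}_Z$ is structured.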
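In `CRVFTW`, each data point has independent scalar uncertainties $u(x_i)$ and $u(y_i)$,
and the objective collapses to the classical Deming form:

$$
\sum_{i=1}^{n} \left[ \frac{(x_i - \hat{x}_i)^2}{u^2(x_i)} + \frac{(y_i - \hat{y}_i)^2}{u^2(y_i)} \right]
$$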
This is appropriate when the calibration points are produced and measured independently, and all measurement errors are uncorrelated.
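As a rough illustration of the scalar case, the sketch below evaluates the Deming objective for a straight line with fixed, made-up coefficients. It is not the package's code: `deming_adjust` is a hypothetical helper, the coefficients are not a fit, and the closed-form adjusted points hold only for a straight line with uncorrelated errors.

```r
# Hypothetical helper (not part of crvftw_eiv): for a FIXED straight line
# y = b[1] + b[2] * x, compute the adjusted points and the Deming objective.
deming_adjust <- function(x, y, ux, uy, b) {
  r      <- y - b[1] - b[2] * x        # residual of each measured point
  v_eff  <- uy^2 + b[2]^2 * ux^2       # effective variance of that residual
  lambda <- r / v_eff                  # Lagrange multiplier per point
  x_hat  <- x + b[2] * ux^2 * lambda   # adjusted x values
  y_hat  <- y - uy^2 * lambda          # adjusted y values
  tssd   <- sum((x - x_hat)^2 / ux^2 + (y - y_hat)^2 / uy^2)
  list(x_hat = x_hat, y_hat = y_hat, tssd = tssd)
}

x  <- c(0.0, 1.2, 2.5, 3.7, 5.0)
y  <- c(0.0, 1.0, 2.1, 3.0, 4.2)
ux <- rep(0.05, 5)
uy <- rep(0.02, 5)
b  <- c(0.0, 0.84)                     # illustrative coefficients, not a fit

adj <- deming_adjust(x, y, ux, uy, b)

# The adjusted points satisfy the constraint y_hat = f(x_hat, b) ...
max(abs(adj$y_hat - (b[1] + b[2] * adj$x_hat)))
# ... and the objective equals sum(r^2 / effective variance)
adj$tssd
```

For a straight line the inner minimization over the adjusted points has this closed form, which is why the objective can be written in terms of the residuals and an effective variance. For higher-order polynomials the adjusted points have to be found iteratively.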
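When calibration points share a common source of uncertainty (repeated
analyses of the same batch, inherited uncertainty from a common reference
standard, or any systematic effect), scalar uncertainties are insufficient. The
within-$x$ and within-$y$ covariances are then represented by full $n \times n$ covariance matrices $\boldsymbol{\Sigma}_{XX}$ and $\boldsymbol{\Sigma}_{YY}$; this is the case handled by `CRVFTW_COVXY`.
The objective becomes:

$$
(\boldsymbol{x} - \hat{\boldsymbol{x}})^{\top} \boldsymbol{\Sigma}_{XX}^{-1} (\boldsymbol{x} - \hat{\boldsymbol{x}}) + (\boldsymbol{y} - \hat{\boldsymbol{y}})^{\top} \boldsymbol{\Sigma}_{YY}^{-1} (\boldsymbol{y} - \hat{\boldsymbol{y}})
$$

subject to $\hat{\boldsymbol{y}} - f(\hat{\boldsymbol{x}}, \boldsymbol{b}) = 0$.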
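One common way such a matrix arises is a shared systematic contribution added to independent per-point uncertainties. The sketch below builds that structure and compares the full quadratic form with what a diagonal-only model would report. All numbers, and the names `u_indep` and `u_common`, are illustrative assumptions rather than values from the package.

```r
n        <- 5
u_indep  <- rep(0.04, n)   # independent standard uncertainty per point (assumed)
u_common <- 0.03           # uncertainty inherited from a shared reference (assumed)

# Independent part on the diagonal, common part in every element:
#   Var(x_i)      = u_indep^2 + u_common^2
#   Cov(x_i, x_j) = u_common^2   for i != j
Sigma_XX <- diag(u_indep^2) + u_common^2 * matrix(1, n, n)

# Correlation implied between any two calibration points
rho <- cov2cor(Sigma_XX)[1, 2]

# Weighted squared deviation for a vector of deviations d = x - x_hat
d      <- c(0.01, -0.02, 0.015, 0.0, -0.01)
q_full <- drop(t(d) %*% solve(Sigma_XX, d))   # uses the full covariance
q_diag <- sum(d^2 / diag(Sigma_XX))           # ignores the off-diagonal terms

c(correlation = rho, full = q_full, diagonal_only = q_diag)
```

The two quadratic forms differ whenever the off-diagonal elements are non-zero, which is the situation the full-covariance objective is meant to handle.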
The most general case arises in multi-component gas analysis, where all components are calibrated simultaneously and the covariance structure spans three physical dimensions:
- Points: repeated or shared sources of uncertainty between calibration levels,
- Components: concentrations of different components in a gravimetrically prepared mixture are correlated (more of A implies less of B); detector responses of different components are correlated through shared matrix effects and normalization,
- X ↔ Y: the concentration of component $k$ and the response of component $l$ can share uncertainty contributions through the physical preparation and measurement chain.
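Let $n$ be the number of calibration points and $K$ the number of components, and let $\boldsymbol{z} = (\boldsymbol{x}, \boldsymbol{y})$ stack all concentrations and all responses into one vector of length $2nK$.
The full joint covariance matrix has dimension $2nK \times 2nK$:

$$
\boldsymbol{\Sigma}_Z = \begin{pmatrix} \boldsymbol{\Sigma}_{XX} & \boldsymbol{\Sigma}_{XY} \\ \boldsymbol{\Sigma}_{XY}^{\top} & \boldsymbol{\Sigma}_{YY} \end{pmatrix}
$$

where
- $\boldsymbol{\Sigma}_{XX}$ ($nK \times nK$) contains within- and between-component covariances of the concentrations,
- $\boldsymbol{\Sigma}_{YY}$ ($nK \times nK$) contains within- and between-component covariances of the detector responses,
- $\boldsymbol{\Sigma}_{XY}$ ($nK \times nK$) is the cross-covariance between concentrations and responses.

The lower-left block follows by symmetry: $\boldsymbol{\Sigma}_{YX} = \boldsymbol{\Sigma}_{XY}^{\top}$.
The objective becomes:

$$
(\boldsymbol{z} - \hat{\boldsymbol{z}})^{\top} \boldsymbol{\Sigma}_Z^{-1} (\boldsymbol{z} - \hat{\boldsymbol{z}})
$$

subject to the per-component calibration constraints

$$
\hat{\boldsymbol{y}}_k - f_k(\hat{\boldsymbol{x}}_k, \boldsymbol{b}_k) = 0, \qquad k = 1, \dots, K.
$$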
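Although each component has its own polynomial model $f_k$ with its own coefficients $\boldsymbol{b}_k$, the components are coupled through the joint covariance $\boldsymbol{\Sigma}_Z$, so all coefficients are estimated simultaneously.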
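The update equations for the adjusted values, derived from the KKT conditions, gain additional cross-covariance terms:

$$
\hat{\boldsymbol{x}} = \boldsymbol{x} - \left( \boldsymbol{\Sigma}_{XX} \boldsymbol{J}_X^{\top} + \boldsymbol{\Sigma}_{XY} \boldsymbol{J}_Y^{\top} \right) \boldsymbol{\lambda}, \qquad
\hat{\boldsymbol{y}} = \boldsymbol{y} - \left( \boldsymbol{\Sigma}_{XY}^{\top} \boldsymbol{J}_X^{\top} + \boldsymbol{\Sigma}_{YY} \boldsymbol{J}_Y^{\top} \right) \boldsymbol{\lambda}
$$

where $\boldsymbol{J}_Z = (\boldsymbol{J}_X, \boldsymbol{J}_Y)$ is the Jacobian of the constraints with respect to $\hat{\boldsymbol{x}}$ and $\hat{\boldsymbol{y}}$, and $\boldsymbol{\lambda}$ is the vector of Lagrange multipliers. The terms in $\boldsymbol{\Sigma}_{XY}$ vanish when concentrations and responses are uncorrelated.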
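The block structure of the joint covariance can be sketched in a few lines of R. The stacking order (all points of component 1, then component 2, and so on), the helper `idx`, and every numerical value below are assumptions made for illustration; check the package source for the ordering it actually expects.

```r
n <- 3        # calibration points (illustrative)
K <- 2        # components (illustrative)
N <- n * K

# Assumed stacking order: position of (point i, component k) in the stacked
# vector, matching what as.vector() gives for an n x K matrix.
idx <- function(i, k) (k - 1) * n + i

Sigma_XX <- diag(0.02^2, N)    # concentration covariances
Sigma_YY <- diag(0.05^2, N)    # response covariances
Sigma_XY <- matrix(0, N, N)    # cross-covariance, zero unless stated

# Between-component covariance of concentrations at the same point
# (negative: more of component 1 implies less of component 2)
for (i in seq_len(n)) {
  Sigma_XX[idx(i, 1), idx(i, 2)] <- -0.5 * 0.02^2
  Sigma_XX[idx(i, 2), idx(i, 1)] <- -0.5 * 0.02^2
}

# One X-Y link: concentration and response of component 1 at point 1
Sigma_XY[idx(1, 1), idx(1, 1)] <- 0.2 * 0.02 * 0.05

# Joint covariance; the lower-left block is the transpose of Sigma_XY
Sigma_Z <- rbind(cbind(Sigma_XX,    Sigma_XY),
                 cbind(t(Sigma_XY), Sigma_YY))

dim(Sigma_Z)             # 2nK x 2nK
isSymmetric(Sigma_Z)
# chol() only succeeds for a positive definite matrix, so it doubles as a check
is_pd <- !inherits(try(chol(Sigma_Z), silent = TRUE), "try-error")
is_pd
```

Checking symmetry and positive definiteness before fitting is cheap, and a failure usually points to an inconsistent set of hand-entered covariances.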
All three functions solve the constrained optimization with the same scheme:
- Initialize $\hat{\boldsymbol{x}}$, $\hat{\boldsymbol{y}}$ and $\boldsymbol{b}$ via a weighted GLM fit.
- Linearize the constraint $\hat{\boldsymbol{y}} - f(\hat{\boldsymbol{x}}, \boldsymbol{b}) = 0$ around the current estimate.
- Assemble the weight matrix $\boldsymbol{W} = (\boldsymbol{J}_Z \boldsymbol{\Sigma}_Z \boldsymbol{J}_Z^{\top})^{-1}$, the information matrix $\boldsymbol{\alpha} = \boldsymbol{A}^{\top} \boldsymbol{W} \boldsymbol{A}$ with $\boldsymbol{A} = \partial f / \partial \boldsymbol{b}$, and the gradient $\boldsymbol{\beta} = \boldsymbol{\lambda}^{\top} \boldsymbol{A}$.
- Solve for the coefficient update $\Delta \boldsymbol{b} = \boldsymbol{\alpha}^{-1} \boldsymbol{\beta}^{\top}$.
- Update $\boldsymbol{b}$, $\hat{\boldsymbol{x}}$, and $\hat{\boldsymbol{y}}$.
- Test convergence on the relative norm of $\Delta \boldsymbol{b}$; test divergence on the Lagrangian $L$; roll back to the previous iteration if $L$ increased.
- Repeat until converged or `MAXITER` reached.
The inverse of the information matrix at convergence, $\boldsymbol{\alpha}^{-1}$, is the asymptotic covariance matrix of the coefficients.
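The scheme above can be sketched for a single component without X ↔ Y correlation. This is a generic textbook-style Gauss-Newton iteration written for illustration, not the code in `CRVFTW`: the function name `eiv_polyfit`, its arguments, and the ordering of coefficients from the constant term upward are assumptions, initialization uses plain weighted least squares, and the divergence test with rollback is left out.

```r
# Sketch only: generic Gauss-Newton EIV polynomial fit (hypothetical helper).
eiv_polyfit <- function(x, y, Sx, Sy, ncef = 2, maxiter = 50, tol = 1e-10) {
  n  <- length(x)
  z  <- c(x, y)
  O  <- matrix(0, n, n)
  Sz <- rbind(cbind(Sx, O), cbind(O, Sy))       # joint covariance of z
  pw <- 0:(ncef - 1)                             # polynomial powers
  design <- function(xv) outer(xv, pw, `^`)      # A = df/db

  # Initialise with weighted least squares, treating x as exact
  A0 <- design(x)
  b  <- drop(solve(t(A0) %*% solve(Sy, A0), t(A0) %*% solve(Sy, y)))
  xh <- x
  yh <- drop(A0 %*% b)
  converged <- FALSE

  for (iter in seq_len(maxiter)) {
    # Linearise g = yh - f(xh, b) around the current estimate
    A    <- design(xh)
    dfdx <- if (ncef > 1) {
      drop(outer(xh, pw[-1] - 1, `^`) %*% (pw[-1] * b[-1]))
    } else rep(0, n)
    J <- cbind(-diag(dfdx, n), diag(n))          # dg / d(xh, yh)
    r <- drop(yh - A %*% b + J %*% (z - c(xh, yh)))

    # Weight matrix, information matrix and gradient
    W     <- solve(J %*% Sz %*% t(J))
    alpha <- t(A) %*% W %*% A
    beta  <- t(A) %*% W %*% r

    # Coefficient update, then multipliers and adjusted values
    db     <- drop(solve(alpha, beta))
    lambda <- W %*% (r - A %*% db)
    zh     <- z - drop(Sz %*% t(J) %*% lambda)
    xh     <- zh[seq_len(n)]
    yh     <- zh[n + seq_len(n)]
    b      <- b + db

    # The first pass starts from xh = x and mainly moves the adjusted values,
    # so convergence is only tested from the second pass onwards.
    if (iter > 1 && sqrt(sum(db^2)) <= tol * max(1, sqrt(sum(b^2)))) {
      converged <- TRUE
      break
    }
  }
  d <- z - c(xh, yh)
  list(coefficients = b, covariance = solve(alpha),
       tssd = drop(t(d) %*% solve(Sz, d)), iter = iter, converged = converged)
}

x   <- c(0.0, 1.2, 2.5, 3.7, 5.0)
y   <- c(0.0, 1.0, 2.1, 3.0, 4.2)
fit <- eiv_polyfit(x, y, Sx = diag(0.05^2, 5), Sy = diag(0.02^2, 5), ncef = 2)

fit$coefficients
sqrt(diag(fit$covariance))           # standard errors from alpha^-1
sqrt(fit$tssd / (length(x) - 2))     # goodness of fit as sqrt(TSSD / df)
```

Full covariance matrices can be passed as `Sx` and `Sy` without changing the code, since the update only ever uses the joint matrix.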
This package requires R 4.0 or later. Download and install R from the R Project website.
Windows
- Go to the CRAN Windows page.
- Click Download R for Windows → base → Download R x.y.z for Windows.
- Run the installer and follow the prompts.
Linux
- Go to the CRAN Linux page and follow the instructions for your distribution (debian / fedora / redhat / suse / ubuntu).
- Verify the installation: `R --version`
macOS
- Go to the CRAN macOS page.
- Download and install the `.pkg` file for your macOS version.
RStudio is a popular IDE for R. Download it from posit.co/downloads if you want a graphical interface.
```r
# Install devtools if not already installed
install.packages("devtools")
# Install crvftw_eiv from GitHub
devtools::install_github("ErikvanderWerff/crvftw_eiv")
```

Single-component calibration with scalar uncertainties (`CRVFTW`):

```r
source("R/CRVFTW.R")
# Linear calibration (NCEF = 2: intercept + slope)
X_INP <- c(0.0, 1.2, 2.5, 3.7, 5.0)   # analytical response
Y_INP <- c(0.0, 1.0, 2.1, 3.0, 4.2)   # certified concentration
UX <- c(0.05, 0.05, 0.05, 0.05, 0.05)
UY <- c(0.02, 0.02, 0.02, 0.02, 0.02)
NDATA <- length(X_INP)
NCEF <- 2
result <- CRVFTW(X_INP, Y_INP, UX, UY, NDATA, NCEF)
result$coefficients   # b_1, b_2, ...
result$covariance     # covariance of b
result$gof            # sqrt(TSSD / df)
result$iter           # iterations to convergence
```

For a slope-only fit without intercept, set `NCEF = 1`.
The included test datasets can be loaded directly:

```r
load("data/dataset_TABLE3_matrix.RData")
```

Single-component calibration with full covariance matrices (`CRVFTW_COVXY`):

```r
source("R/CRVFTW_COVXY.R")
X_INP <- c(0.0, 1.2, 2.5, 3.7, 5.0)
Y_INP <- c(0.0, 1.0, 2.1, 3.0, 4.2)
NDATA <- length(X_INP)
NCEF <- 2
# Full covariance matrices (here shown diagonal; off-diagonals capture
# correlations between calibration points)
X_COV <- diag(c(0.05, 0.05, 0.05, 0.05, 0.05)^2)
Y_COV <- diag(c(0.02, 0.02, 0.02, 0.02, 0.02)^2)
# Example: introduce correlation between neighbouring points
# X_COV[1,2] <- X_COV[2,1] <- 0.3 * sqrt(X_COV[1,1] * X_COV[2,2])
result <- CRVFTW_COVXY(X_INP, Y_INP, X_COV, Y_COV, NDATA, NCEF)
```

When `X_COV` and `Y_COV` are purely diagonal, `CRVFTW_COVXY` produces results
numerically identical to `CRVFTW` with the corresponding scalar uncertainties.
This forms the basis of the validation test.
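The covariance inputs can be built from standard uncertainties and a correlation matrix. The helper `cov_from_cor` below is hypothetical and not part of the package; it shows that the uncorrelated case reduces to `diag(u^2)`, and how a correlation of 0.3 between neighbouring points (an arbitrary illustrative value) enters the off-diagonals.

```r
# Hypothetical helper: Sigma = D R D with D = diag(u), R a correlation matrix
cov_from_cor <- function(u, R = diag(length(u))) {
  outer(u, u) * R
}

u_x <- rep(0.05, 5)

# Uncorrelated case: identical to the diagonal matrix of squared uncertainties
X_COV <- cov_from_cor(u_x)
all.equal(X_COV, diag(u_x^2))

# Correlation of 0.3 between neighbouring calibration points
R <- diag(5)
R[abs(row(R) - col(R)) == 1] <- 0.3
X_COV_corr <- cov_from_cor(u_x, R)

# A usable covariance matrix must be symmetric and positive definite
isSymmetric(X_COV_corr)
min(eigen(X_COV_corr, symmetric = TRUE)$values) > 0
```

Working from a correlation matrix keeps the off-diagonal entries consistent with the diagonal, which is harder to guarantee when covariances are typed in element by element.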
Joint multi-component calibration (`CRVFTW_COV_MULTICOMP`):

```r
source("R/CRVFTW_COV_MULTICOMP.R")
NDATA <- 5
NCOMP <- 3
NCEF <- c(2, 2, 3)   # component 1 & 2 linear, component 3 quadratic
NTOTAL <- NDATA * NCOMP
# X_INP: NDATA x NCOMP, certified concentrations per component
# (matrix() fills column by column, so each group of five is one component)
X_INP <- matrix(c(
  0.10, 0.20, 0.30, 0.40, 0.50,    # component 1
  0.05, 0.10, 0.15, 0.20, 0.25,    # component 2
  0.01, 0.02, 0.03, 0.04, 0.05),   # component 3
  nrow = NDATA, ncol = NCOMP)
# Y_INP: NDATA x NCOMP, detector responses per component
Y_INP <- matrix(...)   # fill with measured responses
# X_COV: (NTOTAL x NTOTAL) block covariance
#   diagonal blocks     = within-component covariances of concentrations
#   off-diagonal blocks = between-component covariances from gravimetric preparation
X_COV <- build_concentration_covariance(...)
# Y_COV: (NTOTAL x NTOTAL) block covariance
#   detector-side covariances including shared matrix effects / normalization
Y_COV <- build_response_covariance(...)
# XY_COV (optional): (NTOTAL x NTOTAL) cross-covariance between X and Y
#   Set to NULL (default) or a zero matrix if concentrations and responses
#   are treated as independent.
XY_COV <- matrix(0, NTOTAL, NTOTAL)
result <- CRVFTW_COV_MULTICOMP(X_INP, Y_INP, X_COV, Y_COV,
                               NDATA, NCOMP, NCEF,
                               XY_COV = XY_COV)
# Per-component results
result$components[[1]]$coefficients   # component 1 coefficients
result$components[[2]]$gof            # component 2 goodness of fit
result$components[[3]]$significance   # component 3 coefficient significance
# Joint results
result$joint.covariance   # full (sum(NCEF) x sum(NCEF)) covariance
result$joint.gof          # overall goodness of fit
result$iter               # iterations to convergence
result$converged          # TRUE / FALSE
```

`CRVFTW` returns a list with the following fields:

| Field | Description |
|---|---|
| `fitted.values.x` / `.y` | Adjusted values |
| `absolute.residuals.x` / `.y` | Absolute residuals between input and adjusted values |
| `relative.residuals.x` / `.y` | Residuals relative to the input values |
| `coefficients` | Polynomial coefficients |
| `covariance` | Asymptotic covariance matrix of coefficients |
| `standarderror` | Standard errors of coefficients |
| `tssd` | Total weighted sum of squared deviations |
| `df` | Degrees of freedom |
| `gof` | Goodness of fit, `sqrt(tssd / df)` |
| `gof_max` | Maximum individual weighted deviation |
| `t_value`, `p_value`, `significance` | Student-$t$ statistics per coefficient |
| `no.cooks.outliers`, `cook.dist.datapoints` | Cook's distance outlier flags |
| `no.leverage.outliers`, `leverage.datapoints` | Leverage outlier flags |
| `iter` | Number of Gauss-Newton iterations |
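The relation between `coefficients`, `covariance`, `standarderror` and the Student-$t$ fields can be sketched as follows. The numbers are made up, and the two-sided test at the 5 % level is an assumption; the package may apply a different significance rule.

```r
# Illustrative numbers only, not output of the package
b     <- c(-0.02, 0.84)                      # coefficients
cov_b <- matrix(c( 4e-4, -1e-4,
                  -1e-4,  5e-5), nrow = 2)   # covariance of the coefficients
df    <- 3                                   # e.g. 5 points - 2 coefficients

se      <- sqrt(diag(cov_b))                 # standard errors
t_value <- b / se                            # Student-t statistic per coefficient
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)   # two-sided

data.frame(coefficient = b, se = se, t_value = t_value, p_value = p_value,
           significant = p_value < 0.05)
```

In this made-up case the intercept is not significantly different from zero while the slope is, which is the kind of result that would motivate comparing the fit against a model without intercept.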
For `CRVFTW_COV_MULTICOMP`, the output has two levels:

Per-component (`result$components[[k]]`): the same fields as for `CRVFTW` above, computed for component $k$.

Note: the per-component `gof` and `tssd` are local diagnostics that use only the within-component covariance block ($\boldsymbol{\Sigma}_{XX}$ and $\boldsymbol{\Sigma}_{YY}$ restricted to component $k$). They do not represent the true marginal fit quality, which would require the Schur complement of the full joint covariance. For an overall measure that accounts for the cross-component and cross-X/Y correlations, use the top-level `joint.gof`.
Joint (top level):
| Field | Description |
|---|---|
| `components` | List of length `NCOMP` with the per-component results |
| `joint.covariance` | Full covariance matrix of all coefficients across components; off-diagonal blocks capture correlations between coefficients of different components |
| `joint.tssd` | Overall weighted sum of squared deviations using the full joint covariance $\boldsymbol{\Sigma}_Z$ |
| `joint.df` | Degrees of freedom of the joint fit |
| `joint.gof` | Overall goodness of fit |
| `iter` | Iterations to convergence |
| `converged` | Convergence flag |
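Per-component blocks can be read out of a joint coefficient covariance along the following lines. The sketch uses a random stand-in for `result$joint.covariance` and assumes the coefficients are ordered component by component; the helper `block` is hypothetical, and the ordering should be checked against the package before relying on it.

```r
NCEF <- c(2, 2, 3)    # coefficients per component, as in the example above
P    <- sum(NCEF)

# Stand-in for result$joint.covariance: a random symmetric P x P matrix
set.seed(1)
M         <- matrix(rnorm(P * P), P, P)
joint_cov <- crossprod(M) * 1e-4

# Assumed layout: coefficients ordered component by component
last  <- cumsum(NCEF)
first <- last - NCEF + 1
block <- function(k, l = k) {
  joint_cov[first[k]:last[k], first[l]:last[l], drop = FALSE]
}

dim(block(3))       # covariance of the coefficients of component 3
dim(block(1, 3))    # cross-covariance between components 1 and 3

# Correlations between coefficients of components 1 and 3
cov2cor(joint_cov)[first[1]:last[1], first[3]:last[3]]
```

The off-diagonal blocks are what a separate fit per component cannot provide, and they matter when results for several components are combined, for example in a normalized composition.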
- ISO 6143:2001. Gas analysis — Comparison methods for determining and checking the composition of calibration gas mixtures. International Organization for Standardization.
- Deming, W.E. (1964). Statistical Adjustment of Data. Wiley.
- Bremser, W. & Hässelbarth, W. (1997). Controlling uncertainty in calibration: applications of a general Gaussian model to the problem of uncertainty propagation. Analytica Chimica Acta, 348, 61–69.
- Chinellato, O. & Achermann, E. (2004). Including covariances in calibration to obtain better measurement uncertainty estimates. SIAM Journal on Scientific Computing, 26(2), 523–536.
- ISO Guide to the Expression of Uncertainty in Measurement (GUM), JCGM 100:2008.
See the LICENSE file for details.
