Regress.jl

High-performance linear models with fixed effects and instrumental variables.

Regress.jl is inspired by FixedEffectModels.jl. It shares similar goals but takes a different architectural approach: tight integration with CovarianceMatrices.jl and an extended family of IV estimators.

Key Features

  • OLS and IV estimation with high-dimensional fixed effects
  • Tight CovarianceMatrices.jl integration with model + vcov() syntax
  • Extended IV estimators: TSLS, LIML, Fuller, and KClass
  • Comprehensive first-stage diagnostics for IV models
  • Precomputed inference statistics for fast post-estimation

Installation

using Pkg
Pkg.add(url="https://github.com/gragusa/Regress.jl")

Quick Start

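using Regress, DataFrames

# df is assumed to be a DataFrame containing the columns referenced below
# (y, x, x1, x2, industry, year, endo, z1, z2)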

# OLS estimation
model = ols(df, @formula(y ~ x1 + x2))

# OLS with fixed effects
model = ols(df, @formula(y ~ x1 + fe(industry) + fe(year)))

# IV estimation (Two-Stage Least Squares)
model = iv(TSLS(), df, @formula(y ~ x + (endo ~ z1 + z2)))
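To make the role of the fe(...) terms concrete, here is a minimal sketch of what absorbing a single fixed effect amounts to. It uses only the Julia standard library on simulated data and is not Regress.jl's internal algorithm; the helper demean and all variable names are hypothetical. For one grouping variable, demeaning within groups gives the same slope as including one dummy per group.

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(1)
n, G = 200, 10
g = repeat(1:G, outer = n ÷ G)        # balanced group ids (hypothetical data)
alpha = randn(G)                      # group-specific intercepts
x = randn(n) .+ alpha[g]              # regressor correlated with the group effect
y = 2.0 .* x .+ alpha[g] .+ 0.1 .* randn(n)

# Subtract the group mean from every observation (within transformation)
function demean(v, g)
    out = similar(v, Float64)
    for k in unique(g)
        idx = findall(==(k), g)
        out[idx] = v[idx] .- mean(v[idx])
    end
    return out
end

# Within estimator: regress demeaned y on demeaned x
xd, yd = demean(x, g), demean(y, g)
beta_within = dot(xd, yd) / dot(xd, xd)

# Dummy-variable estimator: one indicator column per group
D = Float64.(g .== (1:G)')            # n×G indicator matrix
beta_lsdv = ([x D] \ y)[1]

println((beta_within, beta_lsdv))     # the two slopes agree up to rounding
```

With several high-dimensional fixed effects, building the dummy matrix becomes impractical, which is why packages in this space absorb the effects instead of materializing them.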

CovarianceMatrices.jl Integration

Regress.jl is designed around CovarianceMatrices.jl: a fitted model can be combined with any of its variance estimators to obtain robust inference without re-estimating the coefficients.

The model + vcov() Syntax

A key feature is the + operator for updating a model's variance-covariance estimator. This returns a new model with all inference statistics precomputed:

model = ols(df, @formula(y ~ x1 + x2))

# Create a new model with HC3 standard errors
model_hc3 = model + vcov(HC3())

# All statistics are immediately available (precomputed)
stderror(model_hc3)      # HC3 standard errors
coeftable(model_hc3)     # Coefficient table with HC3 inference
model_hc3.F              # Robust Wald F-statistic
model_hc3.p              # p-value of F-statistic

The returned model has:

  • The same underlying data and coefficients
  • Precomputed vcov matrix, standard errors, t-statistics, and p-values
  • Robust Wald F-statistic for joint significance

All the estimators defined in CovarianceMatrices.jl are supported.
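As a reference for what a robust variance estimator such as HC3 computes, the following sketch builds the textbook HC0 and HC3 sandwich matrices by hand on simulated data. It relies only on LinearAlgebra and is an illustration under the standard definitions, not the code path used by Regress.jl or CovarianceMatrices.jl; the helper sandwich and all variable names are hypothetical.

```julia
using LinearAlgebra, Random

Random.seed!(2)
n = 500
X = [ones(n) randn(n) randn(n)]
# Heteroskedastic errors: the error scale grows with |x1|
y = X * [1.0, 2.0, -1.0] .+ randn(n) .* (0.5 .+ abs.(X[:, 2]))

beta = X \ y                          # OLS coefficients
e = y - X * beta                      # residuals
XtXinv = inv(X' * X)                  # the "bread"

# Leverage h_i = x_i' (X'X)^{-1} x_i, the diagonal of the hat matrix
h = vec(sum((X * XtXinv) .* X, dims = 2))

# Sandwich: bread * meat * bread, with meat = X' diag(w) X
sandwich(w) = XtXinv * (X' * (w .* X)) * XtXinv

V_hc0 = sandwich(e .^ 2)                      # HC0: raw squared residuals
V_hc3 = sandwich(e .^ 2 ./ (1 .- h) .^ 2)     # HC3: inflate by 1/(1 - h_i)^2

se_hc0 = sqrt.(diag(V_hc0))
se_hc3 = sqrt.(diag(V_hc3))
println(hcat(se_hc0, se_hc3))
```

Because each HC3 weight is at least as large as the corresponding HC0 weight, HC3 standard errors are never smaller than HC0 ones on the same fit.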

IV Estimators

Regress.jl provides a family of IV estimators unified under the K-class framework:

# Two-Stage Least Squares (most common)
model_tsls = iv(TSLS(), df, @formula(y ~ x + (endo ~ z1 + z2)))

# LIML - better finite-sample properties, especially with weak instruments
model_liml = iv(LIML(), df, @formula(y ~ x + (endo ~ z1 + z2)))

# Fuller - bias-corrected estimator
# Fuller(1.0) is approximately unbiased
# Fuller(4.0) minimizes mean squared error
model_fuller = iv(Fuller(1.0), df, @formula(y ~ x + (endo ~ z1 + z2)))

# Generic K-class with custom kappa
model_kclass = iv(KClass(0.9), df, @formula(y ~ x + (endo ~ z1 + z2)))
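For reference, the estimators above can be written in the textbook K-class form. The notation here is an assumption, not taken from the package: X collects all regressors (endogenous and exogenous), Z collects all instruments (excluded instruments plus the exogenous regressors), n is the number of observations, and L is the number of columns of Z. The exact degrees-of-freedom convention Regress.jl uses for Fuller is not stated in this README.

```latex
\hat{\beta}(k) = \left[ X^\top (I - k M_Z) X \right]^{-1} X^\top (I - k M_Z)\, y,
\qquad M_Z = I - Z (Z^\top Z)^{-1} Z^\top

k = 0 \;\text{(OLS)}, \qquad
k = 1 \;\text{(TSLS)}, \qquad
k = \hat{\kappa}_{\mathrm{LIML}} \;\text{(LIML)}, \qquad
k = \hat{\kappa}_{\mathrm{LIML}} - \frac{a}{n - L} \;\text{(Fuller)}
```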

The + vcov() syntax also works with IV models and automatically recomputes first-stage diagnostics:

model = iv(TSLS(), df, @formula(y ~ x + (endo ~ z1 + z2)))

# Update to HC3 - recomputes ALL statistics, including first-stage F
model_hc3 = model + vcov(HC3())
model_hc3.F_kp           # Joint first-stage F with HC3
model_hc3.F_kp_per_endo  # Per-endogenous F-stats with HC3

First-Stage Diagnostics

For IV estimation, Regress.jl provides comprehensive first-stage diagnostics:

model = iv(TSLS(), df, @formula(y ~ x + (endo ~ z1 + z2)))

Displaying a fitted IV model automatically reports:

  • Joint Kleibergen-Paap F-statistic: Tests all first-stage coefficients jointly
  • Per-endogenous F-statistics: Individual first-stage F-stats for each endogenous variable
                                    TSLS
────────────────────────────────────────────────────────────────────────────
Number of obs:                   1000   Converged:                      true
dof (model):                        2   dof (residuals):                 997
R²:                             0.892   R² adjusted:                   0.892
F-statistic:                  156.234   P-value:                       0.000
F (1st stage, joint):         124.673   P (1st stage, joint):          0.000
────────────────────────────────────────────────────────────────────────────
               Estimate  Std. Error   t-stat   Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────────
x               1.98234     0.05123   38.695     0.0000    1.88176    2.08292
endo            3.01456     0.08234   36.612     0.0000    2.85301    3.17611
(Intercept)     0.98765     0.04321   22.856     0.0000    0.90293    1.07237
────────────────────────────────────────────────────────────────────────────

First-Stage F-Statistics (per endogenous variable):
────────────────────────────────────────────────────────────────────────────
Endogenous                             F-stat        P-value
────────────────────────────────────────────────────────────────────────────
endo                                 124.6735         0.0000
────────────────────────────────────────────────────────────────────────────
Note: Std. errors computed using HC1 variance estimator; 2 excluded instruments, 1 endogenous
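To show what a first-stage F-statistic measures, the sketch below computes the classical (homoskedastic) version for one endogenous variable on simulated data. This is the non-robust analogue, not the Kleibergen-Paap statistic that Regress.jl reports; the helper rss and all variable names are hypothetical.

```julia
using LinearAlgebra, Random

Random.seed!(3)
n = 1000
x = randn(n)                          # included exogenous regressor
z1, z2 = randn(n), randn(n)           # excluded instruments
endo = 0.5 .* z1 .+ 0.3 .* z2 .+ 0.2 .* x .+ randn(n)

# Residual sum of squares from regressing b on the columns of A
rss(A, b) = sum(abs2, b - A * (A \ b))

W_r = [ones(n) x]                     # restricted first stage: no instruments
W_u = [ones(n) x z1 z2]               # unrestricted first stage: with instruments
q = 2                                 # number of excluded instruments

# F-test that the coefficients on z1 and z2 are jointly zero
F = ((rss(W_r, endo) - rss(W_u, endo)) / q) /
    (rss(W_u, endo) / (n - size(W_u, 2)))
println(F)
```

A small value signals weak instruments. The robust statistics shown in the output above serve the same purpose but use the selected variance estimator.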

first_stage() - Extracting First-Stage Diagnostics

The first_stage() function returns a FirstStageResult struct for programmatic access:

model = iv(TSLS(), df, @formula(y ~ x + (endo ~ z1 + z2)))

fs = first_stage(model)
fs.F_joint           # Joint Kleibergen-Paap F-statistic
fs.p_joint           # p-value of joint test
fs.F_per_endo        # Per-endogenous F-statistics
fs.p_per_endo        # Per-endogenous p-values

# With a different variance estimator
model_hc3 = model + vcov(HC3())
fs_hc3 = first_stage(model_hc3)

Large-Scale IV Estimation

Regress.jl efficiently handles IV estimation with many instruments. This example uses the Angrist-Krueger (1991) returns-to-schooling data with quarter-of-birth instruments.

Example: Returns to Schooling with Many Instruments

using Regress, CSV, DataFrames, CategoricalArrays

# Load Angrist-Krueger data (~330k observations)
data = CSV.read("path/to/JIVE.txt", DataFrame)
data.sob = categorical(data.sob)  # State of birth
data.yob = categorical(data.yob)  # Year of birth
data.qob = categorical(data.qob)  # Quarter of birth

# Large model: 180 excluded instruments
# Education is endogenous, instrumented by yob×qob and sob×qob interactions
model = iv(TSLS(), data,
  @formula(lwage ~ (educ ~ fe(yob)&fe(qob) + fe(sob)&fe(qob)) + fe(yob) + fe(sob)))
                                TSLS
────────────────────────────────────────────────────────────────────
Number of obs:             329509  Converged:                   true
dof (model):                    1  dof (residuals):           329446
R²:                         0.114  R² adjusted:                0.114
F-statistic:              92.2266  P-value:                    0.000
F (1st stage, joint):     2.38722  P (1st stage, joint):       0.000
────────────────────────────────────────────────────────────────────
       Estimate  Std. Error   t-stat  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────
educ  0.0928181  0.00966506  9.60347    <1e-21  0.0738748   0.111761
────────────────────────────────────────────────────────────────────

API Reference

Main Functions

| Function | Description |
| --- | --- |
| ols(df, formula; ...) | Ordinary Least Squares estimation |
| iv(method, df, formula; ...) | Instrumental Variables estimation |
| fe(var) | Fixed effect term in formula |
| first_stage(model) | Extract first-stage diagnostics from IV model |

Model Types

| Type | Description |
| --- | --- |
| OLSEstimator | Fitted OLS model |
| IVEstimator | Fitted IV model |
| FirstStageResult | First-stage diagnostics container |

IV Estimator Types

| Type | Description |
| --- | --- |
| TSLS | Two-Stage Least Squares (k = 1) |
| LIML | Limited Information Maximum Likelihood |
| Fuller(a) | Fuller bias-corrected estimator (default a = 1.0) |
| KClass(kappa) | Generic K-class with custom kappa |
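The relationship between these types can be illustrated with a from-scratch K-class estimator. This is a minimal sketch using only LinearAlgebra on simulated data; the function kclass and all variable names are hypothetical and do not reflect Regress.jl's implementation. It checks that k = 0 reproduces OLS and k = 1 reproduces two-stage least squares.

```julia
using LinearAlgebra, Random

Random.seed!(4)
n = 2000
z1, z2, x = randn(n), randn(n), randn(n)
u = randn(n)                                      # structural error
endo = 0.8 .* z1 .+ 0.5 .* z2 .+ 0.5 .* u .+ randn(n)   # correlated with u
y = 1.0 .+ 2.0 .* x .+ 3.0 .* endo .+ u

X = [x endo ones(n)]          # regressors: exogenous, endogenous, intercept
Z = [x z1 z2 ones(n)]         # instruments: exogenous regressors + excluded

# K-class estimator: solves [X'(I - k M_Z) X] beta = X'(I - k M_Z) y
function kclass(y, X, Z, k)
    PZX = Z * (Z \ X)                 # projection of X onto the columns of Z
    MZX = X - PZX                     # M_Z * X
    A = X' * X - k * (X' * MZX)
    b = X' * y - k * (MZX' * y)
    return A \ b
end

beta_ols  = kclass(y, X, Z, 0.0)      # k = 0: OLS (biased here by endogeneity)
beta_tsls = kclass(y, X, Z, 1.0)      # k = 1: TSLS

# Reference results computed directly
beta_ols_direct  = X \ y
beta_tsls_direct = (Z * (Z \ X)) \ y  # regress y on the first-stage fitted X

println(hcat(beta_ols, beta_tsls))
```

In this simulation the OLS coefficient on endo is pushed away from its true value of 3 because endo is correlated with the error, while the TSLS coefficient stays close to it.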

StatsAPI Methods

All standard StatsAPI methods work with fitted models:

coef(model)          # Coefficient estimates
stderror(model)      # Standard errors
vcov(model)          # Variance-covariance matrix
confint(model)       # Confidence intervals
coeftable(model)     # Full coefficient table
nobs(model)          # Number of observations
dof(model)           # Degrees of freedom (model)
dof_residual(model)  # Degrees of freedom (residual)
r2(model)            # R-squared
adjr2(model)         # Adjusted R-squared
residuals(model)     # Residual vector
fitted(model)        # Fitted values
