Source code for the paper "Sparse Inverse Cholesky Factorization of Dense Kernel Matrices by Greedy Conditional Selection".
Install dependencies from environment.yml with conda or mamba:

conda env create --prefix ./.venv --file environment.yml

or from an explicit spec file (the platform must match):

conda create --prefix ./.venv --file linux-64-explicit-spec-list.txt
conda activate ./.venv
pip install build setuptools

See the conda documentation on managing environments for more information.
Activate the conda environment:

conda activate ./.venv

Build the Cython extensions:

python setup.py build_ext --inplace

We rely on the Intel oneMKL library to provide fast numerical routines.
Make sure that NumPy and SciPy also use MKL for BLAS and LAPACK by checking the output of

python -c "import numpy; numpy.__config__.show()"

which should show something like
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['.../.venv/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.../.venv/include']
...
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['.../.venv/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.../.venv/include']
...
and similarly for

python -c "import scipy; scipy.__config__.show()"

Installing NumPy with conda install numpy from the defaults or anaconda channel (not conda-forge) should work, but it sometimes doesn't play well with installing mkl-devel. It's easiest to just use the intel channel.
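The same check can be scripted. The sketch below is not part of this repository: the helper names config_text and links_mkl are hypothetical, and the detection is a heuristic that looks for mkl only on library-name lines of the printed configuration. It deliberately ignores section headers such as blas_mkl_info, which older NumPy builds print followed by NOT AVAILABLE even when MKL is absent, and it assumes that newer Meson-built NumPy and SciPy report the BLAS as an entry like name: mkl-sdl.

```python
import contextlib
import importlib
import io
import re

# Heuristic (assumption): MKL is linked if a library-name line mentions it.
# Matches old distutils output ("libraries = ['mkl_rt', ...]") and, we assume,
# Meson-era output ("name: mkl-sdl", or '"name": ...' in the JSON fallback).
MKL_LINE = re.compile(r'^\s*("?name"?\s*:|libraries\s*=).*mkl', re.IGNORECASE)


def config_text(module_name: str) -> str:
    """Capture what `<module>.__config__.show()` prints, e.g. for numpy or scipy."""
    config = importlib.import_module(f"{module_name}.__config__")
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        config.show()
    return buffer.getvalue()


def links_mkl(text: str) -> bool:
    """True if any library-name line mentions MKL; section headers are ignored."""
    return any(MKL_LINE.search(line) for line in text.splitlines())


if __name__ == "__main__":
    for name in ("numpy", "scipy"):
        try:
            text = config_text(name)
        except ImportError:
            print(f"{name}: not installed")
            continue
        print(f"{name}: {'MKL' if links_mkl(text) else 'no MKL found'}")
```

Run it inside the activated environment; if either package reports no MKL, inspect the full show() output by hand before reinstalling.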
We use datasets from the SuiteSparse Matrix Collection, the UCI Machine Learning Repository, LIBSVM, and the book Gaussian Processes for Machine Learning. Download the datasets with the provided fish script:
chmod +x get_datasets
./get_datasets

Navigate to the OCO-2 solar-induced fluorescence (SIF) dataset. Note that the (current) latest version of the dataset is 11r, but this might change in the future. If the link above doesn't work, search directly for the OCO2_L2_Lite_SIF dataset.
Click the blue "Online Archive" button on the right, then open the 2017 folder. Each file contains a single day of data.
Note that downloading the files requires an Earthdata account.
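Downloading a full year by hand makes it easy to miss a file. The following sketch, which is not part of this repository, lists the days of 2017 that have no matching file in a download folder. It assumes (unverified) that each daily file name carries a _YYMMDD_ date stamp and ends in .nc4; DATE_STAMP, days_in_year and missing_days are hypothetical names. A reported gap may also be a day the archive itself does not contain, so compare against the archive listing before re-downloading.

```python
from __future__ import annotations

import datetime as dt
import re
import sys
from pathlib import Path

# ASSUMPTION: each daily file name contains a _YYMMDD_ date stamp,
# e.g. a hypothetical "oco2_LtSIF_170101_....nc4"; adjust if yours differ.
DATE_STAMP = re.compile(r"_(\d{6})_")


def days_in_year(year: int) -> list[dt.date]:
    """Every calendar day of `year` (365 or 366 entries)."""
    start = dt.date(year, 1, 1)
    n_days = (dt.date(year + 1, 1, 1) - start).days
    return [start + dt.timedelta(days=i) for i in range(n_days)]


def missing_days(filenames: list[str], year: int = 2017) -> list[dt.date]:
    """Days of `year` for which no file name carries a matching date stamp."""
    found = set()
    for name in filenames:
        match = DATE_STAMP.search(name)
        if match is None:
            continue
        try:
            found.add(dt.datetime.strptime(match.group(1), "%y%m%d").date())
        except ValueError:
            continue  # six digits that do not form a valid date
    return [day for day in days_in_year(year) if day not in found]


if __name__ == "__main__":
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    gaps = missing_days([p.name for p in folder.glob("*.nc4")])  # ASSUMPTION: .nc4 files
    print(f"{len(gaps)} day(s) of 2017 without a file")
    for day in gaps:
        print(day.isoformat())
```

Pass the download folder as the only argument; with no argument the current directory is scanned.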
First, install R and NetCDF using your preferred package manager. To install R packages locally, follow the instructions here to create the default R_LIBS_USER directory:
mkdir -p ~/R/x86_64-pc-linux-gnu-library/4.2/

Be sure to replace x86_64-pc-linux-gnu and 4.2 with your specific platform and R version, respectively. Running R --version should show something like the following:

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
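Instead of editing the path by hand, it can be derived from the R --version output. This sketch is hypothetical (r_libs_user is not part of this repository) and assumes the Linux default layout ~/R/<platform>-library/<major.minor> used in the mkdir command above. Inside R, Sys.getenv("R_LIBS_USER") reports the path R actually expects, which is the authoritative answer.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def r_libs_user(version_output: str, home: Optional[Path] = None) -> Path:
    """Build ~/R/<platform>-library/<major.minor> from `R --version` text.

    ASSUMPTION: Linux-style default R_LIBS_USER layout.
    """
    version = re.search(r"R version (\d+)\.(\d+)", version_output)
    platform = re.search(r"Platform:\s*(\S+)", version_output)
    if version is None or platform is None:
        raise ValueError("could not parse `R --version` output")
    major_minor = f"{version.group(1)}.{version.group(2)}"  # e.g. "4.2" from "4.2.3"
    base = home if home is not None else Path.home()
    return base / "R" / f"{platform.group(1)}-library" / major_minor


if __name__ == "__main__":
    if shutil.which("R") is None:
        print("R not found on PATH")
    else:
        out = subprocess.run(
            ["R", "--version"], capture_output=True, text=True, check=True
        ).stdout
        # Print the command rather than creating the directory, so nothing changes silently.
        print(f"mkdir -p {r_libs_user(out)}")
```

For the sample output above, this yields ~/R/x86_64-pc-linux-gnu-library/4.2, matching the mkdir command.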
Next, start R and enter the following commands into the REPL to install the packages:

> install.packages("renv", repos = "https://cloud.r-project.org")
> renv::restore()

The data can now be compiled with

R --file=compile_fluorescence_data.R

The compile_fluorescence_data.R script is due to Joe Guinness.
Files can be run as modules:

python -m experiments.cholesky
python -m figures.factor
python -m tests.cknn_tests

@article{huan2025sparse,
title = {Sparse {{Inverse Cholesky Factorization}} of {{Dense Kernel Matrices}} by {{Greedy Conditional Selection}}},
author = {Huan, Stephen and Guinness, Joseph and Katzfuss, Matthias and Owhadi, Houman and Sch{\"a}fer, Florian},
year = {2025},
month = sep,
journal = {SIAM/ASA Journal on Uncertainty Quantification},
volume = {13},
number = {3},
pages = {1649--1679},
publisher = {{Society for Industrial and Applied Mathematics}},
doi = {10.1137/23M1606253}
}