This document explains how to reproduce the main results of the paper and how the repository is organized.
Before running anything: install the R and Python dependencies. See INSTALL.md for step-by-step instructions, package lists, and platform notes.
-
All commands for the main paper must be run from the repository root.
-
All generated outputs are written under:
results/
- All plotting is done in the Jupyter Notebook:
python/plots/plotting.ipynb
- All scripts assume you run from the repository root.
- All results are written to structured subfolders inside
results/. - No manual path editing is required.
- All key settings for MH-Within-Gibbs samplers are controlled via environment variables
Figure 1 (Illustration of using the linear program) is only for illustrative purpose and it is not reproduced in this repository.
All figures are generated using the Jupyter notebook python/plots/plotting.ipynb and details like the corresponding figures are in the notebook. Note that the notebook reads in the data and MCMC results, so you should run the corresponding algorithms and store the samples and other necessary results before using the notebook for plotting.
Table 1 (Skip-logic survey example) is not part of the simulation and does not need to be reproduced.
Table 2 (RMSE for estimating make full with custom settings (see "6.Environment Variable Controls" for details)
All figures can be generated using the Jupyter notebook python/plots/plotting.ipynb and the details of correponding figures are clearly stated in the notebook.
Table 1 (RMSE for estimating
Table 2 (RMSE for estimating
Table 3 (RMSE for estimating run kernel_compare.
Entry script:
R/simulations/mh_within_gibbs/Production_Run.R
- Kernel for auxiliary variable ( u ) (in the paper): exponential
- Only partial simulation grid over:
d_list = 2,5m_list = 1,2
- Default iterations (when invoked via
make full):n_iter = 50000n_warmup = 5000
⚠ Direct invocation (
Rscript R/simulations/mh_within_gibbs/Production_Run.Rwithout env vars) uses the smaller in-script defaultsn_iter=5000,n_warmup=2000. The50000/5000numbers above only apply when invoked viascripts/run_full.sh/make full, which exports them.
make fullOutputs are written to:
results/runs/mh_within_gibbs/<RUN_TAG>/
(Default RUN_TAG = full_example when invoked via scripts/run_full.sh; default when invoked directly via Rscript.)
Run smoke test and run full simulation
make probit_smoke
make probitOutputs are written to:
results/runs/probit
The supplementary material compares two kernels for updating the auxiliary variable ( u ):
exponentialhalf_gaussian
To run the comparison:
make kernel_compareThis will:
- Use both kernels
- Run across all
(d, m)combinations (withm < d,d in 5,20,50,200,1000, andm in 1,5,10,50,100) - Use lighter defaults:
n_iter = 10000n_warmup = 5000
Outputs are written under:
results/runs/mh_within_gibbs/kernel_compare/
To verify the entire pipeline works:
make smokeThis runs:
- A lightweight simulation
- A lightweight duck data analysis
Smoke test runs quickly and is intended for sanity checking.
Scripts:
R/data_analysis/duck_matching.R
R/data_analysis/duck_matching_reduced.R
make duck_fullOutputs:
results/runs/data_analysis/duck_full/
make duck_reducedOutputs:
results/runs/data_analysis/duck_reduced/
All scripts support overrides via environment variables. Variables are grouped below by which entry script reads them. Defaults shown are the in-script defaults; the wrapper bash scripts in scripts/ may set different values before invoking R (see §10 and the script source for those overrides).
| Variable | Read by | Default | Description |
|---|---|---|---|
JASA_RUN_TAG |
all entry scripts | varies (default, probit_default, reduced_default) |
Tag used to name the per-run output subfolder under results/runs/<method>/. |
JASA_SEED |
Simulation_probit.R |
123 |
RNG seed (other R scripts hardcode set.seed(123) and do not honor this). |
| Variable | Default | Description |
|---|---|---|
JASA_N_ITER |
5000 |
Total MCMC iterations. |
JASA_N_WARMUP |
2000 |
Warmup steps (used only for the in-script RMSE print and trace/ACF plots; the saved samples include warmup). |
JASA_N_THIN |
25 |
Thinning factor (used only for ACF plots and the in-script print; saved samples are not thinned). |
JASA_N_REP |
1 |
Number of independent simulation repetitions per (d, m) cell. |
JASA_N_HAR |
100 |
Inner-loop iterations for the hit-and-run sampler. |
JASA_D_LIST |
2,5,10,20 |
Comma- or space-separated list of response dimensions |
JASA_M_LIST |
1,2,5,10 |
Comma- or space-separated list of constraint counts |
JASA_P |
5 |
Number of covariates. |
JASA_N |
1000 |
Number of observations. |
JASA_METHODS |
exponential |
Comma- or space-separated list of kernels for exponential and halfgaussian are recognized — other values are silently skipped. scripts/run_full.sh overrides this to "exponential halfgaussian". |
JASA_SAVE |
TRUE |
Save per-case *_beta_samples.rds / *_beta_true.rds. |
JASA_PLOT |
TRUE |
Generate trace and ACF PNGs (warmup and thinning are applied here, unlike the saved samples). |
Note: posterior samples from this sampler are written without discarding warmup or thinning — i.e. the saved beta_samples array has shape
You can also bypass env vars entirely by editing the CONFIGURATION block (lines 78–98) of R/simulations/mh_within_gibbs/Production_Run.R.
| Variable | Default | Description |
|---|---|---|
JASA_DATA_DIR |
data/waterfowl_matching |
Directory containing duck_data.csv, A_tilde_matrix.csv, Z_matrix.csv. |
JASA_N_ITER |
50000 |
Total Gibbs iterations. |
JASA_N_WARMUP |
5000 |
Warmup (recorded in metadata; not enforced on saved samples). |
JASA_N_THIN |
1 |
Thinning (recorded in metadata; not enforced on saved samples). |
JASA_K |
7 |
Number of species groups (full model only; the reduced model fixes |
JASA_KAPPA |
5 |
Number of B-spline basis functions. |
JASA_TAU_A |
0.1 |
|
JASA_TAU_RHO |
0.1 |
|
JASA_SAVE_RHO |
TRUE |
Save the rho_samples array. |
JASA_SAVE_A |
TRUE |
Save the a_samples array. |
JASA_SAVE_ZETA |
FALSE |
Save the zeta_samples array. Off by default: with n_iter=50000, n=18, d=339, this is ≈ 2.4 GB. |
Note: loop_hit_and_run's inner-loop length is hardcoded to 100 in the duck scripts and does not honor JASA_N_HAR.
Note also that there is misalignmenet between the paper and code in the naming of parameters for the duck model: the paper's a and rho in the code, respectively.
6.4 Probit fitted-value-curve (R/simulations/fitted_value_curve/Simulation_probit.R and its sourced sub-scripts)
| Variable | Default | Description |
|---|---|---|
JASA_SEED |
123 |
RNG seed for this experiment. |
JASA_N |
1000 |
Number of observations. |
JASA_P |
2 |
Number of covariates (must equal ncol(X) after intercept). |
JASA_D |
2 |
Response dimension. |
JASA_PLOT |
FALSE |
Save trace PNGs for both unconstrained and constrained (ILP) samplers. |
JASA_PROBIT_N_ITER |
20000 |
MCMC iterations for both sub-samplers. |
JASA_PROBIT_N_HAR |
100 |
Inner-loop iterations for hit-and-run (constrained sampler only). |
JASA_PROBIT_BURN_IN |
5000 |
Burn-in for the in-script posterior mean / RMSE print (saved samples are full-length). |
JASA_PROBIT_THIN |
10 |
Thinning for the in-script print only. |
JASA_PROBIT_LOG_EVERY |
100 (constrained) / 0 (unconstrained, quiet) |
Print iteration progress every N steps. |
Example custom run:
JASA_N_ITER=20000 JASA_RUN_TAG=test_run make fullJASA_METHODS=exponential make fullEntry script:
python/pmmh/run_pmmh.py
Auxiliary scripts:
python/pmmh/pmmh_core.py
python/pmmh/pmmh_diagnostics.py
To run the sampler in custom settings, directly change variable values in step1 and step2 in python/pmmh/run_pmmh.py and run it.
python python/pmmh/run_pmmh.pyEntry script:
python/hmc/batch_run_nuts.ipynb
Auxiliary script:
python/hmc/nuts_ilp_model.py
Plotting notebooks currently reside in:
python/plots/plotting.ipynb
Generated figures are intended to be saved under:
results/figures/
Approximate runtime on a Macbook Pro:
make smoke: less than a minutemake full: one to two weeks (large grid + long chains)make kernel_compare: less than one daymake probit: two minutesmake duck_full/make duck_reduced: half an hour
| Command | Purpose |
|---|---|
make smoke |
Quick pipeline sanity check |
make probit_smoke |
Quick pipeline sanity check for the simulation in section 5.1 |
make full |
Full simulation (main paper) |
make kernel_compare |
Supplementary kernel comparison |
make probit |
Simulation in section 5.1 |
make duck_full |
Full model for waterfowl matching |
make duck_reduced |
Reduced model for waterfowl matching (two-group model) |
make clean |
Remove all generated results |
- No manual editing of file paths is required.
- All outputs are written under
results/. - All model settings for the MH-Within-Gibbs sampler are controlled via environment variables.
- Default settings do not necessarily reproduce main paper results as some might take too long to run and risk crash.