Saswatsusmoy/treeshap-rs

treeshap-rs

Production-grade TreeSHAP implementation in Rust. Computes exact Shapley values for XGBoost, LightGBM, and ONNX tree ensemble models using the tree-path-dependent algorithm (Lundberg et al., 2020).

Features

  • Exact SHAP values via tree-path-dependent and interventional modes
  • Sub-millisecond latency for single-sample inference (265 us, 100 trees, depth 6)
  • Format-independent: one engine handles XGBoost, LightGBM, and ONNX models
  • Parallel by default: adaptive Rayon parallelism over samples or trees
  • Built-in visualization: waterfall, beeswarm, and feature importance plots (SVG)
  • Python bindings via PyO3 with zero-copy NumPy interop

Quick Start

CLI

treeshap explain --model model.json --format xgboost --data samples.csv --output shap.json
treeshap verify --explanation shap.json
treeshap plot waterfall --explanation shap.json --output waterfall.svg

Rust

use treeshap_io::parse_xgboost_json_from_path;
use treeshap_core::Explainer;

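// `data` is assumed to be a 2-D array of shape (n_samples, n_features), e.g. an ndarray Array2.
// The `?` operator needs an enclosing function that returns a Result.
let ensemble = parse_xgboost_json_from_path("model.json")?;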
let explainer = Explainer::new(&ensemble);
let explanation = explainer.explain(data.view());
assert!(explanation.verify().is_pass());

Python

from treeshap import TreeEnsemble, ShapExplainer

model = TreeEnsemble.from_file("model.json", "xgboost")
explainer = ShapExplainer(model)
explanation = explainer.explain(X)   # X: 2-D numpy array, one row per sample
print(explanation.shap_values)       # numpy array
svg = explanation.plot_waterfall()    # SVG bytes
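
For intuition about what "exact" tree-path-dependent Shapley values are, the sketch below computes them by brute force for one tiny tree. It is not the treeshap-rs algorithm (which follows Lundberg et al. and avoids enumerating subsets) and does not use the treeshap API. The tree layout, the "go left when x < threshold" rule, and all names are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy tree (not a treeshap-rs data structure).
# "cover" is the number of training samples that reached the node.
TREE = {
    "feature": 0, "threshold": 0.5, "cover": 10,
    "left": {"value": 1.0, "cover": 4},
    "right": {
        "feature": 1, "threshold": 2.0, "cover": 6,
        "left": {"value": 3.0, "cover": 2},
        "right": {"value": 7.0, "cover": 4},
    },
}

def expected_value(node, x, known):
    """Tree output when only the features in `known` are observed.

    Splits on unobserved features are averaged over both children,
    weighted by cover (the path-dependent expectation).
    """
    if "value" in node:
        return node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in known:
        # Assumed split rule: go left when x < threshold.
        child = left if x[node["feature"]] < node["threshold"] else right
        return expected_value(child, x, known)
    w_left = left["cover"] / node["cover"]
    return (w_left * expected_value(left, x, known)
            + (1.0 - w_left) * expected_value(right, x, known))

def brute_force_shap(tree, x):
    """Shapley values by enumerating every feature subset (exponential cost)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                known = set(subset)
                gain = (expected_value(tree, x, known | {i})
                        - expected_value(tree, x, known))
                phi[i] += weight * gain
    return phi

x = [0.9, 2.5]
base_value = expected_value(TREE, x, set())               # no feature observed
prediction = expected_value(TREE, x, set(range(len(x))))  # all features observed
phi = brute_force_shap(TREE, x)
print(phi, base_value, prediction)
```

The property worth checking on any explanation is additivity: the SHAP values plus the base value add up to the model output for that sample. Here that is `sum(phi) + base_value == prediction` up to floating-point error.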

Performance

Measured on Apple M3 (ARM64, 8 cores), release profile with lto = "thin". Full methodology and external comparisons in docs/benchmarks.md.

Configuration                        Total    Per-sample
1 sample, 100 trees, depth 6         265 us   265 us
100 samples, 100 trees, depth 6      23 ms    0.23 ms
1,000 samples, 100 trees, depth 6    153 ms   0.15 ms
10,000 samples, 100 trees, depth 6   2.8 s    0.28 ms

vs. existing solutions (10k samples, depth 6)

Implementation           Trees   Total    Per-sample-per-tree
treeshap-rs              100     2.8 s    2.8 us
Python SHAP + XGBoost    1,000   13.0 s   1.3 us
Python SHAP + LightGBM   1,000   42.8 s   4.3 us

Per sample per tree, treeshap-rs is roughly 1.5x faster than Python SHAP + LightGBM (2.8 us vs. 4.3 us), with no Python runtime, no GIL, and deterministic memory usage. XGBoost's native C++ implementation remains ~2x faster per tree (1.3 us) because it is deeply integrated with XGBoost's internal tree format.
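
Because the runs above use different tree counts, the comparison rests on the per-sample-per-tree normalization. The short sketch below reproduces that column from the rounded totals in the table; the helper name is illustrative and is not part of the benchmark suite.

```python
# (implementation, trees, total seconds) for 10,000 samples, copied from the table.
RUNS = [
    ("treeshap-rs", 100, 2.8),
    ("Python SHAP + XGBoost", 1_000, 13.0),
    ("Python SHAP + LightGBM", 1_000, 42.8),
]
SAMPLES = 10_000

def per_sample_per_tree_us(total_s, samples, trees):
    """Convert a total wall time in seconds to microseconds per (sample, tree) pair."""
    return total_s * 1e6 / (samples * trees)

cost = {name: per_sample_per_tree_us(total, SAMPLES, trees)
        for name, trees, total in RUNS}

for name, us in cost.items():
    print(f"{name}: {us:.2f} us per sample per tree")
```

With the rounded table figures this gives 2.80, 1.30, and 4.28 us. Ratios between implementations follow by dividing these values, and they inherit the rounding of the published totals.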

Supported Formats

Format          Parser                Versions                             Notes
XGBoost JSON    parse_xgboost_json    1.0 to 2.x                           Automatic base_score logit handling for v1.6+
LightGBM text   parse_lightgbm_text   3.x+                                 Numerical splits only; categorical support planned
ONNX ML         parse_onnx            TreeEnsembleClassifier / Regressor   post_transform detection
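
The base_score note matters because, for logistic objectives, XGBoost records base_score as a probability, while SHAP values for such models live in log-odds (margin) space. The sketch below shows the conversion, assuming a binary:logistic model. The function names and the SHAP row are made up for illustration and are not the treeshap-rs API.

```python
import math

def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))

base_score = 0.5                  # stored in the model file as a probability
base_margin = logit(base_score)   # offset in the space where SHAP values add up

shap_row = [0.8, -0.3]            # made-up per-feature SHAP values (log-odds)
margin = base_margin + sum(shap_row)
probability = sigmoid(margin)

print(base_margin, margin, probability)
```

If the base score were used directly as an offset instead of being converted, the SHAP values would no longer sum to the model's margin output.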

Architecture

treeshap-cli        Application layer (CLI binary)
  |-- treeshap-core   SHAP engine, tree IR, validation
  |-- treeshap-io     Model parsers (XGBoost, LightGBM, ONNX)
  |-- treeshap-viz    SVG visualization (waterfall, beeswarm, importance)
treeshap-py         Python bindings (PyO3 + maturin)

Each library crate is independently publishable. treeshap-io and treeshap-viz depend on treeshap-core but never on each other. See docs/architecture.md for design rationale.

Examples

cargo run --example xgboost_json          # XGBoost regression with plots
cargo run --example lgbm_regression       # LightGBM regression
cargo run --example binary_classification # Binary classification (log-odds)
cargo run --example missing_values        # NaN routing demonstration
cargo run --example linfa_native          # Programmatic ensemble construction
cargo run --example onnx                  # ONNX model (in-memory protobuf)

Building

cargo build --workspace --release     # Build all crates
cargo test --workspace                # Run all tests
cargo bench --bench shap_bench        # Criterion benchmarks
cargo clippy --workspace -- -D warnings  # Lint

# Python bindings
cd treeshap-py && pip install maturin && maturin develop --release

Known Limitations

  • Categorical splits are rejected with UnsupportedSplitType (planned for a future release).
  • ONNX feature count is inferred from split indices; may undercount for models using a feature subset.
  • Interventional mode is functional but not yet validated against Python SHAP golden files.
  • Python bindings require maturin for building.
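
To make the ONNX limitation concrete, the sketch below assumes the feature count is taken as the largest split index plus one. The exact rule treeshap-rs uses is not stated here, so treat the helper as hypothetical.

```python
def infer_feature_count(split_feature_indices):
    """Assumed inference rule: largest feature index seen in any split, plus one."""
    return max(split_feature_indices) + 1

# Splits in a hypothetical model that only ever uses features 0, 2, and 3.
split_indices = [0, 2, 3, 2, 0]

# The dataset the model was trained on actually has 6 columns.
row = [1.0, 0.5, 3.2, 0.0, 7.1, 4.4]

inferred = infer_feature_count(split_indices)
unused_trailing = len(row) - inferred

print(f"inferred {inferred} features, data has {len(row)} columns")
```

Under this assumption the two trailing columns are invisible to the parser, so the inferred width (4) is smaller than the input width (6). Features that are never split on contribute nothing to the output, which is why the missing columns can go unnoticed until shapes are compared.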

License

Dual-licensed under MIT and Apache 2.0. Choose whichever you prefer.

References

  1. Lundberg, S.M., Erion, G., Chen, H. et al. "From local explanations to global understanding with explainable AI for trees." Nature Machine Intelligence 2, 56-67 (2020).
  2. Lundberg, S.M. & Lee, S.I. "A Unified Approach to Interpreting Model Predictions." NeurIPS (2017).
  3. Yang, J. "Fast TreeSHAP: Accelerating SHAP Value Computation for Trees." arXiv:2109.09847 (2021).
