treeshap-rs

Production-grade TreeSHAP implementation in Rust. Computes exact Shapley values for XGBoost, LightGBM, and ONNX tree ensemble models using the tree-path-dependent algorithm (Lundberg et al., 2020).

Features

Exact SHAP values via tree-path-dependent and interventional modes
Sub-millisecond latency for single-sample inference (265 us, 100 trees, depth 6)
Format-independent --- one engine handles XGBoost, LightGBM, and ONNX models
Parallel by default --- adaptive Rayon parallelism over samples or trees
Built-in visualization --- waterfall, beeswarm, and feature importance plots (SVG)
Python bindings via PyO3 with zero-copy NumPy interop
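
To make "exact" concrete in the tree-path-dependent sense: when a feature is unknown, the expected output at a split is the cover-weighted average of its two children, and the Shapley value is the weighted sum of marginal contributions over all feature subsets. The sketch below computes this by brute force on a two-feature toy tree. It is exponential in the number of features and is not the polynomial-time TreeSHAP algorithm, nor the treeshap-core API. The `Node` layout, the `x < threshold` goes-left convention, and all function names are assumptions made for illustration.

```rust
/// A toy decision tree. `cover` is the number of training samples that
/// reached the node (hypothetical layout, not the treeshap-core IR).
enum Node {
    Leaf { value: f64, cover: f64 },
    Split { feature: usize, threshold: f64, cover: f64, left: Box<Node>, right: Box<Node> },
}

impl Node {
    fn cover(&self) -> f64 {
        match self {
            Node::Leaf { cover, .. } | Node::Split { cover, .. } => *cover,
        }
    }
}

/// Expected tree output when only the features set in bitmask `known` are observed.
/// Unknown features are averaged out using child covers (path-dependent expectation).
fn expected(node: &Node, x: &[f64], known: u32) -> f64 {
    match node {
        Node::Leaf { value, .. } => *value,
        Node::Split { feature, threshold, left, right, .. } => {
            if known & (1u32 << *feature) != 0 {
                // Assumed convention: values below the threshold go left.
                if x[*feature] < *threshold {
                    expected(left, x, known)
                } else {
                    expected(right, x, known)
                }
            } else {
                let (cl, cr) = (left.cover(), right.cover());
                (cl * expected(left, x, known) + cr * expected(right, x, known)) / (cl + cr)
            }
        }
    }
}

fn factorial(n: usize) -> f64 {
    (1..=n).map(|k| k as f64).product()
}

/// Brute-force Shapley values: for each feature i, sum the weighted marginal
/// contribution v(S + i) - v(S) over every subset S that excludes i.
fn shapley(tree: &Node, x: &[f64]) -> Vec<f64> {
    let m = x.len();
    let mut phi = vec![0.0; m];
    for i in 0..m {
        for s in 0u32..(1u32 << m) {
            if s & (1u32 << i) != 0 {
                continue;
            }
            let k = s.count_ones() as usize;
            let weight = factorial(k) * factorial(m - k - 1) / factorial(m);
            phi[i] += weight * (expected(tree, x, s | (1u32 << i)) - expected(tree, x, s));
        }
    }
    phi
}

/// Root splits on feature 0; its right child splits on feature 1.
fn toy_tree() -> Node {
    Node::Split {
        feature: 0, threshold: 0.5, cover: 10.0,
        left: Box::new(Node::Leaf { value: 1.0, cover: 4.0 }),
        right: Box::new(Node::Split {
            feature: 1, threshold: 0.5, cover: 6.0,
            left: Box::new(Node::Leaf { value: 2.0, cover: 3.0 }),
            right: Box::new(Node::Leaf { value: 4.0, cover: 3.0 }),
        }),
    }
}

fn main() {
    let tree = toy_tree();
    let x = [1.0, 1.0];
    let phi = shapley(&tree, &x);
    let base = expected(&tree, &x, 0);
    let prediction = expected(&tree, &x, 0b11);
    println!("base = {base}, phi = {phi:?}, prediction = {prediction}");
    // Local accuracy: base value plus all attributions recovers the prediction.
    assert!((base + phi.iter().sum::<f64>() - prediction).abs() < 1e-12);
}
```

For this sample the attributions come out to roughly 1.0 and 0.8 on top of a base value of 2.2, which together account for the prediction of 4.0. A brute-force reference like this is mainly useful for checking a fast implementation on small trees.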

Quick Start

CLI

treeshap explain --model model.json --format xgboost --data samples.csv --output shap.json
treeshap verify --explanation shap.json
treeshap plot waterfall --explanation shap.json --output waterfall.svg

Rust

use treeshap_io::parse_xgboost_json_from_path;
use treeshap_core::Explainer;

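// `data` is assumed to be a 2-D feature matrix (rows = samples, columns = features)
// built elsewhere; the `?` needs an enclosing function that returns a `Result`.
let ensemble = parse_xgboost_json_from_path("model.json")?;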
let explainer = Explainer::new(&ensemble);
let explanation = explainer.explain(data.view());
assert!(explanation.verify().is_pass());

Python

from treeshap import TreeEnsemble, ShapExplainer

model = TreeEnsemble.from_file("model.json", "xgboost")
explainer = ShapExplainer(model)
explanation = explainer.explain(X)
print(explanation.shap_values)       # numpy array
svg = explanation.plot_waterfall()    # SVG bytes

Performance

Measured on Apple M3 (ARM64, 8 cores), release profile with lto = "thin". Full methodology and external comparisons in docs/benchmarks.md.

Configuration	Total	Per-sample
1 sample, 100 trees, depth 6	265 us	265 us
100 samples, 100 trees, depth 6	23 ms	0.23 ms
1,000 samples, 100 trees, depth 6	153 ms	0.15 ms
10,000 samples, 100 trees, depth 6	2.8 s	0.28 ms

vs. existing solutions (10k samples, depth 6)

Implementation	Trees	Total	Per-sample-per-tree
treeshap-rs	100	2.8 s	2.8 us
Python SHAP + XGBoost	1,000	13.0 s	1.3 us
Python SHAP + LightGBM	1,000	42.8 s	4.3 us

Per tree, treeshap-rs is about 1.5x faster than Python SHAP + LightGBM (2.8 us vs. 4.3 us), with no Python runtime, no GIL, and deterministic memory usage. XGBoost's native C++ implementation remains ~2x faster per tree (1.3 us vs. 2.8 us) due to deep integration with its internal tree format.
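
Because the runs in the table use different ensemble sizes (100 vs. 1,000 trees), the comparison rests on the normalized per-sample-per-tree figure: total time divided by samples times trees. The short sketch below recomputes that column from the totals listed above; the function name is made up for illustration.

```rust
/// Normalize a total wall-clock time to microseconds per sample per tree.
fn per_sample_per_tree_us(total_seconds: f64, samples: u64, trees: u64) -> f64 {
    total_seconds * 1e6 / (samples as f64 * trees as f64)
}

fn main() {
    // (implementation, total seconds, trees), all at 10,000 samples, depth 6.
    let rows = [
        ("treeshap-rs", 2.8, 100u64),
        ("Python SHAP + XGBoost", 13.0, 1_000),
        ("Python SHAP + LightGBM", 42.8, 1_000),
    ];
    for (name, total_s, trees) in rows {
        let us = per_sample_per_tree_us(total_s, 10_000, trees);
        println!("{name}: {us:.2} us per sample per tree");
    }
}
```

Speedup ratios between implementations follow by dividing one normalized figure by another.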

Supported Formats

Format	Parser	Versions	Notes
XGBoost JSON	`parse_xgboost_json`	1.0 -- 2.x	Automatic base_score logit handling for v1.6+
LightGBM text	`parse_lightgbm_text`	3.x+	Numerical splits only; categorical support planned
ONNX ML	`parse_onnx`	TreeEnsembleClassifier / Regressor	post_transform detection
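
As background on the base_score note in the table: for logistic objectives, XGBoost stores base_score as a probability while leaf values live in margin (log-odds) space, so a SHAP base value has to be moved into margin space with a logit transform before it can be added to tree outputs. The sketch below shows only that transform. The function names and the objective check are assumptions, and it does not describe how `parse_xgboost_json` decides when to apply it.

```rust
/// Log-odds of a probability in (0, 1).
fn logit(p: f64) -> f64 {
    (p / (1.0 - p)).ln()
}

/// Inverse of `logit`: maps a margin back to a probability.
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Convert a stored base_score into margin space (hypothetical helper).
/// Only the two logistic objectives are handled here; everything else is
/// passed through unchanged, which is a simplification.
fn base_margin(base_score: f64, objective: &str) -> f64 {
    match objective {
        "binary:logistic" | "reg:logistic" => logit(base_score),
        _ => base_score,
    }
}

fn main() {
    let margin = base_margin(0.2, "binary:logistic");
    println!("base_score 0.2 -> margin {margin:.4}");
    // Round trip: applying the sigmoid recovers the stored probability.
    assert!((sigmoid(margin) - 0.2).abs() < 1e-12);
}
```

Skipping this step would shift every explanation's base value, so SHAP values would no longer sum to the model's margin output.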

Architecture

treeshap-cli        Application layer (CLI binary)
  |-- treeshap-core   SHAP engine, tree IR, validation
  |-- treeshap-io     Model parsers (XGBoost, LightGBM, ONNX)
  |-- treeshap-viz    SVG visualization (waterfall, beeswarm, importance)
treeshap-py         Python bindings (PyO3 + maturin)

Each library crate is independently publishable. treeshap-io and treeshap-viz depend on treeshap-core but never on each other. See docs/architecture.md for design rationale.

Examples

cargo run --example xgboost_json          # XGBoost regression with plots
cargo run --example lgbm_regression       # LightGBM regression
cargo run --example binary_classification # Binary classification (log-odds)
cargo run --example missing_values        # NaN routing demonstration
cargo run --example linfa_native          # Programmatic ensemble construction
cargo run --example onnx                  # ONNX model (in-memory protobuf)
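
For readers unfamiliar with NaN routing, which the missing_values example demonstrates: gradient-boosted trees usually record a default direction per split, and a sample whose feature is missing follows that direction instead of being compared with the threshold. The sketch below illustrates the idea with a hypothetical `Split` struct; it is not the treeshap-core node type.

```rust
/// One split node (hypothetical layout for illustration).
struct Split {
    feature: usize,
    threshold: f64,
    /// Direction taken when the feature value is missing (NaN).
    default_left: bool,
}

/// Decide whether sample `x` goes to the left child.
fn goes_left(split: &Split, x: &[f64]) -> bool {
    let v = x[split.feature];
    if v.is_nan() {
        // A plain `v < threshold` is always false for NaN, which would
        // silently send every missing value to the right child.
        split.default_left
    } else {
        v < split.threshold
    }
}

fn main() {
    let split = Split { feature: 0, threshold: 0.5, default_left: true };
    println!("0.3 -> left: {}", goes_left(&split, &[0.3]));
    println!("0.9 -> left: {}", goes_left(&split, &[0.9]));
    println!("NaN -> left: {}", goes_left(&split, &[f64::NAN]));
}
```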

Building

cargo build --workspace --release     # Build all crates
cargo test --workspace                # Run all tests
cargo bench --bench shap_bench        # Criterion benchmarks
cargo clippy --workspace -- -D warnings  # Lint

# Python bindings
cd treeshap-py && pip install maturin && maturin develop --release

Known Limitations

Categorical splits are rejected with UnsupportedSplitType; support is planned for a future release.
The ONNX feature count is inferred from split indices, so it may be undercounted when a model splits on only a subset of its input features.
Interventional mode is functional but not yet validated against Python SHAP golden files.
Python bindings require maturin for building.
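
The ONNX limitation above is easiest to see with a small example. If the feature count is taken as the largest split index plus one, a model trained on 10 columns that only ever splits on features 0, 2, and 5 looks like a 6-feature model. The helper below is a hypothetical sketch of that inference, not the logic in `parse_onnx`.

```rust
/// Infer a feature count from the feature indices used by split nodes
/// (hypothetical helper): the largest index seen, plus one.
fn inferred_feature_count(split_features: &[usize]) -> usize {
    split_features.iter().max().map_or(0, |&m| m + 1)
}

fn main() {
    let trained_on = 10; // columns in the training data
    let used_by_splits = [0, 2, 5, 2, 0]; // indices that appear in the trees
    let inferred = inferred_feature_count(&used_by_splits);
    println!("trained on {trained_on} features, inferred {inferred}");
    // Features 6..=9 never appear in a split, so they are invisible to inference.
    assert!(inferred < trained_on);
}
```

In practice this means the SHAP output may have fewer columns than the input data; trailing unused features would carry zero attribution anyway, but shapes may need to be reconciled by the caller.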

License

Dual-licensed under MIT and Apache 2.0. Choose whichever you prefer.

References

Lundberg, S.M., Erion, G., Chen, H. et al. "From local explanations to global understanding with explainable AI for trees." Nature Machine Intelligence 2, 56--67 (2020).
Lundberg, S.M. & Lee, S.I. "A Unified Approach to Interpreting Model Predictions." NeurIPS (2017).
Yang, J. "Fast TreeSHAP: Accelerating SHAP Value Computation for Trees." arXiv:2109.09847 (2021).
