Stochastic gradient-boosted decision trees for multivariate classification, usable standalone and via Python interface.
See the paper on arXiv: "FastBDT: A speed-optimized and cache-friendly implementation of stochastic gradient-boosted decision trees for multivariate classification"
Stochastic gradient-boosted decision trees are widely employed for multivariate classification and regression tasks. This paper presents a speed-optimized and cache-friendly implementation for multivariate classification called FastBDT. FastBDT is one order of magnitude faster during the fitting and application phases compared to popular implementations in frameworks like TMVA, scikit-learn, and XGBoost. The concepts used to optimize execution time and performance are discussed in detail in this paper. Key ideas include:
- equal-frequency binning on the input data, which allows replacing expensive floating-point operations with integer operations while improving classification quality;
- a cache-friendly linear access pattern to the input data, in contrast to typical implementations that exhibit random access patterns.
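The first key idea can be illustrated in a few lines. The sketch below (plain NumPy, not FastBDT's actual C++ implementation) shows how equal-frequency binning replaces floating-point feature values with small integer bin indices, after which tree fitting only needs integer comparisons and fixed-size histogram counts:

```python
import numpy as np

def equal_frequency_bins(feature, n_bins):
    """Boundaries chosen as quantiles, so each bin receives
    roughly the same number of training values."""
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(feature, quantiles)

def to_bin_indices(feature, boundaries):
    """Map floating-point values to integer bin indices; the
    expensive float comparisons happen once, here, instead of
    at every split candidate during tree fitting."""
    return np.searchsorted(boundaries, feature).astype(np.uint8)

rng = np.random.default_rng(0)
x = rng.exponential(size=1000)          # strongly skewed input feature
edges = equal_frequency_bins(x, n_bins=8)
binned = to_bin_indices(x, edges)

counts = np.bincount(binned, minlength=8)
# each of the 8 bins holds roughly 125 of the 1000 values,
# even though the raw feature distribution is skewed
```

Equal-frequency (rather than equal-width) binning also makes the split search robust against outliers and monotone transformations of the inputs, which is part of why it can improve classification quality.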
FastBDT provides interfaces to C/C++ and Python. It is extensively used in high energy physics by the Belle II Collaboration.
This repository is a fork maintained by the Belle II Collaboration. Unless stated otherwise, it compiles with modern compilers, and the unit tests and main examples are fully functional. However, no further development of this fork is currently planned.
The original repository can be found at: https://github.com/thomaskeck/FastBDT
To build and install FastBDT, use the following commands:
mkdir -p build install && cd build
cmake ..
make
make install

This will also install the Python bindings automatically if CMake detects a valid python3 interpreter during the configuration step.
Typically, you will want to use FastBDT as a library integrated directly into your application. Available interfaces:
- the C++ shared/static library (see examples/IRISExample.cxx)
- the C shared library
- the Python library PyFastBDT/FastBDT.py (see examples/iris_example.py and examples/generic_example.py)
This work is largely based on the papers on gradient boosting by Jerome H. Friedman.
FastBDT also implements uniform gradient boosting techniques ("boosting to flatness"), which penalize a non-uniform classifier response in chosen spectator variables: