This PR introduces JSWIN (Jensen-Shannon Windowing), a new univariate drift detection method that uses the Jensen-Shannon divergence to identify distributional changes in data streams. JSWIN is designed to detect both gradual and abrupt drifts by comparing empirical distributions within a sliding window. This implementation was developed as part of a project at Warsaw University of Technology. We hope this contribution will enrich the river library's drift detection capabilities.

JSWIN (see Algorithm 1) maintains a sliding window Ψ of size n over the univariate data stream.
For every new observation, the window is split into two equal parts P and Q. The empirical
distributions of both halves are computed using fixed-size binning. The empirical Jensen-Shannon
divergence between these distributions is calculated, and if it exceeds a threshold α, a drift is
signaled.
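The procedure above can be sketched in plain Python as follows. This is a minimal, hypothetical sketch, not the implementation in this PR: the class name `JSWINSketch` and the parameters `window_size` (n), `n_bins` and `alpha` (α) are illustrative. It assumes that "fixed-size binning" means `n_bins` equal-width bins spanning the current window's range, uses base-2 logarithms so the divergence lies in [0, 1], and clears the window after a detection, which is an assumption rather than something stated above. Its `update()` method and `drift_detected` flag loosely follow the shape of river's drift detectors, but the sketch neither imports nor subclasses river.

```python
import math
import random
from collections import deque


class JSWINSketch:
    """Hypothetical sketch of the JSWIN idea, not the PR's implementation.

    A sliding window of `window_size` values (n) is split into an older half P
    and a newer half Q; both are histogrammed on shared equal-width bins, and a
    drift is flagged when their Jensen-Shannon divergence exceeds `alpha`.
    """

    def __init__(self, window_size: int = 100, n_bins: int = 10, alpha: float = 0.3):
        if window_size < 2 or window_size % 2:
            raise ValueError("window_size must be an even number >= 2")
        self.window_size = window_size
        self.n_bins = n_bins
        self.alpha = alpha
        self.window: deque = deque(maxlen=window_size)
        self.drift_detected = False
        self.last_divergence = 0.0

    def _histogram(self, values, lo, width):
        """Relative frequencies of `values` over n_bins equal-width bins starting at lo."""
        counts = [0] * self.n_bins
        for v in values:
            # Clamp so the window maximum lands in the last bin instead of overflowing.
            idx = min(int((v - lo) / width), self.n_bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    @staticmethod
    def _js_divergence(p, q):
        """JS(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M) with M = (P + Q) / 2, in bits."""
        js = 0.0
        for pi, qi in zip(p, q):
            mi = 0.5 * (pi + qi)
            # Empty bins contribute nothing (0 * log 0 is taken as 0).
            if pi > 0:
                js += 0.5 * pi * math.log2(pi / mi)
            if qi > 0:
                js += 0.5 * qi * math.log2(qi / mi)
        return js

    def update(self, x: float) -> "JSWINSketch":
        self.window.append(x)
        self.drift_detected = False
        if len(self.window) < self.window_size:
            return self  # Not enough data yet to compare two full halves.
        values = list(self.window)
        half = self.window_size // 2
        older, newer = values[:half], values[half:]
        # Shared bin edges for both halves, so P and Q are measured on the same grid.
        lo, hi = min(values), max(values)
        width = (hi - lo) / self.n_bins or 1.0  # Constant window: everything falls in bin 0.
        p = self._histogram(older, lo, width)
        q = self._histogram(newer, lo, width)
        self.last_divergence = self._js_divergence(p, q)
        if self.last_divergence > self.alpha:
            self.drift_detected = True
            # Assumption: start over after a drift so later comparisons use post-drift data only.
            self.window.clear()
        return self


# Usage: a synthetic stream whose mean jumps from 0 to 4 at index 500.
random.seed(42)
detector = JSWINSketch(window_size=100, n_bins=10, alpha=0.3)
stream = [random.gauss(0, 1) for _ in range(500)] + [random.gauss(4, 1) for _ in range(500)]
drifts = [i for i, x in enumerate(stream) if detector.update(x).drift_detected]
print("drift signaled at indices:", drifts)
```

Computing one set of bin edges over the whole window, rather than separate edges per half, keeps the two histograms comparable: with per-half edges, two halves drawn from the same distribution but with slightly different ranges would be binned on different grids.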
To benchmark JSWIN's performance, we conducted an experiment using the Adaptive Random Forest (ARF) from the river library with default parameters. We compared JSWIN against two other popular drift detectors, KSWIN and ADWIN, using the following configurations:

The table below summarizes the average accuracy of the ARF models on various datasets. The Hyperplane dataset comes from river's built-in datasets, while Label Shift and Gaussian are synthetic datasets we developed. Electricity and Airlines are real-world datasets. Overall, JSWIN performs comparably to other state-of-the-art drift detection methods. Notably, on the real-world Airlines dataset, JSWIN outperformed the other methods.