Commit af2e304

NielsPraet and jvdd authored
✨ feat: nan handling (#59)
* ✨ feat: add nan implementation of m4 algorithm
* ✨ feat: add nan implementation of minmax algorithm
* ✨ feat: add nan implementation of minmaxlttb algorithm
* 💩 feat: update lib script to incorporate nan-handling functions
* 🚧 feat: add new nan downsampler
* ✅ tests: add new nan functions to rust mod tests
* 🎨 chore: format code
* ✨ feat: expose new nan downsamplers to api
* ✅ tests: update tsdownsample tests to support nan downsamplers
* 🎨 chore: format code
* ✅ tests: add test for nan downsamplers
* ✨ feat: add python counterparts of Rust downsamplers
* ✅ tests: re-enable commented out tests
* 🎨 chore: format code
* 🔥 chore: remove commented code
* 📝 docs: update README.md
* 📝 docs: update NaN descriptions
* 🧹 remove threaded
* 🎉 cleanup code
* 🙈 fix typo in NaNMinMaxDownsampler
* 🕵️ benchmark NaN downsamplers
* 🧹
* 🧹
* 🧹 limit duplicate code
* 🙈 fix linting
* 🧹

---------

Co-authored-by: Jeroen Van Der Donckt <[email protected]>
Co-authored-by: jvdd <[email protected]>
1 parent e4d4a66 · commit af2e304

14 files changed (+842 / -199 lines changed)

README.md

Lines changed: 28 additions & 9 deletions
@@ -6,13 +6,14 @@
 [![CodeQL](https://github.com/predict-idlab/tsdownsample/actions/workflows/codeql.yml/badge.svg)](https://github.com/predict-idlab/tsdownsample/actions/workflows/codeql.yml)
 [![Testing](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-downsample_rs.yml/badge.svg)](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-downsample_rs.yml)
 [![Testing](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-tsdownsample.yml/badge.svg)](https://github.com/predict-idlab/tsdownsample/actions/workflows/ci-tsdownsample.yml)
+
 <!-- TODO: codecov -->

 Extremely fast **time series downsampling 📈** for visualization, written in Rust.

 ## Features ✨

-* **Fast**: written in rust with PyO3 bindings
+- **Fast**: written in rust with PyO3 bindings
   - leverages optimized [argminmax](https://github.com/jvdd/argminmax) - which is SIMD accelerated with runtime feature detection
   - scales linearly with the number of data points
 <!-- TODO check if it scales sublinearly -->
@@ -25,21 +26,21 @@ Extremely fast **time series downsampling 📈** for visualization, written in R
 </blockquote>
 In Rust - which is a compiled language - there is no GIL, so CPU-bound tasks can be parallelized (with <a href="https://github.com/rayon-rs/rayon">Rayon</a>) with little to no overhead.
 </details>
-* **Efficient**: memory efficient
+- **Efficient**: memory efficient
   - works on views of the data (no copies)
   - no intermediate data structures are created
-* **Flexible**: works on any type of data
-  - supported datatypes are
-    - for `x`: `f32`, `f64`, `i16`, `i32`, `i64`, `u16`, `u32`, `u64`, `datetime64`, `timedelta64`
-    - for `y`: `f16`, `f32`, `f64`, `i8`, `i16`, `i32`, `i64`, `u8`, `u16`, `u32`, `u64`, `datetime64`, `timedelta64`, `bool`
+- **Flexible**: works on any type of data
+  - supported datatypes are
+    - for `x`: `f32`, `f64`, `i16`, `i32`, `i64`, `u16`, `u32`, `u64`, `datetime64`, `timedelta64`
+    - for `y`: `f16`, `f32`, `f64`, `i8`, `i16`, `i32`, `i64`, `u8`, `u16`, `u32`, `u64`, `datetime64`, `timedelta64`, `bool`
 <details>
 <summary><i>!! 🚀 <code>f16</code> <a href="https://github.com/jvdd/argminmax">argminmax</a> is 200-300x faster than numpy</i></summary>
 In contrast with all other data types above, <code>f16</code> is *not* hardware supported (i.e., no instructions for f16) by most modern CPUs!! <br>
 🐌 Programming languages facilitate support for this datatype by either (i) upcasting to <u>f32</u> or (ii) using a software implementation. <br>
 💡 As for argminmax, only comparisons are needed - and thus no arithmetic operations - creating a <u>symmetrical ordinal mapping from <code>f16</code> to <code>i16</code></u> is sufficient. This mapping allows to use the hardware supported scalar and SIMD <code>i16</code> instructions - while not producing any memory overhead 🎉 <br>
 <i>More details are described in <a href="https://github.com/jvdd/argminmax/pull/1">argminmax PR #1</a>.</i>
 </details>
-* **Easy to use**: simple & flexible API
+- **Easy to use**: simple & flexible API

 ## Install

@@ -83,6 +84,7 @@ downsample([x], y, n_out, **kwargs) -> ndarray[uint64]
 ```

 **Arguments**:
+
 - `x` is optional
 - `x` and `y` are both positional arguments
 - `n_out` is a mandatory keyword argument that defines the number of output values<sup>*</sup>
@@ -93,7 +95,8 @@ downsample([x], y, n_out, **kwargs) -> ndarray[uint64]

 **Returns**: a `ndarray[uint64]` of indices that can be used to index the original data.

-<sup>*</sup><i>When there are gaps in the time series, fewer than `n_out` indices may be returned.</i>
+<sup>\*</sup><i>When there are gaps in the time series, fewer than `n_out` indices may be returned.</i>
+
 ### Downsampling algorithms 📈

 The following downsampling algorithms (classes) are implemented:
@@ -107,12 +110,28 @@ The following downsampling algorithms (classes) are implemented:

 <sup>*</sup><i>Default value for `minmax_ratio` is 4, which is empirically proven to be a good default. More details here: https://arxiv.org/abs/2305.00332</i>

+### Handling NaNs
+
+This library supports two `NaN`-policies:
+
+1. Omit `NaN`s (`NaN`s are ignored during downsampling).
+2. Return index of first `NaN` once there is at least one present in the bin of the considered data.
+
+| Omit `NaN`s             | Return `NaN`s              |
+| ----------------------: | :------------------------- |
+| `MinMaxDownsampler`     | `NaNMinMaxDownsampler`     |
+| `M4Downsampler`         | `NaNM4Downsampler`         |
+| `MinMaxLTTBDownsampler` | `NaNMinMaxLTTBDownsampler` |
+| `LTTBDownsampler`       |                            |
+
+> Note that NaNs are not supported for `x`-data.

 ## Limitations & assumptions 🚨

 Assumes;
+
 1. `x`-data is (non-strictly) monotonic increasing (i.e., sorted)
-2. no `NaNs` in the data
+2. no `NaN`s in `x`-data

 ---

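To make the new "Handling NaNs" section concrete, here is a minimal usage sketch of the two policies. It assumes the downsampler classes are importable from the top-level `tsdownsample` package (as the README table suggests); the data, the injected NaNs, and the `n_out` value are purely illustrative.

```python
import numpy as np

# Assumes top-level exports; the classes are the ones listed in the README table.
from tsdownsample import MinMaxDownsampler, NaNMinMaxDownsampler

y = np.sin(np.arange(1_000_000) / 1_000).astype(np.float64)
y[::100_000] = np.nan  # inject some NaNs into the y-data

# Policy 1 - omit NaNs: NaNs are ignored during downsampling.
idx_omit = MinMaxDownsampler().downsample(y, n_out=1_000)

# Policy 2 - return NaNs: a bin that contains a NaN yields the index of its first NaN.
idx_nan = NaNMinMaxDownsampler().downsample(y, n_out=1_000)

# Both calls return an ndarray[uint64] of indices into the original data.
y_omit, y_nan = y[idx_omit], y[idx_nan]
```

Per the note in the diff, NaNs are only handled in the `y`-data; the `x`-data is still assumed to be NaN-free.
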
downsample_rs/src/m4.rs

Lines changed: 61 additions & 34 deletions
@@ -1,4 +1,4 @@
-use argminmax::ArgMinMax;
+use argminmax::{ArgMinMax, NaNArgMinMax};
 use num_traits::{AsPrimitive, FromPrimitive};
 use rayon::iter::IndexedParallelIterator;
 use rayon::prelude::*;
@@ -13,55 +13,82 @@ use super::POOL;

 // ----------- WITH X

-pub fn m4_with_x<Tx, Ty>(x: &[Tx], arr: &[Ty], n_out: usize) -> Vec<usize>
-where
-    for<'a> &'a [Ty]: ArgMinMax,
-    Tx: Num + FromPrimitive + AsPrimitive<f64>,
-    Ty: Copy + PartialOrd,
-{
-    assert_eq!(n_out % 4, 0);
-    let bin_idx_iterator = get_equidistant_bin_idx_iterator(x, n_out / 4);
-    m4_generic_with_x(arr, bin_idx_iterator, n_out, |arr| arr.argminmax())
+macro_rules! m4_with_x {
+    ($func_name:ident, $trait:path, $f_argminmax:expr) => {
+        pub fn $func_name<Tx, Ty>(x: &[Tx], arr: &[Ty], n_out: usize) -> Vec<usize>
+        where
+            for<'a> &'a [Ty]: $trait,
+            Tx: Num + FromPrimitive + AsPrimitive<f64>,
+            Ty: Copy + PartialOrd,
+        {
+            assert_eq!(n_out % 4, 0);
+            let bin_idx_iterator = get_equidistant_bin_idx_iterator(x, n_out / 4);
+            m4_generic_with_x(arr, bin_idx_iterator, n_out, $f_argminmax)
+        }
+    };
 }

+m4_with_x!(m4_with_x, ArgMinMax, |arr| arr.argminmax());
+m4_with_x!(m4_with_x_nan, NaNArgMinMax, |arr| arr.nanargminmax());
+
 // ----------- WITHOUT X

-pub fn m4_without_x<T: Copy + PartialOrd>(arr: &[T], n_out: usize) -> Vec<usize>
-where
-    for<'a> &'a [T]: ArgMinMax,
-{
-    assert_eq!(n_out % 4, 0);
-    m4_generic(arr, n_out, |arr| arr.argminmax())
+macro_rules! m4_without_x {
+    ($func_name:ident, $trait:path, $f_argminmax:expr) => {
+        pub fn $func_name<T: Copy + PartialOrd>(arr: &[T], n_out: usize) -> Vec<usize>
+        where
+            for<'a> &'a [T]: $trait,
+        {
+            assert_eq!(n_out % 4, 0);
+            m4_generic(arr, n_out, $f_argminmax)
+        }
+    };
 }

+m4_without_x!(m4_without_x, ArgMinMax, |arr| arr.argminmax());
+m4_without_x!(m4_without_x_nan, NaNArgMinMax, |arr| arr.nanargminmax());
+
 // ------------------------------------- PARALLEL --------------------------------------

 // ----------- WITH X

-pub fn m4_with_x_parallel<Tx, Ty>(x: &[Tx], arr: &[Ty], n_out: usize) -> Vec<usize>
-where
-    for<'a> &'a [Ty]: ArgMinMax,
-    Tx: Num + FromPrimitive + AsPrimitive<f64> + Send + Sync,
-    Ty: Copy + PartialOrd + Send + Sync,
-{
-    assert_eq!(n_out % 4, 0);
-    let bin_idx_iterator = get_equidistant_bin_idx_iterator_parallel(x, n_out / 4);
-    m4_generic_with_x_parallel(arr, bin_idx_iterator, n_out, |arr| arr.argminmax())
+macro_rules! m4_with_x_parallel {
+    ($func_name:ident, $trait:path, $f_argminmax:expr) => {
+        pub fn $func_name<Tx, Ty>(x: &[Tx], arr: &[Ty], n_out: usize) -> Vec<usize>
+        where
+            for<'a> &'a [Ty]: $trait,
+            Tx: Num + FromPrimitive + AsPrimitive<f64> + Send + Sync,
+            Ty: Copy + PartialOrd + Send + Sync,
+        {
+            assert_eq!(n_out % 4, 0);
+            let bin_idx_iterator = get_equidistant_bin_idx_iterator_parallel(x, n_out / 4);
+            m4_generic_with_x_parallel(arr, bin_idx_iterator, n_out, $f_argminmax)
+        }
+    };
 }

+m4_with_x_parallel!(m4_with_x_parallel, ArgMinMax, |arr| arr.argminmax());
+m4_with_x_parallel!(m4_with_x_parallel_nan, NaNArgMinMax, |arr| arr
+    .nanargminmax());
+
 // ----------- WITHOUT X

-pub fn m4_without_x_parallel<T: Copy + PartialOrd + Send + Sync>(
-    arr: &[T],
-    n_out: usize,
-) -> Vec<usize>
-where
-    for<'a> &'a [T]: ArgMinMax,
-{
-    assert_eq!(n_out % 4, 0);
-    m4_generic_parallel(arr, n_out, |arr| arr.argminmax())
+macro_rules! m4_without_x_parallel {
+    ($func_name:ident, $trait:path, $f_argminmax:expr) => {
+        pub fn $func_name<T: Copy + PartialOrd + Send + Sync>(arr: &[T], n_out: usize) -> Vec<usize>
+        where
+            for<'a> &'a [T]: $trait,
+        {
+            assert_eq!(n_out % 4, 0);
+            m4_generic_parallel(arr, n_out, $f_argminmax)
+        }
+    };
 }

+m4_without_x_parallel!(m4_without_x_parallel, ArgMinMax, |arr| arr.argminmax());
+m4_without_x_parallel!(m4_without_x_parallel_nan, NaNArgMinMax, |arr| arr
+    .nanargminmax());
+
 // TODO: check for duplicate data in the output array
 // -> In the current implementation we always add 4 datapoints per bin (if of
 // course the bin has >= 4 datapoints). However, the argmin and argmax might

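The macros above only parameterize the trait bound and the argminmax closure; the function bodies stay as they were. As a sketch (not the literal compiler output, and it lives inside `m4.rs` next to the private `m4_generic` helper it calls), the invocation `m4_without_x!(m4_without_x_nan, NaNArgMinMax, |arr| arr.nanargminmax());` expands to roughly:

```rust
use argminmax::NaNArgMinMax;

// Sketch of the generated item: same body as the old hand-written
// `m4_without_x`, but bound to the NaN-aware trait and closure.
pub fn m4_without_x_nan<T: Copy + PartialOrd>(arr: &[T], n_out: usize) -> Vec<usize>
where
    for<'a> &'a [T]: NaNArgMinMax,
{
    // M4 keeps 4 points per bin, so n_out must be a multiple of 4.
    assert_eq!(n_out % 4, 0);
    m4_generic(arr, n_out, |arr| arr.nanargminmax())
}
```

The same pattern yields the `_nan` variants of the with-x and parallel functions, so the NaN-aware code path differs from the existing one only in the trait bound and the closure passed to the generic kernels.
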
downsample_rs/src/minmax.rs

Lines changed: 62 additions & 34 deletions
@@ -1,7 +1,7 @@
 use rayon::iter::IndexedParallelIterator;
 use rayon::prelude::*;

-use argminmax::ArgMinMax;
+use argminmax::{ArgMinMax, NaNArgMinMax};
 use num_traits::{AsPrimitive, FromPrimitive};

 use super::searchsorted::{
@@ -14,55 +14,83 @@ use super::POOL;

 // ----------- WITH X

-pub fn min_max_with_x<Tx, Ty>(x: &[Tx], arr: &[Ty], n_out: usize) -> Vec<usize>
-where
-    for<'a> &'a [Ty]: ArgMinMax,
-    Tx: Num + FromPrimitive + AsPrimitive<f64>,
-    Ty: Copy + PartialOrd,
-{
-    assert_eq!(n_out % 2, 0);
-    let bin_idx_iterator = get_equidistant_bin_idx_iterator(x, n_out / 2);
-    min_max_generic_with_x(arr, bin_idx_iterator, n_out, |arr| arr.argminmax())
+macro_rules! min_max_with_x {
+    ($func_name:ident, $trait:path, $f_argminmax:expr) => {
+        pub fn $func_name<Tx, Ty>(x: &[Tx], arr: &[Ty], n_out: usize) -> Vec<usize>
+        where
+            for<'a> &'a [Ty]: $trait,
+            Tx: Num + FromPrimitive + AsPrimitive<f64>,
+            Ty: Copy + PartialOrd,
+        {
+            assert_eq!(n_out % 2, 0);
+            let bin_idx_iterator = get_equidistant_bin_idx_iterator(x, n_out / 2);
+            min_max_generic_with_x(arr, bin_idx_iterator, n_out, $f_argminmax)
+        }
+    };
 }

+min_max_with_x!(min_max_with_x, ArgMinMax, |arr| arr.argminmax());
+min_max_with_x!(min_max_with_x_nan, NaNArgMinMax, |arr| arr.nanargminmax());
+
 // ----------- WITHOUT X

-pub fn min_max_without_x<T: Copy + PartialOrd>(arr: &[T], n_out: usize) -> Vec<usize>
-where
-    for<'a> &'a [T]: ArgMinMax,
-{
-    assert_eq!(n_out % 2, 0);
-    min_max_generic(arr, n_out, |arr| arr.argminmax())
+macro_rules! min_max_without_x {
+    ($func_name:ident, $trait:path, $f_argminmax:expr) => {
+        pub fn $func_name<T: Copy + PartialOrd>(arr: &[T], n_out: usize) -> Vec<usize>
+        where
+            for<'a> &'a [T]: $trait,
+        {
+            assert_eq!(n_out % 2, 0);
+            min_max_generic(arr, n_out, $f_argminmax)
+        }
+    };
 }

+min_max_without_x!(min_max_without_x, ArgMinMax, |arr| arr.argminmax());
+min_max_without_x!(min_max_without_x_nan, NaNArgMinMax, |arr| arr
+    .nanargminmax());
+
 // ------------------------------------- PARALLEL --------------------------------------

 // ----------- WITH X

-pub fn min_max_with_x_parallel<Tx, Ty>(x: &[Tx], arr: &[Ty], n_out: usize) -> Vec<usize>
-where
-    for<'a> &'a [Ty]: ArgMinMax,
-    Tx: Num + FromPrimitive + AsPrimitive<f64> + Send + Sync,
-    Ty: Copy + PartialOrd + Send + Sync,
-{
-    assert_eq!(n_out % 2, 0);
-    let bin_idx_iterator = get_equidistant_bin_idx_iterator_parallel(x, n_out / 2);
-    min_max_generic_with_x_parallel(arr, bin_idx_iterator, n_out, |arr| arr.argminmax())
+macro_rules! min_max_with_x_parallel {
+    ($func_name:ident, $trait:path, $f_argminmax:expr) => {
+        pub fn $func_name<Tx, Ty>(x: &[Tx], arr: &[Ty], n_out: usize) -> Vec<usize>
+        where
+            for<'a> &'a [Ty]: $trait,
+            Tx: Num + FromPrimitive + AsPrimitive<f64> + Send + Sync,
+            Ty: Copy + PartialOrd + Send + Sync,
+        {
+            assert_eq!(n_out % 2, 0);
+            let bin_idx_iterator = get_equidistant_bin_idx_iterator_parallel(x, n_out / 2);
+            min_max_generic_with_x_parallel(arr, bin_idx_iterator, n_out, $f_argminmax)
+        }
+    };
 }

+min_max_with_x_parallel!(min_max_with_x_parallel, ArgMinMax, |arr| arr.argminmax());
+min_max_with_x_parallel!(min_max_with_x_parallel_nan, NaNArgMinMax, |arr| arr
+    .nanargminmax());
+
 // ----------- WITHOUT X

-pub fn min_max_without_x_parallel<T: Copy + PartialOrd + Send + Sync>(
-    arr: &[T],
-    n_out: usize,
-) -> Vec<usize>
-where
-    for<'a> &'a [T]: ArgMinMax,
-{
-    assert_eq!(n_out % 2, 0);
-    min_max_generic_parallel(arr, n_out, |arr| arr.argminmax())
+macro_rules! min_max_without_x_parallel {
+    ($func_name:ident, $trait:path, $f_argminmax:expr) => {
+        pub fn $func_name<T: Copy + PartialOrd + Send + Sync>(arr: &[T], n_out: usize) -> Vec<usize>
+        where
+            for<'a> &'a [T]: $trait,
+        {
+            assert_eq!(n_out % 2, 0);
+            min_max_generic_parallel(arr, n_out, $f_argminmax)
+        }
+    };
 }

+min_max_without_x_parallel!(min_max_without_x_parallel, ArgMinMax, |arr| arr.argminmax());
+min_max_without_x_parallel!(min_max_without_x_parallel_nan, NaNArgMinMax, |arr| arr
+    .nanargminmax());
+
 // ----------------------------------- GENERICS ------------------------------------

 // --------------------- WITHOUT X

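For orientation, a hypothetical call site for the generated minmax variants. The `downsample_rs::minmax` module path, the crate layout, and the sample data are assumptions for illustration; per the README table, the plain variant follows the omit-NaNs policy and the `_nan` variant returns the index of the first NaN in an affected bin.

```rust
// Hypothetical call site; assumes `downsample_rs` exposes the `minmax` module
// with the macro-generated functions shown in the diff above.
use downsample_rs::minmax::{min_max_without_x, min_max_without_x_nan};

fn main() {
    let y: Vec<f64> = vec![1.0, 5.0, f64::NAN, 2.0, 9.0, 0.0, 3.0, 4.0];

    // Omit-NaN behaviour (ArgMinMax::argminmax under the hood).
    let idx = min_max_without_x(&y, 2);

    // Return-NaN behaviour (NaNArgMinMax::nanargminmax under the hood): a bin
    // containing a NaN yields the index of its first NaN.
    let idx_nan = min_max_without_x_nan(&y, 2);

    // Both return Vec<usize> indices into `y` (min/max per bin, n_out = 2 here).
    println!("{:?} vs {:?}", idx, idx_nan);
}
```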