Description
Overview
It would be nice to have a module in stdlib
that provides functions for computing means,variances, medians, ... of vectors, and of rows (columns) of 2D-arrays (at least).
E.g.,
real :: res
real, allocatable :: res1(:)
real :: vector(5), mat(6,5)
...
res = mean(vector)
res1 = mean(mat) !returns the mean of each row of mat
res1 = mean(mat, dim = 2) !returns the mean of each column of mat
The same could be implemented for variance, median, ... So the API of all functions would be (almost) the same.
API
Let 's discuss the API of only mean
for a vector first, and then for an array.
For a vector:
function mean_sp_sp(vector) result(res)
real(sp), intent(in) :: vector(:)
real(sp) :: res
end function
For a 2D array:
function mean_sp_sp(mat, dim) result(res)
real(sp), intent(in) :: mat(:,:)
integer, intent(in), optional :: dim !dim = 1 or dim = 2
real(sp), allocatable :: res(:)
end function
If dim = 1
, it returns the mean of each row (so res(1:size(mat,1))
).
If dim = 2
, it returns the mean of each column (so res(1:size(mat,2))
).
Here (generated manually with fypp
) is an example for mean
in stdlib
.
The same API could be used for variance, median, cumulative sum, geometric mean, ...
Should we support arrays of rank > 2? E.g., what would return mean(mat(:,:,:,:), dim =3)
?
Should we use functions or subroutine (and overload =
)?:
The result of the procedure would be of the same kind as the input, and (implicit) conversion would be performed by the user. Functions could then be used.
Alternatively:
For real arrays, procedures would return a result of the same kind, or of a lower kind, of the argument (e.g., a mean of a dp
array would return the result in sp
or dp
). All computations inside the procedure would be performed in the same kind as the input array, and the result would be converted just before the function returns the result.
For integer arrays, procedures would return a result of a real kind (e.g., a mean of a int64
array would return the result in sp
, dp
, or qp
). All computations inside the procedure would be performed in the same kind as the result.
Implementation
Probably most of us have some implementations. @leonfoks has also an implementation for 1D array on Github.
I would think about a module called stdlib_experimental_stat.f90
and multiple submodules (one per stat, e.g., stdlib_experimental_stat_mean.f90
, that contains all functions related with that stat).
The first PR would contain only one stat, e.g. mean
to facilitate the discussion.
Currently in stdlib
mean (mean
)
variance (var
)
central moment (moment
)
Possible additional functions
standard deviation (std
)
median (median
)
mode (mode
)
Others
covariance (cov
)
correlation (corr
)