Proposal for descriptive statistics

## Overview

It would be nice to have a module in `stdlib` that provides functions for computing means,variances, medians, ... of vectors, and of rows (columns) of 2D-arrays (at least).
E.g.,
```fortran
real :: res
real, allocatable :: res1(:)
real :: vector(5), mat(6,5)
...
res = mean(vector)
res1 = mean(mat) !returns the mean of each row of mat
res1 = mean(mat, dim = 2) !returns the mean of each column of mat
```  

The same could be implemented for variance, median, ... So the API of all functions would be (almost) the same.

## API
Let 's discuss the API of only `mean` for a vector first, and then for an array.

For a vector:

```fortran
function mean_sp_sp(vector) result(res)
    real(sp), intent(in) :: vector(:)
    real(sp) :: res
end function
```

For a 2D array:

```fortran
function mean_sp_sp(mat, dim) result(res)
    real(sp), intent(in) :: mat(:,:)
    integer, intent(in), optional :: dim !dim = 1 or dim = 2 
    real(sp), allocatable :: res(:)
end function
```
If `dim = 1`, it returns the mean of each row (so `res(1:size(mat,1))`).
If `dim = 2`, it returns the mean of each column (so `res(1:size(mat,2))`).

[Here](https://github.com/jvdp1/stdlib/blob/stat_dev/src/stdlib_experimental_stat.f90) (generated manually with `fypp`) is an example for `mean` in `stdlib`.

The same API could be used for variance, median, cumulative sum, geometric mean, ...

Should we support arrays of rank > 2? E.g., what would return `mean(mat(:,:,:,:), dim =3)`?

### Should we use functions or subroutine (and overload `=`)?:
The result of the procedure would be of the same kind as the input, and (implicit) conversion would be performed by the user. Functions could then be used.

Alternatively:
For real arrays, procedures would return a result of the same kind, or of a lower kind, of the argument (e.g., a mean of a `dp` array would return the result in `sp` or `dp`). All computations inside the procedure would be performed in the same kind as the input array, and the result would be converted just before the function returns the result.
For integer arrays, procedures would return a result of a real kind (e.g., a mean of a `int64` array would return the result in `sp`, `dp`, or `qp`). All computations inside the procedure would be performed in the same kind as  the result.


## Implementation
Probably most of us have some implementations. @leonfoks has also an implementation for 1D array on Github.
I would think about a module called `stdlib_experimental_stat.f90` and multiple submodules (one per stat, e.g., `stdlib_experimental_stat_mean.f90`, that contains all functions related with that stat).
The first PR would contain only one stat, e.g. `mean` to facilitate the discussion.

### Currently in `stdlib`
mean (`mean`)
variance (`var`)
central moment (`moment`)

### Possible additional functions
standard deviation (`std`)
median (`median`)
mode (`mode`)

### Others
covariance (`cov`)
correlation (`corr`)


## Other languages
[Matlab](https://in.mathworks.com/help/matlab/ref/mean.html)
[Numpy](https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html)
[Octave](https://octave.org/doc/v4.0.1/Descriptive-Statistics.html)
[R](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/mean)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Proposal for descriptive statistics #113

Overview

API

Should we use functions or subroutine (and overload `=`)?:

Implementation

Currently in `stdlib`

Possible additional functions

Others

Other languages

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Proposal for descriptive statistics #113

Description

Overview

API

Should we use functions or subroutine (and overload =)?:

Implementation

Currently in stdlib

Possible additional functions

Others

Other languages

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Should we use functions or subroutine (and overload `=`)?:

Currently in `stdlib`