Commit e5ebbd6

Updated README and docs with references.
1 parent dd9a81c commit e5ebbd6

11 files changed

Lines changed: 169 additions & 33 deletions

R/two-sample-diagram-test.R

Lines changed: 2 additions & 0 deletions
@@ -59,6 +59,8 @@
5959
#' the null distribution (if `keep_null_distribution` is set to `TRUE`) and
6060
#' the list of sampled permutations (if `keep_permutations` is set to `TRUE`).
6161
#'
62+
#' @references Lovato, I., Pini, A., Stamm, A., & Vantini, S. (2020). Model-free two-sample test for network-valued data. Computational Statistics & Data Analysis, 144, 106896.
63+
#'
6264
#' @export
6365
#' @examples
6466
#' two_sample_diagram_test(trefoils1[1:5], trefoils2[1:5], B = 100L)

R/two-sample-functional-test.R

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,8 @@
4242
#' - `data.eval`: A numeric matrix of shape \eqn{n \times p} storing the evaluation of the functional data on a uniform grid.
4343
#' - `heatmap.matrix`: A numeric matrix storing the p-values. Used only for plots.
4444
#'
45+
#' @references Pini, A., & Vantini, S. (2017). Interval-wise testing for functional data. Journal of Nonparametric Statistics, 29(2), 407-424.
46+
#'
4547
#' @export
4648
#' @examples
4749
#' out <- two_sample_functional_test(trefoils1, archspirals, B = 100L, scale_size = 50L)

README.Rmd

Lines changed: 60 additions & 11 deletions
@@ -39,15 +39,17 @@ You can install the development version of inphr from
3939
pak::pak("tdaverse/inphr")
4040
```
4141

42-
## Example
42+
## Usage
4343

4444
Let us start by loading the package:
4545

4646
```{r}
4747
library(inphr)
4848
```
4949

50-
The package contains three sets of persistence diagrams, which can be used for
50+
### Toy data
51+
52+
The package contains three toy data sets of persistence diagrams, which can be used for
5153
testing. They are available in the package as `trefoils1`, `trefoils2`, and
5254
`archspirals`. The first two sets contain persistence diagrams computed from
5355
noisy samples of trefoil knots, while the third set contains persistence
@@ -58,23 +60,70 @@ sampled from the respective shape, with Gaussian noise added (standard deviation
5860
[`TDA::ripsDiag()`](https://www.rdocumentation.org/packages/TDA/versions/1.9.1/topics/ripsDiag)
5961
function with a maximum scale of 6 and up to dimension 2.
6062

63+
### Test in the space of diagrams
64+
6165
You can use the
6266
[`two_sample_diagram_test()`](https://tdaverse.github.io/inphr/reference/two_sample_diagram_test.html)
63-
function to perform a two-sample test on these persistence diagrams. For
64-
example, to test whether the first 5 persistence diagrams from the first set are
65-
significantly different from the first 5 persistence diagrams from the second
66-
set, you can run:
67+
function to perform a two-sample test on these persistence diagrams in the space of diagrams themselves. For
68+
example, to test whether the persistence diagrams from `trefoils1` are
69+
significantly different from the persistence diagrams from `trefoils2`, you can run:
70+
71+
```{r}
72+
two_sample_diagram_test(trefoils1, trefoils2, B = 100L)
73+
```
74+
75+
To test whether the persistence diagrams from `trefoils1` are
76+
significantly different from the persistence diagrams from `archspirals`, you can run:
6777

6878
```{r}
69-
two_sample_diagram_test(trefoils1[1:5], trefoils2[1:5], B = 100L)
79+
two_sample_diagram_test(trefoils1, archspirals, B = 100L)
7080
```
7181

72-
To test whether the first 5 persistence diagrams from the first set are
73-
significantly different from the first 5 persistence diagrams from the third
74-
set, you can run:
82+
Optionally, the `two_sample_diagram_test()` function can also return the distribution of the test statistic under the null hypothesis, as estimated by the permutation scheme. To do so, set the optional argument `keep_null_distribution = TRUE`. The sampled permutations themselves can also be saved as part of the output by setting the optional argument `keep_permutations = TRUE`.
83+
84+
Testing in the space of diagrams relies on test statistics that depend only on pairwise distances between the sampled diagrams. By default, two such statistics are used, which mimic Student's t-statistic and Fisher's F-statistic, as proposed in Lovato, I., Pini, A., Stamm, A., & Vantini, S. (2020), *Model-free two-sample test for network-valued data*. Computational Statistics & Data Analysis, **144**, 106896.
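As a rough illustration of the idea, and not the implementation used by `two_sample_diagram_test()`, the sketch below runs a permutation test in base R using only a pairwise distance matrix. The helper `perm_test_from_distances()` and its statistic (mean between-group distance minus mean within-group distance) are hypothetical stand-ins for the t-like and F-like statistics of Lovato et al. (2020), the p-value convention is an assumption, and the toy data are points on the real line rather than persistence diagrams.

```r
# Hypothetical helper: permutation two-sample test driven only by distances.
# D  : (n1 + n2) x (n1 + n2) symmetric matrix of pairwise distances
# n1 : size of the first sample (rows/columns 1..n1 of D)
# B  : number of random permutations
perm_test_from_distances <- function(D, n1, B = 100L) {
  n <- nrow(D)
  # Illustrative statistic: mean between-group distance minus
  # mean within-group distance (large when the groups are well separated).
  stat <- function(idx1) {
    idx2 <- setdiff(seq_len(n), idx1)
    between <- mean(D[idx1, idx2])
    w1 <- D[idx1, idx1]
    w2 <- D[idx2, idx2]
    within <- mean(c(w1[upper.tri(w1)], w2[upper.tri(w2)]))
    between - within
  }
  observed <- stat(seq_len(n1))
  # Each replicate relabels the pooled sample at random.
  null_stats <- replicate(B, stat(sample.int(n, n1)))
  # Assumed p-value convention: (count + 1) / (B + 1).
  list(
    statistic = observed,
    null_distribution = null_stats,
    pvalue = (sum(null_stats >= observed) + 1) / (B + 1)
  )
}

set.seed(1234)
x <- rnorm(20, mean = 0)
y <- rnorm(20, mean = 3)
D <- as.matrix(dist(c(x, y)))
res <- perm_test_from_distances(D, n1 = 20L, B = 100L)
res$pvalue
```

Because the statistic only reads entries of `D`, the same function would work with any distance between diagrams, which is the property the paragraph above relies on.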
85+
86+
### Test in functional spaces
87+
88+
You can use the [`two_sample_functional_test()`](https://tdaverse.github.io/inphr/reference/two_sample_functional_test.html) function to perform a two-sample test on these persistence diagrams in functional spaces using one of five functional representations of persistence diagrams, namely: (i) Betti, (ii) Euler characteristic, (iii) normalized life, (iv) silhouette and (v) entropy curves. Computation of these functional representations is powered by the [{TDAvec}](https://cran.r-project.org/package=TDAvec) package. For
89+
example, to test whether the persistence diagrams from `trefoils1` are
90+
significantly different from the persistence diagrams from `archspirals`, you can use the Betti curve representation and run:
91+
92+
```{r, eval = FALSE}
93+
out <- two_sample_functional_test(
94+
trefoils1,
95+
archspirals,
96+
representation = "betti",
97+
B = 100L
98+
)
99+
```
100+
101+
```{r, include = FALSE}
102+
out <- two_sample_functional_test(
103+
trefoils1,
104+
archspirals,
105+
representation = "betti",
106+
B = 100L
107+
)
108+
```
109+
110+
The output is a length-4 list. The first two elements, `xfd` and `yfd`, are numeric matrices storing evaluations of the functional representations of the diagrams on a grid, which is stored as the third element, `scale_seq`. You can therefore inspect the functional data that the function produced using something like:
111+
112+
```{r}
113+
matplot(
114+
out$scale_seq[-1],
115+
t(rbind(out$xfd, out$yfd)),
116+
type = "l",
117+
col = c(rep(1, length(trefoils1)), rep(2, length(archspirals)))
118+
)
119+
```
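To make the Betti curve representation more concrete, here is a minimal base-R sketch that evaluates a pointwise Betti curve from a diagram stored as a two-column matrix of birth and death values. The function `betti_curve()` and the toy diagram are hypothetical; {TDAvec} may evaluate the curve differently (for instance by averaging over grid cells), so this should not be read as the computation performed by `two_sample_functional_test()`.

```r
# Hypothetical helper: pointwise Betti curve of a persistence diagram.
# diagram : two-column numeric matrix (birth, death), one homology dimension
# grid    : numeric vector of scale values at which the curve is evaluated
betti_curve <- function(diagram, grid) {
  vapply(
    grid,
    # Count the features that are alive at scale t (birth <= t < death).
    function(t) sum(diagram[, 1] <= t & t < diagram[, 2]),
    numeric(1)
  )
}

# Toy diagram with three features.
toy_diagram <- rbind(
  c(0.0, 1.0),
  c(0.5, 2.0),
  c(1.5, 3.0)
)
grid <- seq(0, 3, by = 0.5)
bc <- betti_curve(toy_diagram, grid)
bc
#> [1] 1 2 1 2 1 1 0
```

Applying such a function to every diagram in a sample, on a common grid, yields one curve per diagram, which is the kind of row-per-diagram matrix that `xfd` and `yfd` are described as storing.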
120+
121+
For testing in functional spaces, {inphr} uses the interval-wise testing (IWT) procedure proposed in Pini, A., & Vantini, S. (2017), *Interval-wise testing for functional data*. Journal of Nonparametric Statistics, **29**(2), 407-424, as implemented in the [{fdatest}](https://cran.r-project.org/package=fdatest) package.
122+
123+
The output indicates on which portions of the scale sequence the difference between the two samples occurs, providing strong control of the familywise error rate:
75124

76125
```{r}
77-
two_sample_diagram_test(trefoils1[1:5], archspirals[1:5], B = 100L)
126+
plot(out$iwt, xrange = range(out$scale_seq))
78127
```
79128

80129
## Contributions

README.md

Lines changed: 99 additions & 22 deletions
@@ -30,47 +30,124 @@ You can install the development version of inphr from
3030
pak::pak("tdaverse/inphr")
3131
```
3232

33-
## Example
33+
## Usage
3434

3535
Let us start by loading the package:
3636

3737
``` r
3838
library(inphr)
3939
```
4040

41-
The package contains three sets of persistence diagrams, which can be
42-
used for testing. They are available in the package as `trefoils1`,
43-
`trefoils2`, and `archspirals`. The first two sets contain persistence
44-
diagrams computed from noisy samples of trefoil knots, while the third
45-
set contains persistence diagrams computed from noisy samples of 2-armed
46-
Archimedean spirals. Each set contains 24 persistence diagrams, each
47-
computed from a sample of 120 points sampled from the respective shape,
48-
with Gaussian noise added (standard deviation = 0.05). The persistence
49-
diagrams were computed using the
41+
### Toy data
42+
43+
The package contains three toy data sets of persistence diagrams, which
44+
can be used for testing. They are available in the package as
45+
`trefoils1`, `trefoils2`, and `archspirals`. The first two sets contain
46+
persistence diagrams computed from noisy samples of trefoil knots, while
47+
the third set contains persistence diagrams computed from noisy samples
48+
of 2-armed Archimedean spirals. Each set contains 24 persistence
49+
diagrams, each computed from a sample of 120 points sampled from the
50+
respective shape, with Gaussian noise added (standard deviation = 0.05).
51+
The persistence diagrams were computed using the
5052
[`TDA::ripsDiag()`](https://www.rdocumentation.org/packages/TDA/versions/1.9.1/topics/ripsDiag)
5153
function with a maximum scale of 6 and up to dimension 2.
5254

55+
### Test in the space of diagrams
56+
57+
You can use the
58+
[`two_sample_diagram_test()`](https://tdaverse.github.io/inphr/reference/two_sample_diagram_test.html)
59+
function to perform a two-sample test on these persistence diagrams in
60+
the space of diagrams themselves. For example, to test whether the
61+
persistence diagrams from `trefoils1` are significantly different from
62+
the persistence diagrams from `trefoils2`, you can run:
63+
64+
``` r
65+
two_sample_diagram_test(trefoils1, trefoils2, B = 100L)
66+
#> [1] 1
67+
```
68+
69+
To test whether the persistence diagrams from `trefoils1` are
70+
significantly different from the persistence diagrams from
71+
`archspirals`, you can run:
72+
73+
``` r
74+
two_sample_diagram_test(trefoils1, archspirals, B = 100L)
75+
#> [1] 0.00990099
76+
```
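The second p-value above, 0.00990099, equals 1/101. This is what one would obtain with `B = 100` permutations if the p-value were estimated as (count + 1) / (B + 1) and no permuted statistic reached the observed one. That {inphr} uses exactly this convention is an assumption here, not something stated in this README. Under that assumption, the range of attainable p-values can be checked as follows:

```r
# Assumed convention: p = (b + 1) / (B + 1), where b is the number of
# permuted statistics at least as extreme as the observed one (b = 0, ..., B).
B <- 100L
attainable <- (0:B + 1) / (B + 1)

min(attainable)  # smallest attainable p-value, 1 / 101
max(attainable)  # largest attainable p-value, 1
```

Under the same assumption, increasing `B` lowers the smallest attainable p-value, at the cost of a longer computation.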
77+
78+
Optionally, the `two_sample_diagram_test()` function can also output
79+
the distribution of the test statistic under the null hypothesis as
80+
estimated by the permutation scheme. To do that, you can use the
81+
optional argument `keep_null_distribution = TRUE`. It is also possible
82+
to ask for the permutations themselves to be saved as part of the
83+
output. To do that, you can use the optional argument
84+
`keep_permutations = TRUE`.
85+
86+
Testing in the space of diagrams themselves is performed using test
87+
statistics that only rely on distances between sampled diagrams. By
88+
default, two such statistics that mimic Student’s t-statistic and
89+
Fisher’s F-statistic are used as proposed in Lovato, I., Pini, A.,
90+
Stamm, A., & Vantini, S. (2020), *Model-free two-sample test for
91+
network-valued data*. Computational Statistics & Data Analysis, **144**,
92+
106896.
93+
94+
### Test in functional spaces
95+
5396
You can use the
54-
[`two_sample_test()`](https://tdaverse.github.io/inphr/reference/two_sample_test.html)
55-
function to perform a two-sample test on these persistence diagrams. For
56-
example, to test whether the first 5 persistence diagrams from the first
57-
set are significantly different from the first 5 persistence diagrams
58-
from the second set, you can run:
97+
[`two_sample_functional_test()`](https://tdaverse.github.io/inphr/reference/two_sample_functional_test.html)
98+
function to perform a two-sample test on these persistence diagrams in
99+
functional spaces using one of five functional representations of
100+
persistence diagrams, namely: (i) Betti, (ii) Euler characteristic,
101+
(iii) normalized life, (iv) silhouette and (v) entropy curves.
102+
Computation of these functional representations is powered by the
103+
[{TDAvec}](https://cran.r-project.org/package=TDAvec) package. For
104+
example, to test whether the persistence diagrams from `trefoils1` are
105+
significantly different from the persistence diagrams from
106+
`archspirals`, you can use the Betti curve representation and run:
107+
108+
``` r
109+
out <- two_sample_functional_test(
110+
trefoils1,
111+
archspirals,
112+
representation = "betti",
113+
B = 100L
114+
)
115+
```
116+
117+
The output is a length-4 list. The first two elements are `xfd` and
118+
`yfd`, which are numeric matrices storing evaluations of the functional
119+
representation of the diagrams on a grid stored as the third element
120+
`scale_seq`. You can therefore have a look at the functional data that
121+
the function produced using something like:
59122

60123
``` r
61-
two_sample_test(trefoils1[1:5], trefoils2[1:5], B = 100L)
62-
#> [1] 0.9782132
124+
matplot(
125+
out$scale_seq[-1],
126+
t(rbind(out$xfd, out$yfd)),
127+
type = "l",
128+
col = c(rep(1, length(trefoils1)), rep(2, length(archspirals)))
129+
)
63130
```
64131

65-
To test whether the first 5 persistence diagrams from the first set are
66-
significantly different from the first 5 persistence diagrams from the
67-
third set, you can run:
132+
<img src="man/figures/README-unnamed-chunk-7-1.png" width="100%" />
133+
134+
In the case of testing in functional spaces, {inphr} uses the
135+
interval-wise testing (IWT) procedure implemented in the
136+
[{fdatest}](https://cran.r-project.org/package=fdatest) package and
137+
proposed in Pini, A., & Vantini, S. (2017), *Interval-wise
138+
testing for functional data*. Journal of Nonparametric Statistics,
139+
**29**(2), 407-424.
140+
141+
The output indicates on which portions of the scale sequence the
142+
difference between the two samples occurs, providing strong control of
143+
the familywise error rate:
68144

69145
``` r
70-
two_sample_test(trefoils1[1:5], archspirals[1:5], B = 100L)
71-
#> [1] 0.008047755
146+
plot(out$iwt, xrange = range(out$scale_seq))
72147
```
73148

149+
<img src="man/figures/README-unnamed-chunk-8-1.png" width="100%" /><img src="man/figures/README-unnamed-chunk-8-2.png" width="100%" />
150+
74151
## Contributions
75152

76153
### Code of Conduct

man/two_sample_diagram_test.Rd

Lines changed: 3 additions & 0 deletions
