diff --git a/.github/workflows/test_notebooks.yml b/.github/workflows/test_notebooks.yml index d117b72..06cf1e9 100644 --- a/.github/workflows/test_notebooks.yml +++ b/.github/workflows/test_notebooks.yml @@ -22,4 +22,4 @@ jobs: pip install jupyter - name: Execute notebooks run: | - for f in *.ipynb; do echo "Processing $f file.."; time jupyter nbconvert --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags="['do_not_execute']" --to notebook --ExecutePreprocessor.timeout=600 --inplace --execute $f;done; + for f in tutorials/*.ipynb; do echo "Processing $f file.."; time jupyter nbconvert --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags="['do_not_execute']" --to notebook --ExecutePreprocessor.timeout=600 --inplace --execute $f;done; diff --git a/README.Rmd b/README.Rmd index 2ceac7a..0badcfb 100644 --- a/README.Rmd +++ b/README.Rmd @@ -1,20 +1,20 @@ --- title: "Tutorials for Topological Data Analysis with the Gudhi Library" -output: +output: github_document: pandoc_args: --webtex --- ```{r setup, include=FALSE} knitr::opts_chunk$set( - echo = FALSE, + echo = FALSE, fig.align = "center" ) ``` Topological Data Analysis (TDA) is a recent and fast growing field providing a set of new topological and geometric tools to infer relevant features for possibly complex data. Here we propose a set of notebooks for the practice of TDA with the Python Gudhi library together with popular machine learning and data sciences libraries. See for instance [this paper](https://arxiv.org/abs/1710.04019) for an introduction to TDA for data science. The complete list of notebooks can also be found at the end of this page. -## Install Python Gudhi Library +## Install Python Gudhi Library See the [installation page](https://gudhi.inria.fr/python/latest/installation.html) or if you have conda you can make a [conda install](https://anaconda.org/conda-forge/gudhi). 
@@ -22,15 +22,15 @@ See the [installation page](https://gudhi.inria.fr/python/latest/installation.ht ### 01 - Simplex trees and simpicial complexes -TDA typically aims at extracting topological signatures from a point cloud in $\mathbb{R}^d$ or in a general metric space. By studying the topology of a point cloud, we actually mean studying the topology of the unions of balls centered at the point cloud, also called *offsets*. However, non-discrete sets such as offsets, and also continuous mathematical shapes like curves, surfaces and more generally manifolds, cannot easily be encoded as finite discrete structures. [Simplicial complexes](https://en.wikipedia.org/wiki/Simplicial_complex) are therefore used in computational geometry to approximate such shapes. +TDA typically aims at extracting topological signatures from a point cloud in $\mathbb{R}^d$ or in a general metric space. By studying the topology of a point cloud, we actually mean studying the topology of the unions of balls centered at the point cloud, also called *offsets*. However, non-discrete sets such as offsets, and also continuous mathematical shapes like curves, surfaces and more generally manifolds, cannot easily be encoded as finite discrete structures. [Simplicial complexes](https://en.wikipedia.org/wiki/Simplicial_complex) are therefore used in computational geometry to approximate such shapes. A simplicial complex is a set of [simplices](https://en.wikipedia.org/wiki/Simplex), they can be seen as higher dimensional generalization of graphs. These are mathematical objects that are both topological and combinatorial, a property making them particularly useful for TDA. The challenge here is to define such structures that are proven to reflect relevant information about the structure of data and that can be effectively constructed and manipulated in practice. 
Below is an exemple of simplicial complex: ```{r simplicial-complex-example} -knitr::include_graphics("Images/Pers14.PNG") +knitr::include_graphics("tutorials/Images/Pers14.PNG") ``` - -A filtration is an increasing sequence of sub-complexes of a simplicial complex $\mathcal{K}$. It can be seen as ordering the simplices included in the complex $\mathcal{K}$. Indeed, simpicial complexes often come with a specific order, as for [Vietoris-Rips complexes](https://en.wikipedia.org/wiki/Vietoris%E2%80%93Rips_complex), [Cech complexes](https://en.wikipedia.org/wiki/%C4%8Cech_complex) and [alpha complexes](https://en.wikipedia.org/wiki/Alpha_shape#Alpha_complex). + +A filtration is an increasing sequence of sub-complexes of a simplicial complex $\mathcal{K}$. It can be seen as ordering the simplices included in the complex $\mathcal{K}$. Indeed, simpicial complexes often come with a specific order, as for [Vietoris-Rips complexes](https://en.wikipedia.org/wiki/Vietoris%E2%80%93Rips_complex), [Cech complexes](https://en.wikipedia.org/wiki/%C4%8Cech_complex) and [alpha complexes](https://en.wikipedia.org/wiki/Alpha_shape#Alpha_complex). [Notebook: Simplex trees](Tuto-GUDHI-simplex-Trees.ipynb). In Gudhi, filtered simplicial complexes are encoded through a data structure called simplex tree. Vertices are represented as integers, edges as pairs of integers, etc. @@ -42,16 +42,16 @@ knitr::include_graphics("https://gudhi.inria.fr/python/latest/_images/Simplex_tr [Notebook: Rips and alpha complexes from pairwise distance](Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb). It is also possible to define Rips complexes in general metric spaces from a matrix of pairwise distances. The definition of the metric on the data is usually given as an input or guided by the application. It is however important to notice that the choice of the metric may be critical to reveal interesting topological and geometric features of the data. 
We also give in this last notebook a way to define alpha complexes from matrix of pairwise distances by first applying a [multidimensional scaling (MDS)](https://en.wikipedia.org/wiki/Multidimensional_scaling) transformation on the matrix. -TDA signatures can extracted from point clouds but in many cases in data sciences the question is to study the topology of the sublevel sets of a function. +TDA signatures can extracted from point clouds but in many cases in data sciences the question is to study the topology of the sublevel sets of a function. ```{r sublevel-sets-example} -knitr::include_graphics("Images/sublevf.png") +knitr::include_graphics("tutorials/Images/sublevf.png") ``` -Above is an example for a function defined on a subset of $\mathbb{R}$ but in general the function $f$ is defined on a subset of $\mathbb{R}^d$. +Above is an example for a function defined on a subset of $\mathbb{R}$ but in general the function $f$ is defined on a subset of $\mathbb{R}^d$. [Notebook: cubical complexes](Tuto-GUDHI-cubical-complexes.ipynb). One first approach for studying the topology of the sublevel sets of a function is to define a regular grid on $\mathbb{R}^d$ and then to define a filtered complex based on this grid and the function $f$. - + ### 02 - Persistent homology and persistence diagrams Homology is a well-known concept in algebraic topology. It provides a powerful tool to formalize and handle the notion of topological features of a topological space or of a simplicial complex in an algebraic way. For any dimension $k$, the $k$-dimensional *holes* are represented by a vector space $H_k$, whose dimension is intuitively the number of such independent features. For example, the $0$-dimensional homology group $H_0$ represents the connected components of the complex, the $1$-dimensional homology group $H_1$ represents the $1$-dimensional loops, the $2$-dimensional homology group $H_2$ represents the $2$-dimensional cavities and so on. 
@@ -59,10 +59,10 @@ Homology is a well-known concept in algebraic topology. It provides a powerful t Persistent homology is a powerful tool to compute, study and encode efficiently multiscale topological features of nested families of simplicial complexes and topological spaces. It encodes the evolution of the homology groups of the nested complexes across the scales. The diagram below shows several level sets of the filtration: ```{r persistence} -knitr::include_graphics("Images/pers.png") +knitr::include_graphics("tutorials/Images/pers.png") ``` -[Notebook: persistence diagrams](Tuto-GUDHI-persistence-diagrams.ipynb) In this notebook we show how to compute barcodes and persistence diagrams from a filtration defined on the Protein binding dataset. This tutorial also introduces the bottleneck distance between persistence diagrams. +[Notebook: persistence diagrams](Tuto-GUDHI-persistence-diagrams.ipynb) In this notebook we show how to compute barcodes and persistence diagrams from a filtration defined on the Protein binding dataset. This tutorial also introduces the bottleneck distance between persistence diagrams. ### 03 - Representations of persistence and linearization @@ -82,7 +82,7 @@ C. Oballe and V. Maroulas provide a [tutorial](https://github.com/coballejr/misc ### 06 - Machine learning and deep learning with TDA -Two libraries related to Gudhi: +Two libraries related to Gudhi: - [ATOL](https://github.com/martinroyer/atol): Automatic Topologically-Oriented Learning. See [this tutorial](https://github.com/martinroyer/atol/blob/master/demo/atol-demo.ipynb). - [Perslay](https://github.com/MathieuCarriere/perslay): A Simple and Versatile Neural Network Layer for Persistence Diagrams. See [notebook](Tuto-GUDHI-perslay-visu.ipynb). 
@@ -97,7 +97,7 @@ This [notebook](Tuto-GUDHI-DTM-filtrations.ipynb) introduces the distance to mea ### 10 - TDA and dimension reduction -### 11 - Inverse problem and optimization with TDA +### 11 - Inverse problem and optimization with TDA In this [notebook](Tuto-GUDHI-optimization.ipynb), we will see how Gudhi and Tensorflow can be combined to perform optimization of persistence diagrams to sove an inverse problem. diff --git a/README.md b/README.md index 6fc941b..3a32b94 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ structures that are proven to reflect relevant information about the structure of data and that can be effectively constructed and manipulated in practice. Below is an exemple of simplicial complex: -![simplicial complex example](Images/Pers14.PNG) +![simplicial complex example](tutorials/Images/Pers14.PNG) A filtration is an increasing sequence of sub-complexes of a simplicial complex $\mathcal{K}$. It can be seen as ordering the simplices included in @@ -50,15 +50,15 @@ complexes](https://en.wikipedia.org/wiki/Vietoris%E2%80%93Rips_complex), [alpha complexes](https://en.wikipedia.org/wiki/Alpha_shape#Alpha_complex). -[Notebook: Simplex trees](Tuto-GUDHI-simplex-Trees.ipynb). In Gudhi, +[Notebook: Simplex trees](tutorials/Tuto-GUDHI-simplex-Trees.ipynb). In Gudhi, filtered simplicial complexes are encoded through a data structure called simplex tree. Vertices are represented as integers, edges as pairs of integers, etc. -![simplex tree representation](Images/Simplex_tree_representation.png) +![simplex tree representation](tutorials/Images/Simplex_tree_representation.png) [Notebook: Vietoris-Rips complexes and alpha complexes from data -points](https://github.com/GUDHI/TDA-tutorial/blob/master/Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb). +points](tutorials/Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb). 
In practice, the first step of the **TDA Analysis Pipeline** is to define a filtration of simplicial complexes for some data. This notebook explains how to build Vietoris-Rips complexes and alpha complexes (represented as @@ -66,11 +66,11 @@ simplex trees) from data points in $\mathbb{R}^d$, using the simplex tree data structure. -This [Notebook](Tuto-GUDHI-alpha-complex-visualization.ipynb) shows how to visualize simplicial complexes. +This [Notebook](tutorials/Tuto-GUDHI-alpha-complex-visualization.ipynb) shows how to visualize simplicial complexes. [Notebook: Rips and alpha complexes from pairwise -distance](Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb). +distance](tutorials/Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb). It is also possible to define Rips complexes in general metric spaces from a matrix of pairwise distances. The definition of the metric on the data is usually given as an input or guided by the application. It is @@ -85,13 +85,13 @@ TDA signatures can extracted from point clouds but in many cases in data sciences the question is to study the topology of the sublevel sets of a function. -![function exemple](Images/sublevf.png) +![function exemple](tutorials/Images/sublevf.png) Above is an example for a function defined on a subset of $\mathbb{R}$ but in general the function $f$ is defined on a subset of $\mathbb{R}^d$. -[Notebook: cubical complexes](Tuto-GUDHI-cubical-complexes.ipynb). One +[Notebook: cubical complexes](tutorials/Tuto-GUDHI-cubical-complexes.ipynb). One first approach for studying the topology of the sublevel sets of a function is to define a regular grid on $\mathbb{R}^d$ and then to define a filtered complex based on this grid and the @@ -115,9 +115,9 @@ simplicial complexes and topological spaces. It encodes the evolution of the homology groups of the nested complexes across the scales. 
The diagram below shows several level sets of the filtration: -![persistence](Images/pers.png) +![persistence](tutorials/Images/pers.png) -[Notebook: persistence diagrams](https://github.com/GUDHI/TDA-tutorial/blob/master/Tuto-GUDHI-persistence-diagrams.ipynb) +[Notebook: persistence diagrams](tutorials/Tuto-GUDHI-persistence-diagrams.ipynb) In this notebook we show how to compute barcodes and persistence diagrams from a filtration defined on the Protein binding dataset. This tutorial also introduces the bottleneck distance between persistence @@ -125,16 +125,16 @@ diagrams. ### 03 - Representations of persistence and linearization -In this [notebook](Tuto-GUDHI-representations.ipynb), we learn how to +In this [notebook](tutorials/Tuto-GUDHI-representations.ipynb), we learn how to use alternative representations of persistence with the representations module and finally we see a first example of how to efficiently combine machine learning and topological data analysis. -This [notebook](Tuto-GUDHI-Expected-persistence-diagrams.ipynb) +This [notebook](tutorials/Tuto-GUDHI-Expected-persistence-diagrams.ipynb) illustrates the notion of “Expected Persistence Diagram”, which is a way to encode the topology of a random process as a deterministic measure. -This [notebook](Tuto-GUDHI-persistent-entropy.ipynb) shows how to summarize +This [notebook](tutorials/Tuto-GUDHI-persistent-entropy.ipynb) shows how to summarize the information given by persistent homology using persistent entropy (a number) and the ES-function (a curve) and explains in which situations they can be useful. @@ -146,7 +146,7 @@ features close to the diagonal. Since they correspond to topological structures that die very soon after they appear in the filtration, these points are generally considered as “topological noise”. Confidence regions for persistence diagram provide a rigorous framework to this -idea. This [notebook](Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb) +idea. 
This [notebook](tutorials/Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb) introduces the subsampling approach of [Fasy et al. 2014 AoS](https://projecteuclid.org/download/pdfview_1/euclid.aos/1413810729). @@ -167,15 +167,15 @@ Two libraries related to Gudhi: tutorial](https://github.com/martinroyer/atol/blob/master/demo/atol-demo.ipynb). - [Perslay](https://github.com/MathieuCarriere/perslay): A Simple and Versatile Neural Network Layer for Persistence Diagrams. See [this - notebook](Tuto-GUDHI-perslay-visu.ipynb). + notebook](tutorials/Tuto-GUDHI-perslay-visu.ipynb). ### 07 - Alternative filtrations and robust TDA -This [notebook](Tuto-GUDHI-DTM-filtrations.ipynb) introduces the +This [notebook](tutorials/Tuto-GUDHI-DTM-filtrations.ipynb) introduces the distance to measure (DTM) filtration, as defined in [this paper](https://arxiv.org/abs/1811.04757). This filtration can be used for robust TDA. The DTM can also be used for robust approximations of -compact sets, see this [notebook](Tuto-GUDHI-kPDTM-kPLM.ipynb). +compact sets, see this [notebook](tutorials/Tuto-GUDHI-kPDTM-kPLM.ipynb). ### 08 - Topological Data Analysis for Time series @@ -185,49 +185,49 @@ compact sets, see this [notebook](Tuto-GUDHI-kPDTM-kPLM.ipynb). ### 11 - Inverse problem and optimization with TDA -In this [notebook](Tuto-GUDHI-optimization.ipynb), we will see how Gudhi and +In this [notebook](tutorials/Tuto-GUDHI-optimization.ipynb), we will see how Gudhi and Tensorflow can be combined to perform optimization of persistence diagrams to solve an inverse problem. This other, less complete -[notebook](Tuto-GUDHI-PyTorch-optimization.ipynb) shows that this kind of +[notebook](tutorials/Tuto-GUDHI-PyTorch-optimization.ipynb) shows that this kind of optimization works just as well with PyTorch. 
## Complete list of notebooks for TDA -[Simplex trees](Tuto-GUDHI-simplex-Trees.ipynb) +[Simplex trees](tutorials/Tuto-GUDHI-simplex-Trees.ipynb) [Vietoris-Rips complexes and alpha complexes from data -points](Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb) +points](tutorials/Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb) [Visualizing simplicial -complexes](Tuto-GUDHI-alpha-complex-visualization.ipynb) +complexes](tutorials/Tuto-GUDHI-alpha-complex-visualization.ipynb) [Rips and alpha complexes from pairwise -distance](Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb) +distance](tutorials/Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb) -[Cubical complexes](Tuto-GUDHI-cubical-complexes.ipynb) +[Cubical complexes](tutorials/Tuto-GUDHI-cubical-complexes.ipynb) [Persistence diagrams and bottleneck -distance](Tuto-GUDHI-persistence-diagrams.ipynb) +distance](tutorials/Tuto-GUDHI-persistence-diagrams.ipynb) -[Representations of persistence](Tuto-GUDHI-representations.ipynb) +[Representations of persistence](tutorials/Tuto-GUDHI-representations.ipynb) [Expected Persistence -Diagram](Tuto-GUDHI-Expected-persistence-diagrams.ipynb) +Diagram](tutorials/Tuto-GUDHI-Expected-persistence-diagrams.ipynb) [Confidence regions for persistence diagrams - data -points](Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb) +points](tutorials/Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb) [ATOL tutorial](https://github.com/martinroyer/atol/blob/master/demo/atol-demo.ipynb) -[Perslay](Tuto-GUDHI-perslay-visu.ipynb) +[Perslay](tutorials/Tuto-GUDHI-perslay-visu.ipynb) -[DTM-filtrations](Tuto-GUDHI-DTM-filtrations.ipynb) +[DTM-filtrations](tutorials/Tuto-GUDHI-DTM-filtrations.ipynb) -[kPDTM-kPLM](Tuto-GUDHI-kPDTM-kPLM.ipynb) +[kPDTM-kPLM](tutorials/Tuto-GUDHI-kPDTM-kPLM.ipynb) -[Inverse problem and optimization with TDA](Tuto-GUDHI-optimization.ipynb) +[Inverse problem and optimization with TDA](tutorials/Tuto-GUDHI-optimization.ipynb) -[PyTorch 
differentiation of diagrams](Tuto-GUDHI-PyTorch-optimization.ipynb) +[PyTorch differentiation of diagrams](tutorials/Tuto-GUDHI-PyTorch-optimization.ipynb) Contact : diff --git a/Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb b/Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb deleted file mode 100644 index 0aca49a..0000000 --- a/Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb +++ /dev/null @@ -1,447 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "collapsed": true - }, - "source": [ - "# TDA with Python using the Gudhi Library \n", - "\n", - "# Confidence regions for persistence diagrams : data points " - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import persistence_statistics as ps\n", - "\n", - "import pandas as pd\n", - "import numpy as np\n", - "import pickle as pickle\n", - "import gudhi as gd \n", - "import seaborn as sbs\n", - "from scipy.spatial import distance_matrix\n", - "from pylab import *" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Introduction" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this tutorial, we introduce confidence regions for persistence diagrams built on a set of data points. We present the subsampling approach of [Fasy et al. 2014 AoS](https://projecteuclid.org/download/pdfview_1/euclid.aos/1413810729). See [this notebook](https://github.com/GUDHI/TDA-tutorial/blob/master/Tuto-GUDHI-persistence-diagrams.ipynb) for an introduction to persistence diagrams with Gudhi." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For many applications of persistent homology, we observe topological features close to the diagonal. 
Since they correspond to topological structures that die very soon after they appear in the filtration, these points are generally considered as \"topological noise\". Confidence regions for persistence diagram provide a rigorous framework to this idea." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Confidence regions for persistence diagrams provide a rigorous framework for selecting significant topological features in a persistence diagram. We use the bottleneck distance $d_b$ to define confidence regions. We see point clouds as random variables. Under this approach, persistence diagrams are also seen as random quantities." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Confidence regions for persistence diagrams for point cloud data in $\\mathbb R^d$" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We introduce the method for a simulated dataset." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "U1 = np.random.uniform(0,2 * pi,size= 1000)\n", - "V1 = np.array([[0.35 * cos(u) +0.02*np.random.uniform(-1,1) ,\n", - " 0.35 *sin(u)+0.02*np.random.uniform(-1,1)] for u in U1])\n", - "U2 = np.random.uniform(0,2 * pi,size= 2000)\n", - "V2 = np.array([[0.7* cos(u) +0.02*np.random.uniform(-1,1) ,\n", - " 0.7*sin(u)+0.02*np.random.uniform(-1,1)] for u in U2])\n", - "W = np.concatenate((V1,V2), axis=0)\n", - "plt.scatter(W[:,0],W[:,1],s=0.1);\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Subsampling approach\n", - "\n", - "Let $\\mathbb X$ and $\\mathbb Y$ be two compact sets.\n", - "For the filtrations given below, persistence homology is stable with respect of Hausdorff perturbations:\n", - "$$\n", - "d_b\\left( Dgm \\left(Filt(\\mathbb X) \\right) , Dgm \\left( Filt(\\mathbb Y) \\right)\\right)\n", - "\\leq C_{Filt}\n", - " Haus \\left(\\mathbb X, \\mathbb Y \\right)\n", - "$$ \n", - 
"\n", - "The previous inequality is valid for the following Gudhi filtrations: \n", - "- for the Rips complex filtration with $C_{Rips} = 2$, \n", - "- for the $\\sqrt{alpha}$-complexes filtration (see further) with $C_{Alpha}= 1$. \n", - "\n", - "Following [Fasy et al. 2014 AoS](https://projecteuclid.org/download/pdfview_1/euclid.aos/1413810729) we derive confidence sets for persistence diagrams (for $d_b$) from confidence sets for compact sets (for $Haus$). Let $\\mathbb X_n$ be a sample from a distribution $P$ with compact support $\\mathbb X$. \n", - "The aim is to find a parameter $c_\\alpha$ such that\n", - "\n", - "$$ P ( Hauss(\\mathbb X_n, \\mathbb X) \\leq c_\\alpha) \\geq 1-\\alpha .$$\n", - "\n", - "The confidence set $\\mathcal C$ we consider is a subset of all persistence diagrams whose bottleneck distance to $Dgm \\left(Filt(\\mathbb X_n) \\right) $ is less than $d_\\alpha$:\n", - "$$ \\left\\{ Dgm \\: | \\: d_b \\left( Diag , Dgm \\left(Filt(\\mathbb X_n) \\right) \\right) c\\leq d_\\alpha \\right\\}, $$\n", - "with \n", - "$$ d_\\alpha = C_{Filt} c_\\alpha .$$\n", - "\n", - "The `hausd_interval` function from the `persistence_statistics` module implements the subsampling method of [Fasy et al. 2014 AoS](https://projecteuclid.org/download/pdfview_1/euclid.aos/1413810729), it outputs an estimation $\\hat c_\\alpha$ of $c_\\alpha$. By default a multiprocessing computation is applied." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0.053896359713091466\n" - ] - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "hatc = ps.hausd_interval(data=W,level = 0.90, m = 2500)\n", - "print(hatc)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Stability and confidence region for the $\\sqrt{alpha}$-filtration" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When computing confidence regions for alpha complexes, we need to be careful with the scale of values of the filtration because the filtration value of each simplex is computed as the square of the circumradius of the simplex (if the circumsphere is empty)." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "Alpha_complex_W = gd.AlphaComplex(points = W)\n", - "Alpha_simplex_tree_W = Alpha_complex_W.create_simplex_tree() " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We change the filtration value of each simplex by taking the square root of the filtration values:" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "Alpha_simplex_tree_W_list = Alpha_simplex_tree_W.get_filtration()" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "for splx in Alpha_simplex_tree_W_list:\n", - " Alpha_simplex_tree_W.assign_filtration(splx[0],filtration= np.sqrt(splx[1]))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can compute persistence for the rescaled $\\sqrt{alpha}$ complex filtration." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "pers_alpha_W= Alpha_simplex_tree_W.persistence()" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "gd.plot_persistence_diagram(pers_alpha_W);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We now define the confidence region for this persistence diagram. We have to take a band of width $ d_\\alpha = C_{Filt} c_\\alpha $ $\\hat c_\\alpha$ to compute and plot the confidence band. The `band` parameter is the vertical height of the confidence region, it is thus twice the value of $\\hat c _\\alpha$ (because the bottleneck distance is based on the $\\ell_\\infty$ norm)." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "gd.plot_persistence_diagram(pers_alpha_W, band=2 * hatc);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Only the topological features above the red band are considered as significant. Here we select the main topological features by this way.\n", - "\n", - "Generally speaking, the procedure is very conservative: the band is very large and only very few topological features are seen as significant." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Confidence regions for persistence diagrams of filtrations based on pairwise distances" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The subsampling approach can be also applied when data come has a matrix of pairwise distances." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We illustrate the procedure with the `trefoil_dist` dataset which contains the distances between 1000 points sampled in the neighborhood of a trefoil curve." - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "trefoil_dist = pickle.load( open( \"./datasets/trefoil_dist\", \"rb\" ) )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We use again the `hausd_interval` function to infer the Hausdorff distance between the data and the support of the underlying distribution of the data." 
- ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "0.396104059680682\n" - ] - } - ], - "source": [ - "hatc = ps.hausd_interval(trefoil_dist,pairwise_dist=True,level = 0.90, m = 900)\n", - "print(hatc)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, we define the Rips complex filtration from the matrix of pairwise distances:" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "skeleton_trefoil = gd.RipsComplex(distance_matrix = trefoil_dist,max_edge_length=2) \n", - "Rips_simplex_tree_trefoil = skeleton_trefoil.create_simplex_tree(max_dimension=2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "and we compute persistence on this filtration:" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "BarCodes_trefoil = Rips_simplex_tree_trefoil.persistence()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To define a confidence band for the persistence diagram, we have to take a band of width $ \\hat d_\\alpha = 2 \\hat c_\\alpha$.\n", - "\n", - "The `band` parameter being the vertical height of the confidence region, it is twice the value of $\\hat d _\\alpha$ (because the bottleneck distance is based on the $\\ell_\\infty$ norm).\n", - "\n", - "So finally we take this band parameter equal to four times $\\hat c_\\alpha$." - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "gd.plot_persistence_diagram(BarCodes_trefoil,band = 4*hatc);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We see that only one topological feature of dimension 1 is seen as a significant." - ] - } - ], - "metadata": { - "anaconda-cloud": {}, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.3" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": true, - "sideBar": true, - "skip_h1_title": false, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": false - } - }, - "nbformat": 4, - "nbformat_minor": 1 -} diff --git a/Images/CodeCogsEqnRp.gif b/tutorials/Images/CodeCogsEqnRp.gif similarity index 100% rename from Images/CodeCogsEqnRp.gif rename to tutorials/Images/CodeCogsEqnRp.gif diff --git a/Images/MatchingDiag.png b/tutorials/Images/MatchingDiag.png similarity index 100% rename from Images/MatchingDiag.png rename to tutorials/Images/MatchingDiag.png diff --git a/Images/Pers14.PNG b/tutorials/Images/Pers14.PNG similarity index 100% rename from Images/Pers14.PNG rename to tutorials/Images/Pers14.PNG diff --git a/Images/Simplex_tree_representation.png b/tutorials/Images/Simplex_tree_representation.png similarity index 100% rename from Images/Simplex_tree_representation.png rename to tutorials/Images/Simplex_tree_representation.png diff --git a/Images/nappe_distance_avec_bruit.png b/tutorials/Images/nappe_distance_avec_bruit.png similarity index 100% rename from 
Images/nappe_distance_avec_bruit.png rename to tutorials/Images/nappe_distance_avec_bruit.png diff --git a/Images/nappe_distance_sans_bruit.png b/tutorials/Images/nappe_distance_sans_bruit.png similarity index 100% rename from Images/nappe_distance_sans_bruit.png rename to tutorials/Images/nappe_distance_sans_bruit.png diff --git a/Images/nappe_dtm_avec_bruit.png b/tutorials/Images/nappe_dtm_avec_bruit.png similarity index 100% rename from Images/nappe_dtm_avec_bruit.png rename to tutorials/Images/nappe_dtm_avec_bruit.png diff --git a/Images/pers.png b/tutorials/Images/pers.png similarity index 100% rename from Images/pers.png rename to tutorials/Images/pers.png diff --git a/Images/persistence.png b/tutorials/Images/persistence.png similarity index 100% rename from Images/persistence.png rename to tutorials/Images/persistence.png diff --git a/Images/sous_niveau_kPDTM2.png b/tutorials/Images/sous_niveau_kPDTM2.png similarity index 100% rename from Images/sous_niveau_kPDTM2.png rename to tutorials/Images/sous_niveau_kPDTM2.png diff --git a/Images/sous_niveau_kPDTM_cov2.png b/tutorials/Images/sous_niveau_kPDTM_cov2.png similarity index 100% rename from Images/sous_niveau_kPDTM_cov2.png rename to tutorials/Images/sous_niveau_kPDTM_cov2.png diff --git a/Images/sublevf.png b/tutorials/Images/sublevf.png similarity index 100% rename from Images/sublevf.png rename to tutorials/Images/sublevf.png diff --git a/Images/symbole_infini.png b/tutorials/Images/symbole_infini.png similarity index 100% rename from Images/symbole_infini.png rename to tutorials/Images/symbole_infini.png diff --git a/Tuto-GUDHI-Barycenters-of-persistence-diagrams.ipynb b/tutorials/Tuto-GUDHI-Barycenters-of-persistence-diagrams.ipynb similarity index 100% rename from Tuto-GUDHI-Barycenters-of-persistence-diagrams.ipynb rename to tutorials/Tuto-GUDHI-Barycenters-of-persistence-diagrams.ipynb diff --git a/tutorials/Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb 
b/tutorials/Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb new file mode 100644 index 0000000..cb8cdd0 --- /dev/null +++ b/tutorials/Tuto-GUDHI-ConfRegions-PersDiag-datapoints.ipynb @@ -0,0 +1,443 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + } + }, + "source": [ + "# TDA with Python using the Gudhi Library \n", + "\n", + "# Confidence regions for persistence diagrams : data points " + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import utils.persistence_statistics as ps\n", + "\n", + "import pandas as pd\n", + "import numpy as np\n", + "import pickle as pickle\n", + "import gudhi as gd \n", + "import seaborn as sbs\n", + "from scipy.spatial import distance_matrix\n", + "from pylab import *" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this tutorial, we introduce confidence regions for persistence diagrams built on a set of data points. We present the subsampling approach of [Fasy et al. 2014 AoS](https://projecteuclid.org/download/pdfview_1/euclid.aos/1413810729). See [this notebook](https://github.com/GUDHI/TDA-tutorial/blob/master/Tuto-GUDHI-persistence-diagrams.ipynb) for an introduction to persistence diagrams with Gudhi." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For many applications of persistent homology, we observe topological features close to the diagonal. Since they correspond to topological structures that die very soon after they appear in the filtration, these points are generally considered as \"topological noise\". Confidence regions for persistence diagrams provide a rigorous framework for this idea."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Confidence regions for persistence diagrams provide a rigorous framework for selecting significant topological features in a persistence diagram. We use the bottleneck distance $d_b$ to define confidence regions. We see point clouds as random variables. Under this approach, persistence diagrams are also seen as random quantities." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Confidence regions for persistence diagrams for point cloud data in $\\mathbb R^d$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We introduce the method for a simulated dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "U1 = np.random.uniform(0,2 * pi,size= 1000)\n", + "V1 = np.array([[0.35 * cos(u) +0.02*np.random.uniform(-1,1) ,\n", + " 0.35 *sin(u)+0.02*np.random.uniform(-1,1)] for u in U1])\n", + "U2 = np.random.uniform(0,2 * pi,size= 2000)\n", + "V2 = np.array([[0.7* cos(u) +0.02*np.random.uniform(-1,1) ,\n", + " 0.7*sin(u)+0.02*np.random.uniform(-1,1)] for u in U2])\n", + "W = np.concatenate((V1,V2), axis=0)\n", + "plt.scatter(W[:,0],W[:,1],s=0.1);\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Subsampling approach\n", + "\n", + "Let $\\mathbb X$ and $\\mathbb Y$ be two compact sets.\n", + "For the filtrations given below, persistence homology is stable with respect of Hausdorff perturbations:\n", + "$$\n", + "d_b\\left( Dgm \\left(Filt(\\mathbb X) \\right) , Dgm \\left( Filt(\\mathbb Y) \\right)\\right)\n", + "\\leq C_{Filt}\n", + " Haus \\left(\\mathbb X, \\mathbb Y \\right)\n", + "$$ \n", + "\n", + "The previous inequality is valid for the following Gudhi filtrations: \n", + "- for the Rips complex filtration with $C_{Rips} = 2$, \n", + "- for the $\\sqrt{alpha}$-complexes filtration (see further) with $C_{Alpha}= 1$. \n", + "\n", + "Following [Fasy et al. 2014 AoS](https://projecteuclid.org/download/pdfview_1/euclid.aos/1413810729) we derive confidence sets for persistence diagrams (for $d_b$) from confidence sets for compact sets (for $Haus$). Let $\\mathbb X_n$ be a sample from a distribution $P$ with compact support $\\mathbb X$. 
\n", + "The aim is to find a parameter $c_\\alpha$ such that\n", + "\n", + "$$ P ( Haus(\\mathbb X_n, \\mathbb X) \\leq c_\\alpha) \\geq 1-\\alpha .$$\n", + "\n", + "The confidence set $\\mathcal C$ we consider is the subset of all persistence diagrams whose bottleneck distance to $Dgm \\left(Filt(\\mathbb X_n) \\right) $ is at most $d_\\alpha$:\n", + "$$ \\left\\{ Dgm \\: | \\: d_b \\left( Dgm , Dgm \\left(Filt(\\mathbb X_n) \\right) \\right) \\leq d_\\alpha \\right\\}, $$\n", + "with \n", + "$$ d_\\alpha = C_{Filt} c_\\alpha .$$\n", + "\n", + "The `hausd_interval` function from the `persistence_statistics` module implements the subsampling method of [Fasy et al. 2014 AoS](https://projecteuclid.org/download/pdfview_1/euclid.aos/1413810729); it outputs an estimate $\\hat c_\\alpha$ of $c_\\alpha$. By default a multiprocessing computation is applied." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.05282828169979164\n" + ] + } + ], + "source": [ + "hatc = ps.hausd_interval(data=W,level = 0.90, m = 2500)\n", + "print(hatc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Stability and confidence region for the $\\sqrt{alpha}$-filtration" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When computing confidence regions for alpha complexes, we need to be careful with the scale of values of the filtration because the filtration value of each simplex is computed as the square of the circumradius of the simplex (if the circumsphere is empty)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "Alpha_complex_W = gd.AlphaComplex(points = W)\n", + "Alpha_simplex_tree_W = Alpha_complex_W.create_simplex_tree() " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We change the filtration value of each simplex by taking the square root of the filtration values:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "Alpha_simplex_tree_W_list = Alpha_simplex_tree_W.get_filtration()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "for splx in Alpha_simplex_tree_W_list:\n", + " Alpha_simplex_tree_W.assign_filtration(splx[0],filtration= np.sqrt(splx[1]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can compute persistence for the rescaled $\\sqrt{alpha}$ complex filtration." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "pers_alpha_W= Alpha_simplex_tree_W.persistence()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "gd.plot_persistence_diagram(pers_alpha_W);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We now define the confidence region for this persistence diagram. We have to take a band of width $ d_\\alpha = C_{Filt} c_\\alpha $ $\\hat c_\\alpha$ to compute and plot the confidence band. The `band` parameter is the vertical height of the confidence region, it is thus twice the value of $\\hat c _\\alpha$ (because the bottleneck distance is based on the $\\ell_\\infty$ norm)." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "gd.plot_persistence_diagram(pers_alpha_W, band=2 * hatc);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Only the topological features above the red band are considered as significant. Here we select the main topological features by this way.\n", + "\n", + "Generally speaking, the procedure is very conservative: the band is very large and only very few topological features are seen as significant." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Confidence regions for persistence diagrams of filtrations based on pairwise distances" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The subsampling approach can be also applied when data come has a matrix of pairwise distances." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We illustrate the procedure with the `trefoil_dist` dataset which contains the distances between 1000 points sampled in the neighborhood of a trefoil curve." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "trefoil_dist = pickle.load( open( \"./datasets/trefoil_dist\", \"rb\" ) )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We use again the `hausd_interval` function to infer the Hausdorff distance between the data and the support of the underlying distribution of the data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.396104059680682\n" + ] + } + ], + "source": [ + "hatc = ps.hausd_interval(trefoil_dist,pairwise_dist=True,level = 0.90, m = 900)\n", + "print(hatc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, we define the Rips complex filtration from the matrix of pairwise distances:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "skeleton_trefoil = gd.RipsComplex(distance_matrix = trefoil_dist,max_edge_length=2) \n", + "Rips_simplex_tree_trefoil = skeleton_trefoil.create_simplex_tree(max_dimension=2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "and we compute persistence on this filtration:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "BarCodes_trefoil = Rips_simplex_tree_trefoil.persistence()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To define a confidence band for the persistence diagram, we have to take a band of width $ \\hat d_\\alpha = 2 \\hat c_\\alpha$.\n", + "\n", + "The `band` parameter being the vertical height of the confidence region, it is twice the value of $\\hat d _\\alpha$ (because the bottleneck distance is based on the $\\ell_\\infty$ norm).\n", + "\n", + "So finally we take this band parameter equal to four times $\\hat c_\\alpha$." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "gd.plot_persistence_diagram(BarCodes_trefoil,band = 4*hatc);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We see that only one topological feature of dimension 1 is seen as a significant." + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.0" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Tuto-GUDHI-DTM-filtrations.ipynb b/tutorials/Tuto-GUDHI-DTM-filtrations.ipynb similarity index 100% rename from Tuto-GUDHI-DTM-filtrations.ipynb rename to tutorials/Tuto-GUDHI-DTM-filtrations.ipynb diff --git a/Tuto-GUDHI-Expected-persistence-diagrams.ipynb b/tutorials/Tuto-GUDHI-Expected-persistence-diagrams.ipynb similarity index 100% rename from Tuto-GUDHI-Expected-persistence-diagrams.ipynb rename to tutorials/Tuto-GUDHI-Expected-persistence-diagrams.ipynb diff --git a/Tuto-GUDHI-PyTorch-optimization.ipynb b/tutorials/Tuto-GUDHI-PyTorch-optimization.ipynb similarity index 100% rename from Tuto-GUDHI-PyTorch-optimization.ipynb rename to tutorials/Tuto-GUDHI-PyTorch-optimization.ipynb diff --git a/Tuto-GUDHI-Quantization-of-persistence-diagrams.ipynb b/tutorials/Tuto-GUDHI-Quantization-of-persistence-diagrams.ipynb similarity index 100% rename from 
Tuto-GUDHI-Quantization-of-persistence-diagrams.ipynb rename to tutorials/Tuto-GUDHI-Quantization-of-persistence-diagrams.ipynb diff --git a/Tuto-GUDHI-alpha-complex-visualization.ipynb b/tutorials/Tuto-GUDHI-alpha-complex-visualization.ipynb similarity index 100% rename from Tuto-GUDHI-alpha-complex-visualization.ipynb rename to tutorials/Tuto-GUDHI-alpha-complex-visualization.ipynb diff --git a/Tuto-GUDHI-cover-complex.ipynb b/tutorials/Tuto-GUDHI-cover-complex.ipynb similarity index 100% rename from Tuto-GUDHI-cover-complex.ipynb rename to tutorials/Tuto-GUDHI-cover-complex.ipynb diff --git a/Tuto-GUDHI-cubical-complexes.ipynb b/tutorials/Tuto-GUDHI-cubical-complexes.ipynb similarity index 100% rename from Tuto-GUDHI-cubical-complexes.ipynb rename to tutorials/Tuto-GUDHI-cubical-complexes.ipynb diff --git a/Tuto-GUDHI-extended-persistence.ipynb b/tutorials/Tuto-GUDHI-extended-persistence.ipynb similarity index 100% rename from Tuto-GUDHI-extended-persistence.ipynb rename to tutorials/Tuto-GUDHI-extended-persistence.ipynb diff --git a/Tuto-GUDHI-kPDTM-kPLM.ipynb b/tutorials/Tuto-GUDHI-kPDTM-kPLM.ipynb similarity index 100% rename from Tuto-GUDHI-kPDTM-kPLM.ipynb rename to tutorials/Tuto-GUDHI-kPDTM-kPLM.ipynb diff --git a/Tuto-GUDHI-optimization.ipynb b/tutorials/Tuto-GUDHI-optimization.ipynb similarity index 99% rename from Tuto-GUDHI-optimization.ipynb rename to tutorials/Tuto-GUDHI-optimization.ipynb index 298e74b..a969c37 100644 --- a/Tuto-GUDHI-optimization.ipynb +++ b/tutorials/Tuto-GUDHI-optimization.ipynb @@ -363,7 +363,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can actually play the same game with images! Indeed, Gudhi contains code for computing [cubical persistence](http://www2.im.uj.edu.pl/mpd/publications/Wagner_persistence.pdf), which is very well-suited for handling images. For instance, it can be used to filter a 2D image with its pixel values. 
Overall, the optimization follows the exact same steps as before, except that we use pixel filtration instead of Rips filtration. This means that the parameters $\\theta$ that we are now going to optimize are the pixel values themselves, and that the gradients for positive simplex $\\nabla_\\theta f_\\theta(\\sigma_+(p))$ and negative simplex $\\nabla_\\theta f_\\theta(\\sigma_-(p))$ now simply equal $1$ for the pixels associated to $\\sigma_+(p)$ and $\\sigma_-(p)$ and $0$ for all other pixels. All of that is implemented in `CubicalLayer`." + "We can actually play the same game with images! Indeed, Gudhi contains code for computing [cubical persistence](https://www.vrvis.at/publications/pdfs/PB-VRVis-2011-010.pdf), which is very well-suited for handling images. For instance, it can be used to filter a 2D image with its pixel values. Overall, the optimization follows the exact same steps as before, except that we use pixel filtration instead of Rips filtration. This means that the parameters $\\theta$ that we are now going to optimize are the pixel values themselves, and that the gradients for positive simplex $\\nabla_\\theta f_\\theta(\\sigma_+(p))$ and negative simplex $\\nabla_\\theta f_\\theta(\\sigma_-(p))$ now simply equal $1$ for the pixels associated to $\\sigma_+(p)$ and $\\sigma_-(p)$ and $0$ for all other pixels. All of that is implemented in `CubicalLayer`." 
] }, { diff --git a/Tuto-GUDHI-persistence-diagrams.ipynb b/tutorials/Tuto-GUDHI-persistence-diagrams.ipynb similarity index 100% rename from Tuto-GUDHI-persistence-diagrams.ipynb rename to tutorials/Tuto-GUDHI-persistence-diagrams.ipynb diff --git a/Tuto-GUDHI-persistent-entropy.ipynb b/tutorials/Tuto-GUDHI-persistent-entropy.ipynb similarity index 100% rename from Tuto-GUDHI-persistent-entropy.ipynb rename to tutorials/Tuto-GUDHI-persistent-entropy.ipynb diff --git a/Tuto-GUDHI-perslay-visu.ipynb b/tutorials/Tuto-GUDHI-perslay-visu.ipynb similarity index 100% rename from Tuto-GUDHI-perslay-visu.ipynb rename to tutorials/Tuto-GUDHI-perslay-visu.ipynb diff --git a/Tuto-GUDHI-representations.ipynb b/tutorials/Tuto-GUDHI-representations.ipynb similarity index 100% rename from Tuto-GUDHI-representations.ipynb rename to tutorials/Tuto-GUDHI-representations.ipynb diff --git a/Tuto-GUDHI-simplex-Trees.ipynb b/tutorials/Tuto-GUDHI-simplex-Trees.ipynb similarity index 100% rename from Tuto-GUDHI-simplex-Trees.ipynb rename to tutorials/Tuto-GUDHI-simplex-Trees.ipynb diff --git a/Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb b/tutorials/Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb similarity index 100% rename from Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb rename to tutorials/Tuto-GUDHI-simplicial-complexes-from-data-points.ipynb diff --git a/Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb b/tutorials/Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb similarity index 100% rename from Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb rename to tutorials/Tuto-GUDHI-simplicial-complexes-from-distance-matrix.ipynb diff --git a/datasets/Corr_ProteinBinding/1anf.corr_1.txt b/tutorials/datasets/Corr_ProteinBinding/1anf.corr_1.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1anf.corr_1.txt rename to tutorials/datasets/Corr_ProteinBinding/1anf.corr_1.txt diff --git 
a/datasets/Corr_ProteinBinding/1ez9.corr_1.txt b/tutorials/datasets/Corr_ProteinBinding/1ez9.corr_1.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1ez9.corr_1.txt rename to tutorials/datasets/Corr_ProteinBinding/1ez9.corr_1.txt diff --git a/datasets/Corr_ProteinBinding/1fqa.corr_2.txt b/tutorials/datasets/Corr_ProteinBinding/1fqa.corr_2.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1fqa.corr_2.txt rename to tutorials/datasets/Corr_ProteinBinding/1fqa.corr_2.txt diff --git a/datasets/Corr_ProteinBinding/1fqb.corr_3.txt b/tutorials/datasets/Corr_ProteinBinding/1fqb.corr_3.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1fqb.corr_3.txt rename to tutorials/datasets/Corr_ProteinBinding/1fqb.corr_3.txt diff --git a/datasets/Corr_ProteinBinding/1fqc.corr_2.txt b/tutorials/datasets/Corr_ProteinBinding/1fqc.corr_2.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1fqc.corr_2.txt rename to tutorials/datasets/Corr_ProteinBinding/1fqc.corr_2.txt diff --git a/datasets/Corr_ProteinBinding/1fqd.corr_3.txt b/tutorials/datasets/Corr_ProteinBinding/1fqd.corr_3.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1fqd.corr_3.txt rename to tutorials/datasets/Corr_ProteinBinding/1fqd.corr_3.txt diff --git a/datasets/Corr_ProteinBinding/1jw4.corr_4.txt b/tutorials/datasets/Corr_ProteinBinding/1jw4.corr_4.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1jw4.corr_4.txt rename to tutorials/datasets/Corr_ProteinBinding/1jw4.corr_4.txt diff --git a/datasets/Corr_ProteinBinding/1jw5.corr_5.txt b/tutorials/datasets/Corr_ProteinBinding/1jw5.corr_5.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1jw5.corr_5.txt rename to tutorials/datasets/Corr_ProteinBinding/1jw5.corr_5.txt diff --git a/datasets/Corr_ProteinBinding/1lls.corr_6.txt b/tutorials/datasets/Corr_ProteinBinding/1lls.corr_6.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1lls.corr_6.txt 
rename to tutorials/datasets/Corr_ProteinBinding/1lls.corr_6.txt diff --git a/datasets/Corr_ProteinBinding/1mpd.corr_4.txt b/tutorials/datasets/Corr_ProteinBinding/1mpd.corr_4.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1mpd.corr_4.txt rename to tutorials/datasets/Corr_ProteinBinding/1mpd.corr_4.txt diff --git a/datasets/Corr_ProteinBinding/1omp.corr_7.txt b/tutorials/datasets/Corr_ProteinBinding/1omp.corr_7.txt similarity index 100% rename from datasets/Corr_ProteinBinding/1omp.corr_7.txt rename to tutorials/datasets/Corr_ProteinBinding/1omp.corr_7.txt diff --git a/datasets/Corr_ProteinBinding/3hpi.corr_5.txt b/tutorials/datasets/Corr_ProteinBinding/3hpi.corr_5.txt similarity index 100% rename from datasets/Corr_ProteinBinding/3hpi.corr_5.txt rename to tutorials/datasets/Corr_ProteinBinding/3hpi.corr_5.txt diff --git a/datasets/Corr_ProteinBinding/3mbp.corr_6.txt b/tutorials/datasets/Corr_ProteinBinding/3mbp.corr_6.txt similarity index 100% rename from datasets/Corr_ProteinBinding/3mbp.corr_6.txt rename to tutorials/datasets/Corr_ProteinBinding/3mbp.corr_6.txt diff --git a/datasets/Corr_ProteinBinding/4mbp.corr_7.txt b/tutorials/datasets/Corr_ProteinBinding/4mbp.corr_7.txt similarity index 100% rename from datasets/Corr_ProteinBinding/4mbp.corr_7.txt rename to tutorials/datasets/Corr_ProteinBinding/4mbp.corr_7.txt diff --git a/datasets/ElongatedTorus.txt b/tutorials/datasets/ElongatedTorus.txt similarity index 100% rename from datasets/ElongatedTorus.txt rename to tutorials/datasets/ElongatedTorus.txt diff --git a/datasets/NoisyTrefoil180.txt b/tutorials/datasets/NoisyTrefoil180.txt similarity index 100% rename from datasets/NoisyTrefoil180.txt rename to tutorials/datasets/NoisyTrefoil180.txt diff --git a/datasets/crater_tuto b/tutorials/datasets/crater_tuto similarity index 100% rename from datasets/crater_tuto rename to tutorials/datasets/crater_tuto diff --git a/datasets/data_acc b/tutorials/datasets/data_acc similarity index 100% rename 
from datasets/data_acc rename to tutorials/datasets/data_acc diff --git a/datasets/diff/mnist_test.csv b/tutorials/datasets/diff/mnist_test.csv similarity index 100% rename from datasets/diff/mnist_test.csv rename to tutorials/datasets/diff/mnist_test.csv diff --git a/datasets/human.off b/tutorials/datasets/human.off similarity index 100% rename from datasets/human.off rename to tutorials/datasets/human.off diff --git a/datasets/human.txt b/tutorials/datasets/human.txt similarity index 100% rename from datasets/human.txt rename to tutorials/datasets/human.txt diff --git a/datasets/mnist_test.csv b/tutorials/datasets/mnist_test.csv similarity index 100% rename from datasets/mnist_test.csv rename to tutorials/datasets/mnist_test.csv diff --git a/datasets/tore3D_1307.off b/tutorials/datasets/tore3D_1307.off similarity index 100% rename from datasets/tore3D_1307.off rename to tutorials/datasets/tore3D_1307.off diff --git a/datasets/trefoil_dist b/tutorials/datasets/trefoil_dist similarity index 100% rename from datasets/trefoil_dist rename to tutorials/datasets/trefoil_dist diff --git a/utils/KeplerMapperVisuFromTxtFile.py b/tutorials/utils/KeplerMapperVisuFromTxtFile.py similarity index 100% rename from utils/KeplerMapperVisuFromTxtFile.py rename to tutorials/utils/KeplerMapperVisuFromTxtFile.py diff --git a/utils/broken_links_scraper.py b/tutorials/utils/broken_links_scraper.py similarity index 100% rename from utils/broken_links_scraper.py rename to tutorials/utils/broken_links_scraper.py diff --git a/persistence_statistics.py b/tutorials/utils/persistence_statistics.py similarity index 51% rename from persistence_statistics.py rename to tutorials/utils/persistence_statistics.py index 16e867b..60e8301 100644 --- a/persistence_statistics.py +++ b/tutorials/utils/persistence_statistics.py @@ -1,118 +1,104 @@ -def hausd_interval(data, level = 0.95, m=-1, B =1000,pairwise_dist = False, - leaf_size = 2,ncores = None): - - ''' - Subsampling Confidence Interval for the 
Hausdorff Distance between a +def hausd_interval(data, level=0.95, m=-1, B=1000, pairwise_dist=False, leaf_size=2, ncores=None): + """ + Subsampling Confidence Interval for the Hausdorff Distance between a Manifold and a Sample Fasy et al AOS 2014 - + Input: data : a nxd numpy array representing n points in R^d, or a nxn matrix of pairwise distances - m : size of each subsample. If m=-1 then m = n / np.log(n) - B : number of subsamples + m : size of each subsample. If m=-1 then m = n / np.log(n) + B : number of subsamples level : confidence level pairwise_dist : if pairwise_dist = True then data is a nxn matrix of pairwise distances leaf_size : leaf size for KDTree ncores : number of cores for multiprocessing (if None then the maximum number of cores is used) - - Output: + + Output: quantile for the Hausdorff distance - - - ''' - - + + + """ + import numpy as np from multiprocessing import Pool from sklearn.neighbors import KDTree - - # sample size - n = np.size(data,0) - + n = np.size(data, 0) + # subsample size if m == -1: - m = int (n / np.log(n)) - - + m = int(n / np.log(n)) + # Data is an array if pairwise_dist == False: - + # for subsampling - # a reprendre sans shuffle slit + # a reprendre sans shuffle slit - global hauss_dist + def hauss_dist(m): - ''' + """ Distances between the points of data and a random subsample of data of size m - ''' - I = np.random.choice(n,m) + """ + I = np.random.choice(n, m) Icomp = [item for item in np.arange(n) if item not in I] - tree = KDTree(data[I,],leaf_size=leaf_size) - dist, ind = tree.query(data[Icomp,],k=1) + tree = KDTree(data[I,], leaf_size=leaf_size) + dist, ind = tree.query(data[Icomp,], k=1) hdist = max(dist) - return(hdist) - + return hdist + # parrallel computing with Pool(ncores) as p: - dist_vec = p.map(hauss_dist,[m]*B) + dist_vec = p.map(hauss_dist, [m] * B) p.close() - dist_vec = [a[0] for a in dist_vec] - - - # Data is a matrix of pairwise distances + dist_vec = [a[0] for a in dist_vec] + + # Data is a 
matrix of pairwise distances else: + def hauss_dist(m): - ''' + """ Distances between the points of data and a random subsample of data of size m - ''' - I = np.random.choice(n,m) - hdist= np.max([np.min(data[I,j]) for j in np.arange(n) if j not in I]) - return(hdist) - + """ + I = np.random.choice(n, m) + hdist = np.max([np.min(data[I, j]) for j in np.arange(n) if j not in I]) + return hdist + # parrallel computing with Pool(ncores) as p: - dist_vec = p.map(hauss_dist, [m]*B) + dist_vec = p.map(hauss_dist, [m] * B) p.close() - - + # quantile and confidence band myquantile = np.quantile(dist_vec, level) - c = 2 * myquantile - - return(c) - - + c = 2 * myquantile + return c +def truncated_simplex_tree(st, int_trunc=100): + """ + This function return a truncated simplex tree - -def truncated_simplex_tree(st,int_trunc=100): - ''' - This function return a truncated simplex tree - Input: st : a simplex tree int_trunc : number of persistent interval keept per dimension (the largest) - + Ouptut: - st_trunc_pers : truncated simplex tree - ''' - - st.persistence() + st_trunc_pers : truncated simplex tree + """ + + st.persistence() dim = st.dimension() - st_trunc_pers = []; + st_trunc_pers = [] for d in range(dim): pers_d = st.persistence_intervals_in_dimension(d) - d_l= len(pers_d) + d_l = len(pers_d) if d_l > int_trunc: - pers_d_trunc = [pers_d[i] for i in range(d_l-int_trunc,d_l)] + pers_d_trunc = [pers_d[i] for i in range(d_l - int_trunc, d_l)] else: pers_d_trunc = pers_d - st_trunc_pers = st_trunc_pers + [(d,(l[0],l[1])) for l in pers_d_trunc] - return(st_trunc_pers) - - + st_trunc_pers = st_trunc_pers + [(d, (l[0], l[1])) for l in pers_d_trunc] + return st_trunc_pers diff --git a/utils/utils_dtm.py b/tutorials/utils/utils_dtm.py similarity index 100% rename from utils/utils_dtm.py rename to tutorials/utils/utils_dtm.py diff --git a/utils/utils_epd.py b/tutorials/utils/utils_epd.py similarity index 100% rename from utils/utils_epd.py rename to 
tutorials/utils/utils_epd.py diff --git a/utils/utils_quantization.py b/tutorials/utils/utils_quantization.py similarity index 100% rename from utils/utils_quantization.py rename to tutorials/utils/utils_quantization.py