This repository contains the R code for implementing the GscanStat clustering algorithm and reproducing the results from the case study described in the paper “Improving Disease Risk Estimation in Small Areas by Accounting for Spatio-Temporal Local Discontinuities” (Santafé et al., 2025).
Overall cancer (all sites) mortality data for the male population in continental Spain (excluding the Balearic and Canary Islands) are provided. The data are aggregated into 3-year periods spanning 1999–2022 (i.e., 1999–2001, 2002–2004, …, 2020–2022).
In addition, to avoid introducing substantial variability that could hinder the performance of the GscanStat algorithm in detecting clusters, neighboring municipalities were aggregated (while respecting the administrative boundaries of the Autonomous Communities) until each resulting spatial unit contained at least 16 observed cases over the entire study period.
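The aggregation step can be pictured with a small sketch. The merging rule below (repeatedly attach the smallest unit to its smallest neighbour within the same Autonomous Community) is an assumption made for illustration only; the exact rule used to build the final configuration is not described here, and `merge_small_units`, its arguments, and the toy values are hypothetical.

```r
# Hypothetical greedy merge: NOT the procedure used to build the repository data.
# obs        : total observed cases per unit over the study period
# community  : Autonomous Community label per unit
# neighbours : list of integer vectors with the adjacent units of each unit
merge_small_units <- function(obs, community, neighbours, min_cases = 16) {
  group <- seq_along(obs)   # every unit starts as its own group
  stuck <- integer(0)       # small groups with no admissible neighbour
  repeat {
    totals <- tapply(obs, group, sum)
    small  <- setdiff(as.integer(names(totals)[totals < min_cases]), stuck)
    if (length(small) == 0) break
    # Pick the small group with the fewest cases
    g       <- small[which.min(totals[as.character(small)])]
    members <- which(group == g)
    # Adjacent units outside the group but inside the same community
    adj  <- unique(unlist(neighbours[members]))
    keep <- group[adj] != g & community[adj] == community[members[1]]
    adj  <- adj[keep]
    if (length(adj) == 0) {
      stuck <- c(stuck, g)
      next
    }
    # Merge into the neighbouring group with the fewest cases
    cand   <- unique(group[adj])
    target <- cand[which.min(totals[as.character(cand)])]
    group[members] <- target
  }
  group
}

# Toy example: six units on a line, two communities
obs        <- c(5, 20, 7, 30, 3, 40)
community  <- c("A", "A", "A", "A", "B", "B")
neighbours <- list(2, c(1, 3), c(2, 4), c(3, 5), c(4, 6), 5)

group <- merge_small_units(obs, community, neighbours)
tapply(obs, group, sum)   # cases per merged unit
```

In the toy example, unit 5 is never merged with unit 4 even though they are adjacent, because they belong to different communities.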
The DataSpain_Cancer.Rdata file contains both the data and the cartographic information corresponding to the final configuration, which comprises 2,470 regions.
This .Rdata file contains the following objects:
- `carto`: `sf` object containing the polygon geometries of the spatial units
- `data`: `tibble` object with 19,760 rows and 5 columns
  - `ID`: character vector with the IDs of the spatial units
  - `year`: numeric vector representing the time period (1=1999-2001, ..., 8=2020-2022)
  - `obs`: observed number of cancer deaths
  - `exp`: expected number of cancer deaths (calculated using internal age-standardization)
  - `pop`: population at risk
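As a minimal sketch of how these columns fit together, the snippet below builds a toy data frame with the same five columns (all values are made up), maps the period index to its 3-year label, and computes the standardized mortality ratio as obs/exp. The helper `period_label` and the object `toy` are hypothetical names, not part of the repository; the real file is only loaded if it is found in the working directory.

```r
# Optional: load the real objects if the file is available locally
if (file.exists("DataSpain_Cancer.Rdata")) {
  load("DataSpain_Cancer.Rdata")   # expected to create `carto` and `data`
}

# Toy stand-in with the same column layout (made-up values)
toy <- data.frame(
  ID   = rep(c("unit_1", "unit_2"), each = 8),
  year = rep(1:8, times = 2),
  obs  = c(rep(12, 8), rep(30, 8)),
  exp  = c(rep(10, 8), rep(40, 8)),
  pop  = c(rep(5000, 8), rep(20000, 8))
)

# Map the period index (1, ..., 8) to its 3-year label
period_label <- function(k) {
  paste0(1999 + 3 * (k - 1), "-", 2001 + 3 * (k - 1))
}

toy$period <- period_label(toy$year)
toy$SMR    <- toy$obs / toy$exp   # standardized mortality ratio

head(toy, 3)
```

With 2,470 spatial units and 8 periods, the full `data` object has 2,470 × 8 = 19,760 rows, which matches the row count stated above.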
Here we provide R code to fit the following spatio-temporal models:
- SaTScan model (Kulldorff, 2021)
- GscanStat model (Santafé et al., 2025)
The `bigDM` package is used to fit local spatio-temporal models through a divide-and-conquer strategy (Orozco-Acosta et al., 2023), incorporating the significant risk clustering structure identified by the GscanStat algorithm. Version 0.5.7 of `bigDM` has been developed specifically for this purpose. All models were fitted using the R-INLA stable version 25.06.07 on R-4.5.1.
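Before running the examples, it may help to compare the installed package versions with the ones listed above. The sketch below treats those versions as minimum requirements, which is an assumption; `check_version` is a hypothetical helper, not a function from the repository.

```r
# Versions reported above, treated here as minimums (assumption)
required <- c(bigDM = "0.5.7", INLA = "25.06.07")

# TRUE if installed and recent enough, FALSE if older, NA if not installed
check_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

status <- vapply(
  names(required),
  function(p) check_version(p, required[[p]]),
  logical(1)
)

print(status)
print(R.version.string)   # the models were fitted on R 4.5.1
```

An `NA` entry means the package is not installed; `FALSE` means an older version was found.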
Two examples are provided:
- Example1_GscanStat_Navarre.R illustrates a small-scale data analysis for the Autonomous Region of Navarre (58 areas across 8 time points). First, the GscanStat and SaTScan algorithms are applied to detect significant clusters. Subsequently, spatio-temporal models with a BYM2 spatial prior, an RW1 temporal prior and a Type IV interaction are fitted to estimate relative risks.
- Example2_GscanStat_Spain.R demonstrates a large-scale data analysis for the whole of continental Spain (2,470 areas across 8 time points). First, the GscanStat and SaTScan algorithms are applied to detect significant clusters. Subsequently, local spatio-temporal models are fitted using the divide-and-conquer strategy implemented in the `bigDM` package to estimate relative risks.
WARNING: This analysis is computationally intensive. As a reference, obtaining the final clustering partition with the GscanStat algorithm takes approximately 12.2 hours on an Intel(R) Xeon(R) Silver 4316 processor with 80 CPUs at 2.30 GHz and 256 GB of RAM.
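For reference, the model family named in Example 1 (BYM2 spatial prior, RW1 temporal prior, Type IV interaction) is commonly written as follows. This is the standard textbook formulation, given as a sketch: the notation is an assumption, and any cluster-specific terms introduced in Santafé et al. (2025) are not shown.

```latex
\begin{aligned}
O_{it} \mid r_{it} &\sim \text{Poisson}(E_{it}\, r_{it}),
  \qquad i = 1,\dots,S,\quad t = 1,\dots,T,\\
\log r_{it} &= \alpha + \xi_i + \gamma_t + \delta_{it},\\
\boldsymbol{\xi} &= \frac{1}{\sqrt{\tau_\xi}}
  \left(\sqrt{1-\lambda}\,\mathbf{v} + \sqrt{\lambda}\,\mathbf{u}_\star\right)
  && \text{(BYM2)},\\
\boldsymbol{\gamma} &\sim N\!\left(\mathbf{0},\, [\tau_\gamma \mathbf{R}_\gamma]^{-}\right)
  && \text{(RW1)},\\
\boldsymbol{\delta} &\sim N\!\left(\mathbf{0},\, [\tau_\delta\, \mathbf{R}_\gamma \otimes \mathbf{R}_\xi]^{-}\right)
  && \text{(Type IV)}.
\end{aligned}
```

Here $O_{it}$ and $E_{it}$ correspond to the `obs` and `exp` columns, $r_{it}$ is the relative risk, $\mathbf{v}$ is an unstructured standard normal effect, $\mathbf{u}_\star$ is a scaled intrinsic CAR effect with structure matrix $\mathbf{R}_\xi$, $\mathbf{R}_\gamma$ is the first-order random walk structure matrix, and $[\cdot]^{-}$ denotes a generalized inverse. The Kronecker order assumes $\boldsymbol{\delta}$ is stacked by area within time period.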
Additional scripts:
- GscanStat_parallelClusteringAlgorithm.R contains the functions needed to run the Greedy Scan Statistics (GscanStat) algorithm for cluster detection.
- SaTScan_auxFunctions.R provides auxiliary functions for running the SaTScan software via the `rsatscan` package, which requires both the package itself and the standalone SaTScan software (https://www.satscan.org/download.html).
This work has been supported by project PID2020-113125RB-I00/MCIN/AEI/10.13039/501100011033 (Spanish Ministry of Science, Innovation and Universities, AEI).