diff --git a/README.md b/README.md index 924b83b..2727bb2 100755 --- a/README.md +++ b/README.md @@ -1,41 +1,21 @@ # Earthquakes_Italy -This is the repository for an extended case study of earthquake forecasting in +This repository provides the source code (_R_ language) for an extended case study of earthquake forecasting in Italy: -Jonas Brehmer, Kristof Kraus, Tilmann Gneiting, Marcus Herrmann, and -Warner Marzocchi (2024+). Comparative evaluation of earthquake forecasting -models: An application to Italy. +> Brehmer, J. R., Kraus, K., Gneiting, T., Herrmann, M., and Marzocchi, W. (2025). +> Enhancing the Statistical Evaluation of Earthquake Forecasts—An Application to Italy. +> _Seismological Research Letters_. +> doi: [10.1785/0220240209](https://doi.org/10.1785/0220240209). +> arXiv: [2405.10712](https://arxiv.org/abs/2405.10712) -Parts of this case study and the corresponding code have been used in +Parts of this case study and the corresponding code have been also used in: -Jonas Brehmer, Tilmann Gneiting, Marcus Herrmann, Warner Marzocchi, Martin -Schlather, and Kirstin Strokorb (2023). Comparative evaluation of point -process forecasts. - -## Differences to previously reported predictive performance - -We report slightly different Poisson scores compared to - -J. Brehmer, T. Gneiting, M. Herrmann, W. Marzocchi, M. -Schlather, and K. Strokorb (2023). Comparative evaluation of point -process forecasts. - -Due to numerical inaccuracies three earthquakes had been unnecessarily excluded -from analysis (these were the earthquake going with the timestamp 2016-05-30 20:24:20.460, -2016-08-26 04:28:25.890, and 29019-10-25 04:31:38.200). -Additionally, we adhere to the CSEP binning which includes lower boundaries to a cell and -excludes upper boundaries, whereas Brehmer et al. used the opposite rule. This results in four -earthquakes being assigned to a neighboring cells. - -Likewise, we report different IGPE values compared to - -M. Herrmann and W. Marzocchi (2023). Maximizing the forecasting skill of an ensemble -model. Geophysical Journal International. - -The reason is again a slightly different binning of earthquakes registered on cell boundaries. -While Herrmann et al. also use the CSEP binning, numerical inaccuracies entailed that five -earthquakes had been assigned to a wrong cell in terms of the CSEP binning. +> Brehmer, J. R., Gneiting, T., Herrmann, M., Marzocchi, W., Schlather, M., and Strokorb, K. (2024). +> Comparative evaluation of point process forecasts. +> _Annals of the Institute of Statistical Mathematics 76_(1), 47–71. +> doi: [10.1007/s10463-023-00875-5](https://doi.org/10.1007/s10463-023-00875-5). +> arXiv: [2103.11884](https://arxiv.org/abs/2103.11884) ## Code @@ -44,7 +24,7 @@ The tables and plots can be reproduced with the following files: - **data_prep.R** Loads all the data (see below) and pre-processes them. Called at the start of all other main scripts. - **functions_prep.R** Provides functions to load and preprocess the data -- **precompute.R** pre-compute values for empirical CDFs, Murphy +- **precompute.R** pre-compute values for empirical CDFs, Murphy diagrams, score component plot and reliability diagram - **plots_and_tables.R** comprises all the functionality to calculate the values for the tables and to create the plots of the publication apart from simulation study plots @@ -53,8 +33,10 @@ on its reliability curve - **sim_study_tests.R** compare CSEP t-Test and Diebold Mariano test on forecasts derived from the LM, FCM and LG model - **utils.R** define plot theme, and scoring and test functions +The code was developed on a machine with 16GB of RAM. This is required if for the analysis all the models are +kept in memory simultaneously. -## Data +## Data input - **models** (four files): Two-dimensional arrays which contain the forecasts of the models. The rows represent different model run times. The columns represent @@ -67,10 +49,67 @@ degrees and are numbered consecutively. - **catalog** file: Contains the details (time stamp, magnitude, etc.) of earthquakes in the testing region and during the testing period (but time and space of model forecasts and catalog do not exactly match) -- **climatology** file: Rates for a selection of longitude and latitude values. If +- **climatology** file: Rates for a selection of longitude and latitude values. If appropriately scaled, they can be understood as climatological forecast which are constant in time. The scaling depends on the assumed number of events in a 7-day period, see Mail by Warner Marzocchi (08.09.21) -The code was developed on a machine with 16GB of RAM. This is required if for the analysis all the models are -kept in memory simultaneously. + +## Details on slight differences to previously reported scores + +Compared to two previous studies, we report slightly different scores due to correcting the +spatial binning of earthquakes that occurred exactly on grid cell boundaries (bin edges). +These binning discrepancies originate from numerical inaccuracies in floating-point arithmetics. +Of the 262 target earthquakes in the catalog, this affects seven events. + + +**1. Brehmer _et al._ 2024** + +Poisson scores in Table 1–3 slightly differ compared to Table 1 in Brehmer _et al._ 2024 (see second reference above). + +Three target earthquakes previously had been unnecessarily excluded from analysis: +``` +DateTime, Lat, Lon, Depth, Mag +2016-05-30 20:24:20.460, 42.7, 11.976, 7.9, 4.1 +2016-08-26 04:28:25.890, 42.6, 13.29, 10.9, 4.8 +2019-10-25 04:31:38.200, 39.7, 15.432, 12.1, 4.4 +``` +(their _Latitude_ component falls exactly on a cell boundary) + +Additionally, we now adhere to [_pyCSEP_-style binning](https://docs.cseptesting.org/reference/generated/csep.utils.calc.bin1d_vec.html) +which includes lower boundaries to a cell and excludes upper boundaries, +whereas Brehmer _et al._ 2024 used the opposite rule. +This resulted in four earthquakes being incorrectly assigned to a neighboring cell: +``` +DateTime, Lat, Lon, Depth, Mag +2006-02-27 04:34:01.830, 38.155, 15.2, 9.2, 4.1 +2009-04-06 01:42:49.970, 42.3, 13.429, 10.5, 4.2 +2016-12-09 07:21:50.170, 44.33, 10.5, 7.6, 4.0 +2016-12-11 12:54:52.860, 42.9, 13.113, 8.3, 4.3 +``` +(either the _Latitude_ or _Longitude_ component falls exactly on a cell boundary) + + +**2. Herrmann & Marzocchi 2023** + +IGPE values in Table 3 slightly differ compared to Table 1 in: + +> Herrmann, M., and W. Marzocchi (2023). +> Maximizing the forecasting skill of an ensemble model. +> _Geophysical Journal International 234_(1), 73–87. +> doi: [10.1093/gji/ggad020](https://doi.org/10.1093/gji/ggad020) + +(when transforming reported values to use _ETAS_LM_ as reference instead of _SMA_). + +Herrmann & Marzocchi 2023 used [_pyCSEP_'s binning function](https://docs.cseptesting.org/reference/generated/csep.utils.calc.bin1d_vec.html), +but it previously [didn't properly account for floating point precision](https://github.com/SCECcode/pycsep/issues/255), +resulting in five earthquakes being incorrectly assigned to a neighboring cell: +``` +DateTime, Lat, Lon, Depth, Mag +2009-04-06 01:42:49.970, 42.3, 13.429, 10.5, 4.2 +2016-05-30 20:24:20.460, 42.7, 11.976, 7.9, 4.1 +2016-08-26 04:28:25.890, 42.6, 13.29, 10.9, 4.8 +2016-12-11 12:54:52.860, 42.9, 13.113, 8.3, 4.3 +2019-10-25 04:31:38.200, 39.7, 15.432, 12.1, 4.4 +``` +(only their _Latitude_ component falls exactly on a cell boundary; those five events were already mentioned above)