📖 Manuscript DOI: Does machine learning outperform logistic regression in predicting individual tree mortality?
- 6 different Machine Learning algorithms were compared in predicting individual tree mortality.
- Effects of dataset size, variable set, thinning, inventory length, and cross-validation were studied.
- Random Forest achieved the highest performance in every case study except cross-validation.
- Logistic binomial Regression appears to be the more robust algorithm under cross-validation.
Tree mortality is a crucial process in forest dynamics and a key component of forest growth models and simulators. Factors like competition, drought, and pathogens drive tree mortality, but the underlying mechanism is challenging to model. Ongoing environmental changes complicate modelling further, as they alter all the factors involved in mortality. However, innovative classification algorithms can mine the data for patterns that model, or even explain, these relationships. We use Logistic binomial Regression as the reference algorithm for predicting individual tree mortality. However, different machine learning (ML) alternatives already applied to other forest modeling topics can be used for this purpose. Here, we compare the performance of five different ML algorithms (Decision Trees, Random Forest, Naive Bayes, K-Nearest Neighbour, and Support Vector Machine) against Logistic binomial Regression in individual tree mortality classification under 40 different case studies and a cross-validation case study. The data come from Norway spruce long-term experimental plots, totalling 75,522 tree records with an average mortality rate of 10.28%. Across the case studies, using more variables improved general performance, as expected, whereas larger datasets decreased the performance of the algorithms. Performance was also higher in unmanaged plots than in thinned ones. Random Forest outperformed the other algorithms in all the cases except cross-validation, where it was the weakest. Our results demonstrate the potential of ML in assessing tree mortality. When the model application is not clearly defined and/or model interpretability is needed, Logistic binomial Regression is still the best tool for evaluating individual tree mortality.
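As a rough illustration of the reference algorithm named in the abstract, the sketch below fits a binomial logistic regression in base R and evaluates it on a held-out subset. It is not the code in this repository: the data are simulated, the predictors `dbh` and `bal` are hypothetical names chosen for the example, and the use of AUC as the metric is an assumption made here.

```r
# Minimal sketch: binomial logistic regression for individual tree mortality.
# All data are simulated; dbh and bal are hypothetical predictors
# (diameter at breast height and basal area of larger trees).
set.seed(42)
n <- 5000
trees <- data.frame(
  dbh = runif(n, 5, 60),  # cm
  bal = runif(n, 0, 50)   # m2/ha
)

# Simulated mortality process: small, suppressed trees die more often
lin <- -2.0 - 0.06 * trees$dbh + 0.05 * trees$bal
trees$dead <- rbinom(n, 1, plogis(lin))

# Hold out 30% of the records for testing
test_idx <- sample(n, size = 0.3 * n)
train <- trees[-test_idx, ]
test  <- trees[test_idx, ]

# Reference model: logistic (binomial) regression
fit <- glm(dead ~ dbh + bal, data = train, family = binomial)
p_hat <- predict(fit, newdata = test, type = "response")

# AUC via the rank (Mann-Whitney) formulation, to avoid extra packages
auc <- function(y, p) {
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_test <- auc(test$dead, p_hat)
cat("Simulated mortality rate:", round(mean(trees$dead), 3), "\n")
cat("Test AUC:", round(auc_test, 3), "\n")
```

A comparison like the one in the manuscript would fit the other classifiers on the same training subset and score them with the same metric; those models need additional packages and are left out of this sketch.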
- 📂 1_data: raw and processed data; check here for a detailed description
- 📂 2_code: compilation of the code used for data curation, analysis and outputs included in the document; check here for a detailed description
- 📂 3_figures: figures, charts, tables and additional resources included in the document; check here for a detailed description
- 📂 4_bibliography: compilation of all the literature cited or consulted during the creation of the document
💫 To download the contents of this repository, you can follow this guide.
♻️ To reproduce the analysis, users must:
- 💾 Data:
  - WorldClim data required for the simulations must be downloaded from its official website
- 💻 Prerequisites (installation and code): R must be installed, together with the libraries loaded in each script (RStudio was also used to develop the code). Some analyses, specifically training the RF models, require substantial computing power and can run out of memory on a standard computer. Access to high-performance computing services is highly recommended in those cases.
- 📜 Usage: follow the numerical order of the scripts to reproduce each step correctly
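Running the scripts in numerical order could be automated along the lines of the sketch below. It assumes, without confirmation from the repository, that the scripts sit directly inside `2_code`, end in `.R`, and start with a number; `order_scripts` and `run_all` are hypothetical helper names introduced only for this example.

```r
# Hypothetical helper: order script paths by the leading number in the file
# name, so that "10_x.R" runs after "2_x.R" (alphabetical sorting would not).
order_scripts <- function(paths) {
  prefix <- as.numeric(sub("^([0-9]+).*$", "\\1", basename(paths)))
  paths[order(prefix)]
}

# Hypothetical runner: source every numbered R script found in `dir`.
# Assumes each script can be run from the current working directory.
run_all <- function(dir = "2_code") {
  scripts <- list.files(dir, pattern = "^[0-9]+.*\\.[Rr]$", full.names = TRUE)
  for (s in order_scripts(scripts)) {
    message("Running ", s)
    source(s)
  }
  invisible(NULL)
}
```

Calling `run_all()` from the repository root would then execute each step in sequence; running the scripts one by one remains the safer option when a step needs manual data downloads or a high-performance computing service.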
The content of this repository is under the MIT license.
You can use the citation file or copy the citation directly in APA or BibTeX format using the Cite this repository button on the right-hand side of the repository content. Here are more details.