
aitorvv/ML_individual_tree_mortality

Does Machine Learning outperform Logistic Regression in predicting individual tree mortality?

💻 💾 📊 Original data, code and results related to the study


📂 Repository DOI: DOI


✨ Highlights

  • Six classification algorithms (five Machine Learning algorithms plus Logistic binomial Regression) were compared in predicting individual tree mortality.
  • Effects of dataset size, variable set, thinning, inventory length, and cross-validation were studied.
  • Random Forest achieved the highest performance in all the proposed case studies except cross-validation.
  • Logistic binomial Regression appears to be the more robust algorithm under cross-validation.

📖 Abstract

Tree mortality is a crucial process in forest dynamics and a key component of forest growth models and simulators. Factors like competition, drought, and pathogens drive tree mortality, but the underlying mechanism is challenging to model. The current environmental changes are even complicating model approaches as they influence and alter all the factors involving mortality. However, innovative classification algorithms can go deep into data to find patterns that can model or even explain their relationship. We use Logistic binomial Regression as the reference algorithm for predicting individual tree mortality. However, different machine learning (ML) alternatives already applied to other forest modeling topics can be used for this purpose. Here, we compare the performance of five different ML algorithms (Decision Trees, Random Forest, Naive Bayes, K-Nearest Neighbour, and Support Vector Machine) against Logistic binomial Regression in individual tree mortality classification under 40 different case studies and a cross-validation case study. The data used corresponds to Norway spruce long-term experimental plots, which have a total of 75,522 tree records and a 10.28% mortality rate on average. Through different case studies, when more variables were used, general performance improved as expected, while more extensive datasets decreased the performance level of the algorithms. Performance was also higher when plots remained without management compared to thinned ones. Random Forest outperformed the other algorithms in all the cases except cross-validation, where it was the weaker one. Our results demonstrate the potential of ML in assessing tree mortality. When the model application is not clearly defined and/or model interpretability is needed, Logistic binomial Regression is still the best tool for evaluating individual tree mortality.


📁 Repository Contents

  • 📂 1_data: raw and processed data; check here for a detailed description
  • 📂 2_code: the code used for data curation, analysis, and the outputs included in the document; check here for a detailed description
  • 📂 3_figures: figures, charts, tables, and additional resources included in the document; check here for a detailed description
  • 📂 4_bibliography: compilation of all the literature cited or consulted during the creation of the document

🤔 How to use the resources of this repository

💫 To download the contents of this repository, you can follow this guide.

♻️ To reproduce the analysis, users must:

  • 💾 Data:

    • WorldClim data required for the simulations must be downloaded from the official WorldClim website
  • 💻 Prerequisites: installation and code: R must be installed, together with the libraries loaded in each script (RStudio was also used to develop the code). Some analyses, specifically training the RF models, require substantial computing power and can cause out-of-memory errors on a typical computer. Access to high-performance computing resources is highly recommended in those cases.

  • 📜 Usage: follow the numerical order of the scripts to reproduce each step correctly
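
As an illustration of the usage step above, here is a minimal shell sketch that runs a folder of R scripts in the order of their numeric filename prefix. It assumes that the scripts sit directly in one folder, that they are named like `<number>_<name>.R`, and that filenames contain no spaces; none of this is confirmed by the repository description, so adapt the path and pattern to the actual layout. `ordered_scripts` and `run_in_order` are hypothetical helper names, and `Rscript` is the command-line runner that ships with R.

```shell
#!/bin/sh
# Sketch only: the folder layout and the "<number>_<name>.R" naming
# pattern are assumptions, not taken from the repository itself.

# Print the .R files found in directory $1, one per line,
# sorted numerically by their leading number (1, 2, ..., 10).
ordered_scripts() {
    ls "$1" | grep '\.R$' | sort -n
}

# Run each script in that order with Rscript and stop at the first failure.
# Assumes filenames without whitespace.
run_in_order() {
    for script in $(ordered_scripts "$1"); do
        echo "Running $script"
        Rscript "$1/$script" || return 1
    done
}

# Usage (hypothetical path): run_in_order 2_code
```

A numeric sort is used because a plain alphabetical listing would place a script numbered 10 before one numbered 2. Scripts that rely on relative paths may also need to be launched from a specific working directory.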


🔗 About the authors

Aitor Vázquez Veloso:

Email ORCID Google Scholar ResearchGate LinkedIn X

Astor Toraño Caicoya:

ORCID ResearchGate LinkedIn

Felipe Bravo Oviedo:

ORCID ResearchGate LinkedIn X

Peter Biber:

ORCID ResearchGate

Enno Uhl:

ORCID ResearchGate

Hans Pretzsch:

ORCID ResearchGate


ℹ License

MIT License

The content of this repository is under the MIT license.


📝 How to cite this repository?

You can use the citation file, or copy the citation directly in APA or BibTeX format using the Cite this repository button on the right-hand side of the repository page; here are more details.

