Forest Cover Type Classification

a. Problem Statement

The objective of this assignment is to implement multiple machine learning classification models to predict the Forest Cover Type based on cartographic and environmental features.

This project includes:

Implementation of six classification models
Evaluation using multiple performance metrics
Model comparison
Deployment using Streamlit

This is a multi-class classification problem with 7 classes.

b. Dataset Description

Dataset Name: Covertype Dataset (UCI Machine Learning Repository)
Total Instances: 581,012
Total Features: 54
Target Variable: Cover_Type
Number of Classes: 7

The dataset includes environmental and cartographic attributes such as elevation, slope, soil type, and wilderness area indicators used to classify forest cover type.

c. Models Used

Logistic Regression
Decision Tree Classifier
K-Nearest Neighbors (KNN)
Naive Bayes (Gaussian)
Random Forest (Ensemble)
XGBoost (Ensemble Boosting)

Model Comparison Table

ML Model	Accuracy	AUC	Precision	Recall	F1 Score	MCC
Logistic Regression	0.7235	0.9363	0.7109	0.7235	0.7138	0.5469
Decision Tree	0.9389	0.9452	0.9389	0.9389	0.9389	0.9019
KNN	0.9284	0.9839	0.9282	0.9284	0.9282	0.8849
Naive Bayes	0.0895	0.7970	0.4940	0.0895	0.0577	0.0688
Random Forest	0.9533	0.9978	0.9534	0.9533	0.9530	0.9248
XGBoost	0.8682	0.9862	0.8683	0.8682	0.8674	0.7871

Observations

ML Model	Observation
Logistic Regression	Moderate performance. Struggles with complex non-linear patterns.
Decision Tree	Strong performance. Captures non-linear relationships effectively but may overfit slightly.
KNN	Performs well after scaling. Distance-based learning suits this dataset.
Naive Bayes	Very poor performance due to unrealistic feature independence assumptions.
Random Forest	Best performing model overall with highest accuracy and MCC.
XGBoost	Strong AUC performance but slightly lower accuracy than Random Forest.

Streamlit Application Features

CSV dataset upload option
Model selection dropdown
Display of evaluation metrics
Confusion matrix visualization
Classification report display

Technologies Used

Python, Scikit-learn, XGBoost, Pandas, NumPy, Seaborn, Matplotlib, Streamlit

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.idea		.idea
models		models
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
app.py		app.py
model_comparison.csv		model_comparison.csv
requirements.txt		requirements.txt
sample_test_data.csv		sample_test_data.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Forest Cover Type Classification

a. Problem Statement

b. Dataset Description

c. Models Used

Model Comparison Table

Observations

Streamlit Application Features

Technologies Used

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Forest Cover Type Classification

a. Problem Statement

b. Dataset Description

c. Models Used

Model Comparison Table

Observations

Streamlit Application Features

Technologies Used

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages