Welcome to the Forest Cover Type Classifier repository! This project uses a TensorFlow-based neural network to predict forest cover types from the UCI Covertype dataset. Whether you're into machine learning, environmental data, or just love forests, this notebook has you covered! We preprocess data, train a multi-layer perceptron, evaluate performance, test random samples, and plot ROC curves for insightful analysis.
- Dataset: UCI Forest Covertype (581,012 samples, 54 features, 7 classes like Spruce/Fir, Lodgepole Pine, etc.)
- Model: Sequential DNN with Dense layers, Dropout for regularization, and Softmax output. Trained with Adam optimizer and categorical cross-entropy.
- Key Features:
  - Data splitting & standardization
  - Batch training with TensorFlow Datasets
  - Accuracy evaluation (~85% on test set)
  - Random sample prediction testing
  - Multi-class ROC curve visualization
- Tech Stack: TensorFlow, Scikit-learn, Matplotlib, NumPy
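To make the model description above a bit more concrete, here is a minimal NumPy sketch of what a Dense + Dropout + Softmax stack computes on a forward pass, together with the categorical cross-entropy loss. It is not the notebook's Keras code: the hidden layer sizes (128, 64), the ReLU activation, the dropout rate, and the helper names (`mlp_forward`, `categorical_cross_entropy`) are assumptions chosen for illustration, and no training (Adam updates) is shown.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    # Mean negative log-probability assigned to the true class.
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1)))

def mlp_forward(x, params, drop_rate=0.0, rng=None):
    """Hidden Dense layers with ReLU (assumed), optional inverted dropout, softmax output."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
        if drop_rate > 0.0 and rng is not None:
            # Training mode: zero random units and rescale the survivors.
            keep = rng.random(h.shape) >= drop_rate
            h = h * keep / (1.0 - drop_rate)
    w_out, b_out = params[-1]
    return softmax(h @ w_out + b_out)

rng = np.random.default_rng(0)
layer_sizes = [54, 128, 64, 7]  # 54 features in, 7 classes out; hidden sizes are hypothetical
params = [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

x_demo = rng.normal(size=(4, 54))        # synthetic stand-in for 4 scaled samples
y_demo = np.eye(7)[[0, 1, 2, 6]]         # one-hot labels for those samples
probs = mlp_forward(x_demo, params)      # inference mode, dropout off
probs_train = mlp_forward(x_demo, params, drop_rate=0.3, rng=rng)  # training mode
loss = categorical_cross_entropy(y_demo, probs)
print(probs.shape, round(loss, 3))
```

Each row of `probs` is a probability distribution over the 7 cover types, which is what the softmax output layer provides to the loss.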
- Clone the repo:
  git clone https://github.com/shervinnd/forest-cover-type-classifier.git
  cd forest-cover-type-classifier
- Install dependencies (use a virtual environment like venv or conda):
  pip install tensorflow numpy matplotlib scikit-learn
- Open the Jupyter Notebook:
  jupyter notebook covtype.ipynb
Note: This was tested on Python 3.12 with GPU acceleration (T4). Ensure TensorFlow is GPU-enabled if needed!
- Run the Notebook: Execute cells sequentially to:
  - Import libraries
  - Load & preprocess data (fetch_covtype, scaling, one-hot encoding)
  - Build & compile the model
  - Train for 20 epochs (batch size 128)
  - Evaluate on test set
  - Test a random sample
  - Generate ROC curves for each class
- Customize:
  - Tweak hyperparameters like epochs, batch size, or layers in the parameters cell.
  - Run `test_random_sample()` multiple times for fun predictions!
- Output Example:
  - Training logs show accuracy improving to ~81% on train, ~85% on validation.
  - ROC AUCs: High for most classes (e.g., 0.99+ for some)!
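The load-and-preprocess step in the walkthrough above (splitting, scaling, one-hot encoding, batching) can be sketched roughly as follows. This is a NumPy-only illustration on synthetic data, not the notebook's actual code, which uses `fetch_covtype` and TensorFlow Datasets. The 20% test fraction and the helper names are assumptions. The batch size of 128 comes from the notebook, and Covertype labels run from 1 to 7, hence the shift in `one_hot`.

```python
import numpy as np

def split_indices(n_samples, test_fraction, rng):
    # Shuffle once, then carve off the first chunk as the test set.
    perm = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return perm[n_test:], perm[:n_test]

def standardize(train, test):
    # Fit mean/std on the training split only, then apply to both splits.
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (train - mean) / std, (test - mean) / std

def one_hot(labels, num_classes):
    # Labels are 1-based (1..num_classes), so shift to 0-based row indices.
    return np.eye(num_classes)[labels - 1]

def iter_batches(x, y, batch_size):
    # Yield consecutive mini-batches; the last one may be smaller.
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

rng = np.random.default_rng(42)
x_all = rng.normal(loc=5.0, scale=2.0, size=(1000, 54))  # synthetic stand-in features
y_all = rng.integers(1, 8, size=1000)                    # synthetic labels in 1..7

train_idx, test_idx = split_indices(len(x_all), test_fraction=0.2, rng=rng)
x_train, x_test = standardize(x_all[train_idx], x_all[test_idx])
y_train = one_hot(y_all[train_idx], num_classes=7)
y_test = one_hot(y_all[test_idx], num_classes=7)

batches = list(iter_batches(x_train, y_train, batch_size=128))
print(x_train.shape, x_test.shape, len(batches))
```

Fitting the scaler on the training split only keeps test-set statistics from leaking into training.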
- Test Accuracy: ~85.23%
- Sample Prediction: Picks a random test instance, predicts its cover type (e.g., Lodgepole Pine), shows the class probabilities, and checks whether the prediction is correct.
- ROC Curves: One curve per class, plotted with Matplotlib, showing how well the model separates that class from all the others. Handy for spotting weak classes when the data is imbalanced!
- Pro Tip: Classes like Cottonwood/Willow might have lower AUC due to fewer samples. Experiment with oversampling!
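For readers curious how the per-class ROC curves and AUCs above can be derived, here is a small one-vs-rest sketch in plain NumPy. It is an illustration under assumptions, not the notebook's code: the helper names are hypothetical, and it assumes every class has at least one positive and one negative sample.

```python
import numpy as np

def roc_curve_binary(y_true, scores):
    # Sort by descending score, then accumulate true and false positives.
    order = np.argsort(-scores, kind="stable")
    y_sorted = y_true[order]
    scores_sorted = scores[order]
    tps = np.cumsum(y_sorted)
    fps = np.cumsum(1 - y_sorted)
    # Keep the last index of each distinct score so ties share one threshold.
    distinct = np.r_[np.diff(scores_sorted) != 0, True]
    tps, fps = tps[distinct], fps[distinct]
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    # Area under the curve via the trapezoidal rule.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def one_vs_rest_auc(y_onehot, probs):
    # Treat each class column as its own binary problem.
    return [auc_trapezoid(*roc_curve_binary(y_onehot[:, k], probs[:, k]))
            for k in range(y_onehot.shape[1])]

# Tiny binary check with hand-verifiable numbers.
y_bin = np.array([0, 0, 1, 1])
s_bin = np.array([0.1, 0.4, 0.35, 0.8])
auc_bin = auc_trapezoid(*roc_curve_binary(y_bin, s_bin))
print(auc_bin)  # 0.75

# Synthetic 3-class case where the true class always gets the top probability.
y_demo = np.eye(3)[[0, 1, 2, 0, 1, 2]]
probs_demo = 0.9 * y_demo + 0.05 * (1 - y_demo)
aucs = one_vs_rest_auc(y_demo, probs_demo)
print(aucs)
```

The `fpr` and `tpr` arrays returned for each class can be passed straight to Matplotlib's `plt.plot(fpr, tpr)` to draw the curves.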
We'd love your input!
- Fork the repo & create a pull request.
- Suggestions: Add more models (e.g., CNNs, XGBoost), hyperparameter tuning with Keras Tuner, or deployment with Streamlit.
- Report issues or bugs via GitHub Issues.
This project is licensed under the MIT License -- feel free to use, modify, and share!
Powered by Miracle -- Exploring forests one prediction at a time!