This notebook explores a Spotify dataset to predict listening behavior based on track features.
- The dataset was cleaned, preprocessed, and standardized using StandardScaler to improve model performance.
- Multiple machine learning models were trained and compared, including Logistic Regression, Decision Trees, Random Forest, KNN, XGBoost, and others.
- Accuracy and cross-validation were used to evaluate performance.
- Random Forest consistently achieved the highest accuracy and the most balanced performance of all the models compared.
- Optuna was used to fine-tune the Random Forest hyperparameters (`n_estimators` and `max_depth`).
- Cross-validation was used to check that the model was not overfitting.
- Best Model: Random Forest Classifier
- Best Accuracy: 0.9892
- Best Parameters: `n_estimators = 91`, `max_depth = 27`
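
The model comparison summarized above (standardize, train several classifiers, score them with cross-validated accuracy) can be sketched with scikit-learn as follows. This is a minimal sketch, not the notebook's code: the Spotify data is not available here, so `make_classification` produces a hypothetical synthetic stand-in, the hyperparameters are library defaults, and XGBoost is left out so the example needs only scikit-learn. `compare_models` is a hypothetical helper name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic stand-in for the cleaned Spotify feature matrix.
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=6, random_state=42
)

# Candidate models (XGBoost omitted to avoid an extra dependency).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
}


def compare_models(X, y, cv=5):
    """Return the mean cross-validated accuracy of each candidate model."""
    scores = {}
    for name, model in models.items():
        # The scaler sits inside the pipeline, so each CV fold is scaled
        # with statistics computed from that fold's training split only.
        pipe = make_pipeline(StandardScaler(), model)
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()
    return scores


results = compare_models(X, y)
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.4f}")
```

Putting `StandardScaler` inside the pipeline avoids leaking test-fold statistics into training. Scaling matters most for distance- and gradient-based models such as KNN and Logistic Regression; tree ensembles like Random Forest are insensitive to feature scale.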
✅ The project identified Random Forest as the best-performing of the models tested and tuned it for strong predictive performance.