This project developed a machine learning solution for predicting gold recovery at the rougher and final stages of ore processing using datasets with over 80 parameters. A Multi-Output Random Forest Regression model provided the most accurate predictions during training, with Linear Regression as a viable, less computationally intensive alternative. Despite underperforming compared to constant benchmarks on the test set, the models demonstrate the potential for data-driven optimization of industrial processes.
π Python π©π½βπ» Data Science π€ Machine Learning π§ͺ Scikit Learn β Cross Validation πΌ pandas π Data Analytics π Supervised Learning βοΈ Feature Engineering π― Model Evaluation π΅π½ββοΈ Anomaly Detection π§Ό Data Cleaning and Preprocessing
- This project uses pandas, numpy, RandomForestRegressor, MultiOutputRegressor, LinearRegression, mean_squared_error, mean_absolute_error, make_scorer, matplotlib.pyplot, shuffle, StandardScaler, seaborn, SimpleImputer, cross_val_score, KFold, and RandomizedSearchCV. It requires python 3.9.6. There is one additional file containing the full, unsplit test set that I was unable to upload due to upload limitations.