A machine learning project that predicts FIFA player overall ratings with tree-based ensemble regressors and serves real-time predictions through an interactive Streamlit web interface.
- Advanced ML Pipeline: Complete data preprocessing with imputation and standardization
- Ensemble Methods: Random Forest, XGBoost, and Gradient Boosting regressors
- Hyperparameter Tuning: Automated optimization using RandomizedSearchCV
- Cross-Validation: Robust model evaluation with 3-fold cross-validation
- Interactive Web App: Streamlit-based interface with visual star ratings
- Model Persistence: Trained models saved using joblib for deployment
The project uses FIFA player datasets:
- Training Data: male_players (legacy).csv
- Testing Data: players_22-1.csv (different season for validation)
- Movement Reactions
- Mentality Composure
- Passing & Dribbling
- Physical Attributes
- Shooting & Shot Power
- Age and other performance metrics
- Python 3.8+ (required by the minimum package versions listed in requirements.txt)
- Machine Learning: scikit-learn, XGBoost
- Data Processing: pandas, numpy
- Model Persistence: joblib
- Web Interface: Streamlit
- Visualization: Built-in Streamlit components
pip install streamlit pandas scikit-learn joblib xgboost numpy scipy
git clone https://github.com/Ama-Annor/AMAANNOR._SportsPrediction.git
cd AMAANNOR._SportsPrediction
pip install -r requirements.txt
Run the Jupyter notebook or Python script to train the model:
python AMAANNOR._SportsPrediction.py
This will:
- Clean and preprocess the data
- Train multiple ML models
- Perform hyperparameter tuning
- Save the best model as model_best.pkl
- Save the scaler as scaler.pkl
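The save-and-reload step can be sketched as follows. This is a minimal illustration, not the project's training script: the data is synthetic, the small `n_estimators` value is chosen only to keep the example fast, and the files are written to a temporary directory rather than the repository root.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 11 player features and an overall rating (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(30, 95, size=(200, 11))
y = X.mean(axis=1) + rng.normal(0, 1, size=200)

# Fit the scaler first, then train the model on the scaled features.
scaler = StandardScaler().fit(X)
model = GradientBoostingRegressor(n_estimators=50, random_state=0)
model.fit(scaler.transform(X), y)

# Persist both artifacts; a temporary directory keeps the sketch side-effect free.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model_best.pkl")
scaler_path = os.path.join(out_dir, "scaler.pkl")
joblib.dump(model, model_path)
joblib.dump(scaler, scaler_path)

# Reload them the way a serving app would, then predict for one player.
loaded_model = joblib.load(model_path)
loaded_scaler = joblib.load(scaler_path)
prediction = loaded_model.predict(loaded_scaler.transform(X[:1]))[0]
```

Saving the scaler alongside the model matters because new inputs must be transformed with the statistics learned from the training data, not refitted.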
streamlit run player_rating_app.py
- Open your browser to the Streamlit app (usually http://localhost:8501)
- Adjust player attribute sliders
- Enter actual rating for confidence calculation
- Click "Predict" to see results with star ratings
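The README does not state how the confidence value is derived from the actual rating. One plausible approach, shown purely as an assumption, is to score the prediction by its relative error. The function name `prediction_confidence` and the formula are hypothetical and may differ from what the app does.

```python
def prediction_confidence(predicted: float, actual: float) -> float:
    """Return a 0-100 score that shrinks as the prediction misses the actual rating.

    Hypothetical formula: 100 * (1 - |predicted - actual| / actual), floored at 0.
    """
    if actual <= 0:
        raise ValueError("actual rating must be positive")
    relative_error = abs(predicted - actual) / actual
    return max(0.0, 1.0 - relative_error) * 100.0


# A prediction of 82 against an actual rating of 80 is off by 2.5 percent.
confidence = prediction_confidence(predicted=82.0, actual=80.0)
```

Under this definition a perfect prediction scores 100, and the score is floored at 0 for very large misses.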
- Feature Selection: Uses the 11 features most strongly correlated with player rating
- Missing Value Imputation: Median-based imputation using SimpleImputer
- Standardization: StandardScaler for feature normalization
- Data Cleaning: Remove non-numeric columns and handle missing values
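The cleaning, imputation, and standardization steps above can be sketched with a scikit-learn `Pipeline`. The tiny DataFrame and its column names are illustrative assumptions, not the dataset's exact headers, and the project itself may apply these steps separately rather than in a pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny illustrative frame; column names are assumptions.
df = pd.DataFrame({
    "short_name": ["A", "B", "C", "D"],
    "movement_reactions": [85.0, np.nan, 70.0, 60.0],
    "mentality_composure": [80.0, 75.0, np.nan, 55.0],
    "age": [27, 31, 22, 19],
})

# Data cleaning: keep numeric columns only.
numeric = df.select_dtypes(include="number")

# Median imputation followed by zero-mean, unit-variance scaling.
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
X_scaled = preprocess.fit_transform(numeric)
```

Median imputation is less sensitive to outliers than mean imputation, which suits attribute columns with a few extreme values.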
# Models used
- Random Forest Regressor
- XGBoost Regressor
- Gradient Boosting Regressor (Best performing)
# Evaluation Metrics
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
- R² Score
- Cross-validation scores
- Method: RandomizedSearchCV
- Parameters: n_estimators, learning_rate, max_depth, min_samples_split, etc.
- Cross-validation: 3-fold CV
- Scoring: Negative Mean Squared Error
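A search with these settings might look like the sketch below. The data is synthetic, and the parameter ranges and `n_iter=5` are assumptions chosen to keep the example fast; the ranges used in the actual notebook are not given here.

```python
import numpy as np
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data (assumption).
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 5))
y = X @ np.array([3.0, 2.0, 1.0, 0.5, 0.0]) + rng.normal(0, 0.1, size=150)

# Illustrative search space; randint excludes its upper bound.
param_distributions = {
    "n_estimators": randint(20, 60),
    "learning_rate": uniform(0.01, 0.2),  # samples from [0.01, 0.21]
    "max_depth": randint(2, 5),
    "min_samples_split": randint(2, 8),
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,                           # number of sampled configurations
    cv=3,                               # 3-fold cross-validation
    scoring="neg_mean_squared_error",   # higher (closer to 0) is better
    random_state=42,
)
search.fit(X, y)

best_params = search.best_params_
best_mse = -search.best_score_          # flip the sign to recover the MSE
```

Because scikit-learn maximizes scores, mean squared error is negated during the search; negating `best_score_` gives the error back in its usual form.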
- Random Forest: Mean CV Score: ~0.85
- XGBoost: Mean CV Score: ~0.87
- Gradient Boosting: Mean CV Score: ~0.89 (Best)
- RMSE: ~0.15-0.20 (on scaled data)
- MAE: ~0.12-0.18
- R² Score: ~0.89-0.92
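For reference, the three metrics can be computed with scikit-learn as sketched below. The arrays are made-up values for illustration only and are not results from this project.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up ratings purely for illustration.
y_true = np.array([80.0, 75.0, 90.0, 60.0])
y_pred = np.array([78.0, 76.0, 88.0, 63.0])

mae = mean_absolute_error(y_true, y_pred)
# RMSE is the square root of the mean squared error.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
```

MAE and RMSE are in the same units as the target, so values computed on a scaled target are not directly comparable with errors expressed in rating points.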
- Slider Controls: Easy adjustment of player attributes
- Real-time Predictions: Instant rating calculation
- Visual Feedback: Star-based rating system (★★★★★)
- Confidence Metric: Shows prediction reliability
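One way the star-based feedback could work is sketched here. The function name `compute_stars_sketch`, the assumed 0-100 rating scale, and the glyphs used for full, half, and empty stars are all illustrative assumptions and may differ from the app's own implementation.

```python
def compute_stars_sketch(rating: float, max_rating: float = 100.0) -> str:
    """Map a rating in [0, max_rating] to a five-star string with half-star steps."""
    clamped = min(max(rating, 0.0), max_rating)
    # Convert to half-star units on a 0-10 scale, rounding half up.
    half_steps = int(clamped * 10 / max_rating + 0.5)
    full, half = divmod(half_steps, 2)
    empty = 5 - full - half
    return "★" * full + "½" * half + "☆" * empty


stars = compute_stars_sketch(90)
```

Rounding is done with an explicit `+ 0.5` because Python's built-in `round` rounds ties to the nearest even number, which would make some half-star boundaries behave unexpectedly.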
def compute_stars(rating):
    # Converts numerical rating to 5-star scale
    # Includes half-stars and visual representations
    ...
AMAANNOR._SportsPrediction/
├── AMAANNOR._SportsPrediction.ipynb    # Main training notebook
├── AMAANNOR._SportsPrediction.py       # Python script version
├── player_rating_app.py                # Streamlit web application
├── model_best.pkl                      # Trained model (generated)
├── scaler.pkl                          # Data scaler (generated)
├── requirements.txt                    # Python dependencies
├── male_players (legacy).csv           # Training dataset
├── players_22-1.csv                    # Testing dataset
└── README.md                           # Project documentation
GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.1,
    max_depth=6,
    min_samples_split=5,
    min_samples_leaf=2,
    subsample=0.8
)
Top contributing factors to player ratings:
- Movement Reactions
- Mentality Composure
- Passing Ability
- Dribbling Skills
- Physical Attributes
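A ranking like the one above can be read from a fitted model's `feature_importances_` attribute. The sketch below uses synthetic data in which the first feature dominates by construction, and the feature names are assumptions, so it shows the mechanics rather than reproducing the project's ranking.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Assumed feature names for illustration.
feature_names = ["movement_reactions", "mentality_composure", "passing", "dribbling", "physic"]

rng = np.random.default_rng(1)
X = rng.uniform(30, 95, size=(300, len(feature_names)))
# Synthetic target in which the first feature dominates by construction.
y = 0.6 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 1, size=300)

model = GradientBoostingRegressor(n_estimators=100, random_state=1).fit(X, y)

# Pair each name with its impurity-based importance and sort descending.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Impurity-based importances are normalized to sum to 1 and can be biased toward features with many distinct values, so they are best treated as a rough guide.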
- Tested on different season data (players_22)
- Maintains consistent performance across datasets
- Robust to new, unseen player data
- Deep learning models (Neural Networks)
- Player position-specific models
- Time series analysis for rating changes
- Advanced feature engineering
- Model interpretability with SHAP values
- API deployment for mobile apps
- Real-time data integration
Create a requirements.txt file:
streamlit>=1.28.0
pandas>=1.5.0
scikit-learn>=1.3.0
joblib>=1.3.0
xgboost>=1.7.0
numpy>=1.24.0
scipy>=1.10.0
- Fork the repository
- Create a feature branch (git checkout -b feature/new-feature)
- Commit changes (git commit -am 'Add new feature')
- Push to branch (git push origin feature/new-feature)
- Create a Pull Request
Ama-Annor
- GitHub: @Ama-Annor
- FIFA for providing comprehensive player statistics
- scikit-learn community for excellent ML tools
- Streamlit team for the intuitive web framework
- XGBoost developers for high-performance gradient boosting
⚽ Predict like a pro with data science!