A Data Science project that analyzes Metacritic video game review data and builds machine learning models to predict critic scores.
Author: Mushtaque Ali | CMS-ID: 023-23-0165 | Course: Data Science
The video game industry is one of the fastest-growing entertainment sectors worldwide. This project applies the complete Data Science lifecycle to analyze video game review data and build a predictive model that estimates Metacritic scores based on available features.
- Understand trends in video game ratings
- Explore relationships between critic scores and user reviews
- Build a regression model to predict Metacritic scores
- Evaluate and compare model performance
| Property | Detail |
|---|---|
| Source | Metacritic Video Game Reviews (Public Dataset) |
| Rows | 18,800 |
| Columns | 6 |
| Time Period | 1995 – 2021 |
| Platforms | 22 |
| Column | Description |
|---|---|
| `name` | Name of the video game |
| `platform` | Gaming platform (PS, Xbox, PC, etc.) |
| `release_date` | Release date of the game |
| `summary` | Short description of the game |
| `meta_score` | Metacritic critic score (Target variable) |
| `user_review` | Average user review score |
- Language: Python
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
- Notebook: Jupyter Notebook
- Models: Linear Regression, Ridge Regression, Random Forest, Gradient Boosting
Data Collection → Data Cleaning → EDA → Feature Engineering → Model Building → Evaluation
- Removed missing values from `meta_score` and `user_review`
- Converted `user_review` to numeric datatype
- Removed non-numeric / invalid entries
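
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the helper name `clean_reviews` and the placeholder value `"tbd"` are assumptions, and the sample rows are made up rather than taken from `all_games.csv`.

```python
import pandas as pd


def clean_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric, non-missing meta_score and user_review."""
    out = df.copy()
    # Coerce both score columns to numbers; anything unparseable becomes NaN.
    out["meta_score"] = pd.to_numeric(out["meta_score"], errors="coerce")
    out["user_review"] = pd.to_numeric(out["user_review"], errors="coerce")
    # Drop rows where either score is missing or could not be parsed.
    return out.dropna(subset=["meta_score", "user_review"]).reset_index(drop=True)


# Made-up sample rows for illustration only.
sample = pd.DataFrame(
    {
        "name": ["Game A", "Game B", "Game C"],
        "meta_score": [90, None, 75],
        "user_review": ["8.5", "7.0", "tbd"],  # "tbd" is an assumed invalid entry
    }
)
cleaned = clean_reviews(sample)
print(cleaned)
```

Coercing first and dropping afterwards handles missing values and invalid entries in a single pass, since both end up as `NaN`.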
- Encoded `platform` using Label Encoding
- Extracted `year` and `month` from `release_date`
- Dropped unnecessary text columns for modeling
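
A possible sketch of these feature engineering steps is shown below. The helper name `engineer_features`, the new column names (`platform_encoded`, `summary_length`), and the date format `"%B %d, %Y"` are assumptions made for illustration; the notebook may differ. The sample rows are made up.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def engineer_features(df: pd.DataFrame):
    """Return (feature frame, fitted platform encoder)."""
    out = df.copy()
    # Label Encoding maps each platform name to an integer code.
    encoder = LabelEncoder()
    out["platform_encoded"] = encoder.fit_transform(out["platform"])
    # Assumed date format, e.g. "November 23, 1998"; unparseable dates become NaT.
    dates = pd.to_datetime(out["release_date"], format="%B %d, %Y", errors="coerce")
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    # Keep only the length of the summary, not the text itself.
    out["summary_length"] = out["summary"].fillna("").str.len()
    # Drop the text columns that are not fed to the models.
    out = out.drop(columns=["name", "platform", "release_date", "summary"])
    return out, encoder


# Made-up sample rows for illustration only.
sample = pd.DataFrame(
    {
        "name": ["Game A", "Game B"],
        "platform": ["Nintendo 64", "PC"],
        "release_date": ["November 23, 1998", "March 17, 2020"],
        "summary": ["Short text", None],
        "meta_score": [90, 75],
        "user_review": [8.5, 7.0],
    }
)
features, encoder = engineer_features(sample)
print(features)
```

Returning the fitted encoder makes it possible to map the integer codes back to platform names later with `encoder.inverse_transform`.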
- Average Meta Score: 70.7 | Median: 72.0
- Average User Review: 6.99 | Median: 7.30
- Critic-User Correlation: 0.526
- Most games score between 60–80 on Metacritic
- Both critic and user scores show a declining trend over time
- PC has the most games (4,829), followed by PlayStation 4 (2,039)
| Rank | Platform | Avg Meta Score |
|---|---|---|
| 1 | Nintendo 64 | 78.4 |
| 2 | Xbox Series X | 76.0 |
| 3 | PlayStation 5 | 75.4 |
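
Summary statistics, the critic-user correlation, and per-platform rankings like those above can be computed with a few pandas calls. The sketch below uses hypothetical helper names (`summarize_scores`, `platform_overview`) and made-up sample rows, so its numbers are not the project's results.

```python
import pandas as pd


def summarize_scores(df: pd.DataFrame) -> dict:
    """Mean, median, and Pearson correlation of critic and user scores."""
    return {
        "meta_mean": df["meta_score"].mean(),
        "meta_median": df["meta_score"].median(),
        "user_mean": df["user_review"].mean(),
        "user_median": df["user_review"].median(),
        # Series.corr defaults to the Pearson coefficient.
        "critic_user_corr": df["meta_score"].corr(df["user_review"]),
    }


def platform_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Game count and average meta score per platform, best average first."""
    return (
        df.groupby("platform")["meta_score"]
        .agg(games="count", avg_meta_score="mean")
        .sort_values("avg_meta_score", ascending=False)
    )


# Made-up sample rows for illustration only.
sample = pd.DataFrame(
    {
        "platform": ["PC", "PC", "Nintendo 64", "PC"],
        "meta_score": [60, 70, 90, 80],
        "user_review": [6.0, 7.0, 9.0, 8.0],
    }
)
stats = summarize_scores(sample)
overview = platform_overview(sample)
print(stats)
print(overview)
```

Note that a platform with very few games can rank highly on average score, so the `games` column is worth reading alongside the average.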
| Model | R² Score | MAE | MSE |
|---|---|---|---|
| Linear Regression | — | — | — |
| Ridge Regression | — | — | — |
| Random Forest | — | — | — |
| Gradient Boosting ✅ | 0.378 | 0.789 | — |
Best Model: Gradient Boosting with R² = 0.3780 and MAE = 0.7885
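
For reference, a comparison of the four models on R², MAE, and MSE could look roughly like the sketch below. The 80/20 split, default hyperparameters, `random_state=42`, and the helper name `compare_models` are all assumptions, and the demo data is synthetic, so the scores it prints are unrelated to the results reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, random_state: int = 42) -> pd.DataFrame:
    """Fit each model on a train split and score it on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    models = {
        "Linear Regression": LinearRegression(),
        "Ridge Regression": Ridge(),
        "Random Forest": RandomForestRegressor(random_state=random_state),
        "Gradient Boosting": GradientBoostingRegressor(random_state=random_state),
    }
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rows.append(
            {
                "Model": name,
                "R2": r2_score(y_test, pred),
                "MAE": mean_absolute_error(y_test, pred),
                "MSE": mean_squared_error(y_test, pred),
            }
        )
    return pd.DataFrame(rows).set_index("Model")


# Synthetic demo data, not the Metacritic dataset.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    {
        "user_review": rng.uniform(0, 10, 200),
        "year": rng.integers(1995, 2022, 200),
    }
)
y_demo = 40 + 4 * X_demo["user_review"] + rng.normal(0, 5, 200)
results = compare_models(X_demo, y_demo)
print(results.round(3))
```

Scoring every model on the same held-out split keeps the comparison fair and would fill in all four rows of the table.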
- Platform (encoded)
- User Review Score
- Release Year
- Release Month
- Summary Length
- Game `summary` text was not used (NLP could improve results)
- Non-linear relationships may not be fully captured
- External factors like marketing and franchise popularity were not included
- Apply NLP / Sentiment Analysis on game summaries
- Use advanced models like XGBoost or Neural Networks
- Include additional features (genre, developer, publisher)
├── all_games.csv # Raw dataset
├── Project.ipynb # Main Jupyter Notebook
├── Report_VIDEO_GAME_RATING_ANALYSIS.pdf # Project report
└── README.md # This file
- Metacritic Video Game Dataset
- Scikit-learn Documentation
- Pandas Documentation
- Matplotlib Documentation
- Python Official Documentation