================================

This project implements a recommendation engine for Steam games using collaborative filtering (ALS), content-based filtering (TF-IDF), and graph-based analysis with NetworkX.

Features:
- Collaborative Filtering: Recommends games based on user gameplay history.
- Content-Based Filtering: Suggests similar games using game title metadata.
- Graph-Based Analysis: Uses NetworkX to visualize and analyze relationships between games and users.

Setup:
- Prerequisites:
  - Databricks environment with Spark enabled.
  - Python libraries: PySpark, NumPy, Matplotlib, SciPy, NetworkX.
- Datasets:
  - Upload the following CSV files to /FileStore/tables/ in your Databricks workspace:
    - games.csv
    - users.csv
    - recommendations_1.csv

Steps to Run:
- Create a Databricks cluster and initialize a Spark session.
- Load the datasets into Spark DataFrames from the provided paths.
- Perform data preprocessing:
  - Handle missing values.
  - Tokenize game titles and compute TF-IDF.
  - One-hot encode categorical features and scale numerical features.
- Train a collaborative filtering model using ALS and generate recommendations.
- Calculate cosine similarity for content-based filtering to find similar games.
- Build and visualize graphs:
  - Create a game similarity graph based on cosine similarity.
  - Create a user-game interaction graph based on collaborative filtering results.
- Display results:
  - Top 5 game recommendations for each user.
  - Most similar games based on title metadata.
  - Visualizations and graph metrics (e.g., centrality, communities).

Requirements:
See requirements.txt for a list of required libraries.
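
Example (content-based steps):
The content-based steps under Steps to Run (tokenize titles, compute TF-IDF, calculate cosine similarity, build the game similarity graph) can be illustrated without Spark. The following is a minimal pure-Python sketch, not the project's PySpark code. The sample titles, the whitespace tokenizer, the smoothed IDF formula, the 0.1 similarity threshold, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical sample titles; in the project these would come from games.csv.
titles = {
    1: "Portal",
    2: "Portal 2",
    3: "Half-Life 2",
    4: "Stardew Valley",
}


def tokenize(title):
    """Lowercase a title and split it on whitespace (assumed tokenizer)."""
    return title.lower().split()


def tfidf_vectors(docs):
    """Return {doc_id: {term: weight}} as sparse TF-IDF vectors."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: the number of titles that contain each term.
    doc_freq = Counter(t for terms in tokens.values() for t in set(terms))
    # Smoothed IDF (assumed form): log((N + 1) / (df + 1)).
    idf = {t: math.log((n_docs + 1) / (df + 1)) for t, df in doc_freq.items()}
    return {
        doc_id: {t: count * idf[t] for t, count in Counter(terms).items()}
        for doc_id, terms in tokens.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_edges(vectors, threshold=0.1):
    """Return (game_a, game_b, score) for each pair at or above the threshold."""
    ids = sorted(vectors)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                edges.append((a, b, score))
    return edges


vectors = tfidf_vectors(titles)
edges = similarity_edges(vectors)
for a, b, score in edges:
    print(f"{titles[a]} <-> {titles[b]}: {score:.3f}")
```

The resulting edge list can be passed to NetworkX with `Graph.add_weighted_edges_from(edges)` to build the game similarity graph. On the full dataset the pairwise loop grows quadratically with the number of games, so this sketch is only suitable for small samples.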