This project implements a Spotify song recommendation system that uses user listening history and preferences to suggest new music. It leverages Spotify API data, FAISS for similarity search, and an epsilon-greedy bandit algorithm for personalizing recommendations based on user feedback.
- final_notebook.ipynb: The main Jupyter notebook to run the recommendation system.
- LICENSE: The MIT license for this project.
- requirements.txt: A list of Python dependencies required to run the project.
- source_weights.json: Stores the weights for different recommendation sources (e.g., top tracks, recent tracks). These weights are updated based on user feedback.
- Spotify_Recommender_Main.py: A Python module containing the core logic for Spotify authentication, data fetching, data processing, recommendation generation, and feedback collection.
- Clone the repository (if applicable) or download the files.
- Install dependencies:
Open your terminal and navigate to the project directory. Run the following command to install the required Python packages:
pip install -r requirements.txt
- Spotify API Credentials:
- You need to create a Spotify Developer application to get a Client ID and Client Secret.
- Go to the Spotify Developer Dashboard and create an app.
- Once created, note down your Client ID and Client Secret.
- In your app settings on the Spotify Developer Dashboard, set the Redirect URI to http://127.0.0.1:8888/callback.
- Create a file named spotify_config.json in the root of the project directory with the following content, replacing YOUR_CLIENT_ID and YOUR_CLIENT_SECRET with your actual credentials:
{
  "client_id": "YOUR_CLIENT_ID",
  "client_secret": "YOUR_CLIENT_SECRET",
  "redirect_uri": "http://127.0.0.1:8888/callback",
  "scope": "user-read-recently-played user-top-read user-library-read"
}
- Dataset:
- The project uses a dataset of Spotify songs with attributes and lyrics. The notebook expects a file named songs_with_attributes_and_lyrics.csv.
- You can download a suitable dataset from Kaggle, for example: 960K Spotify Songs With Lyrics data.
- Place the songs_with_attributes_and_lyrics.csv file in the root of the project directory.
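Before opening the notebook, the setup steps above can be sanity-checked with a short script. This is only a sketch: check_setup is a hypothetical helper that is not part of the project, and it assumes the config keys and file names are exactly those listed in the setup steps.

```python
import json
from pathlib import Path

# Keys taken from the spotify_config.json example in the setup steps.
REQUIRED_KEYS = ("client_id", "client_secret", "redirect_uri", "scope")
DATASET_NAME = "songs_with_attributes_and_lyrics.csv"


def check_setup(project_dir="."):
    """Return a list of readable problems; an empty list means the setup looks fine."""
    root = Path(project_dir)
    problems = []

    config_path = root / "spotify_config.json"
    if not config_path.is_file():
        problems.append("spotify_config.json not found")
    else:
        config = json.loads(config_path.read_text(encoding="utf-8"))
        for key in REQUIRED_KEYS:
            value = config.get(key)
            if not value:
                problems.append(f"missing config key: {key}")
            elif str(value).startswith("YOUR_"):
                # The template placeholders were never replaced.
                problems.append(f"placeholder value left in: {key}")

    if not (root / DATASET_NAME).is_file():
        problems.append(f"{DATASET_NAME} not found")

    return problems
```

Calling check_setup() from the project root and printing each returned entry gives a quick list of anything still missing.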
- Open the Jupyter Notebook:
Launch Jupyter Lab or Jupyter Notebook and open the final_notebook.ipynb file:
jupyter lab
or
jupyter notebook
- Run the Cells:
Execute the cells in the notebook sequentially.
- The initial cells will import necessary libraries and process the song dataset.
- You will then be prompted to authenticate with Spotify. A web browser window will open asking you to log in and authorize the application.
- After successful authentication, the notebook will fetch your Spotify data (top tracks, recent tracks, etc.).
- Recommendations will be generated based on your data and the initial source_weights.json.
- You will be prompted in the notebook's output to rate the recommended songs.
- Based on your feedback, the source_weights.json file will be updated.
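The weights file that these steps read and update can be handled with two small helpers. This is a sketch under assumptions, not the notebook's own code: load_weights and save_weights are hypothetical names, and source_weights.json is assumed to be a flat JSON object mapping a source name to a numeric weight.

```python
import json
from pathlib import Path


def load_weights(path, defaults):
    """Read source weights, falling back to `defaults` if the file is absent."""
    file = Path(path)
    if not file.is_file():
        return dict(defaults)
    loaded = json.loads(file.read_text(encoding="utf-8"))
    # Start from the defaults and overlay only numeric entries,
    # so a malformed value cannot break later scoring.
    weights = dict(defaults)
    weights.update(
        {k: float(v) for k, v in loaded.items() if isinstance(v, (int, float))}
    )
    return weights


def save_weights(path, weights):
    """Write to a temporary file first, then swap it into place."""
    file = Path(path)
    tmp = file.with_name(file.name + ".tmp")
    tmp.write_text(json.dumps(weights, indent=2), encoding="utf-8")
    tmp.replace(file)
```

Writing to a temporary file and then renaming it means an interrupted notebook run is less likely to leave a half-written weights file behind.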
- SpotifyAuthServer: Handles the OAuth2 authentication flow with Spotify.
- SpotifyUserData: Interacts with the Spotify API to fetch user-specific data like top tracks, recently played songs, saved tracks, and artist information.
- SpotifyUserDataFetcher: Uses SpotifyUserData to fetch and process various types of user data, and ranks tracks based on source weights and time decay.
- HelperFunctions: Contains utility functions for tasks like token validation, token refreshing, data conversion, and calculating exponential decay for time-sensitive data.
- SpotifyDataEnricher: Enriches song data in a DataFrame by fetching additional details like genres, popularity, and explicit content from the Spotify API.
- EpsilonGreedyBandit: Implements an epsilon-greedy multi-armed bandit algorithm to dynamically update the weights of different recommendation sources based on user feedback.
- RecommendationFeedbackCollector: Manages the process of presenting recommendations to the user and collecting their ratings.
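To make the epsilon-greedy idea behind EpsilonGreedyBandit concrete, here is a minimal, generic sketch. It is not the project's implementation: SourceBandit, its method names, the default epsilon of 0.1, and the mapping of a 0-10 rating to a reward between 0 and 1 are all assumptions chosen for illustration.

```python
import random


class SourceBandit:
    """Illustrative epsilon-greedy bandit with one arm per recommendation source."""

    def __init__(self, sources, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.values = {s: 0.0 for s in sources}  # running mean reward per source
        self.counts = {s: 0 for s in sources}    # number of ratings seen per source
        self._rng = random.Random(seed)

    def select(self):
        """Explore a random source with probability epsilon, else exploit the best one."""
        if self._rng.random() < self.epsilon:
            return self._rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, source, rating, max_rating=10):
        """Fold one rating (0..max_rating) into the source's running mean."""
        reward = rating / max_rating
        self.counts[source] += 1
        # Incremental mean: new = old + (reward - old) / n
        self.values[source] += (reward - self.values[source]) / self.counts[source]
```

In a sketch like this, the per-source running means could serve directly as source weights, or be rescaled first. Which of these the project does is not stated here.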
- Data Preparation: A large dataset of songs with their audio features is loaded and preprocessed. A FAISS index is built on these features for efficient similarity search.
- Spotify Authentication: The user authenticates with their Spotify account through an OAuth2 flow managed by a local Flask server.
- User Data Fetching: The system fetches the user's top tracks, recently played songs, saved tracks, and top artists from the Spotify API.
- User Profile Generation:
- Tracks from different sources (top, recent, saved, top artists' songs) are scored based on predefined weights in source_weights.json and time decay for recent/saved tracks.
- A weighted average feature vector (user profile) is created for each source based on the features of the highly-ranked songs from that source.
- Recommendation Generation:
- For each source-specific user profile vector, the FAISS index is queried to find the most similar songs from the main dataset. These are the initial recommendations for that source.
- Feedback Collection:
- The system presents recommendations to the user (e.g., 10 songs per source).
- The user rates these songs (e.g., on a scale of 0-10).
- Weight Update (Reinforcement Learning):
- The EpsilonGreedyBandit algorithm uses the user's feedback to update the weights in source_weights.json. Sources that provide songs the user rates highly will have their weights increased, while sources providing less liked songs will have their weights decreased.
- This allows the system to learn which sources are more relevant for the user over time.
- Saving Updated Weights: The new source weights are saved back to source_weights.json for future sessions.
This project is licensed under the MIT License - see the LICENSE file for details.