This is a personal Spotify listening history tracking pipeline. It connects to the Spotify Web API to fetch recently played tracks (the API returns at most 50 recent tracks per call), stores them locally in a structured CSV format, and enriches them with additional metadata such as track duration and artist genres. The system is built to run incrementally, adding only new records and maintaining JSON-based caches for artist and song metadata.
You can also manually update missing or ambiguous metadata, which is especially useful for instrumental or lesser-known tracks where Spotify may not provide genre information. The search feature seems to yield no results for a surprisingly large number of artists.
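
The incremental update described above can be sketched roughly as follows. This is a minimal illustration, not the code in `extract_script.py`: the function name `append_new_plays` and the column names in `FIELDS` are assumptions, and the real CSV layout may differ. It treats the `played_at` timestamp as the unique key for a play and appends only rows that are not already in the file.

```python
import csv
from pathlib import Path

# Hypothetical column names; the real CSV may use different ones.
FIELDS = ["played_at", "track_name", "artist_name"]


def append_new_plays(csv_path, fetched_rows):
    """Append only rows whose played_at value is not already in the CSV.

    Returns the number of rows written.
    """
    path = Path(csv_path)
    seen = set()
    if path.exists():
        with path.open(newline="", encoding="utf-8") as f:
            seen = {row["played_at"] for row in csv.DictReader(f)}

    # Keep rows that are new, also skipping duplicates within this batch.
    new_rows = []
    for row in fetched_rows:
        if row["played_at"] not in seen:
            seen.add(row["played_at"])
            new_rows.append(row)
    if not new_rows:
        return 0

    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        # Write oldest first so each appended batch is in chronological order.
        writer.writerows(sorted(new_rows, key=lambda r: r["played_at"]))
    return len(new_rows)
```

Because each call only sees the most recent 50 plays, running a function like this often enough means every play gets captured exactly once.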
- You will need to create a new app in the Spotify developer dashboard and generate your own client ID and secret for your Spotify listening account (this is free).
- Add them to a `.env` file in your project, similar to `.env_example`.
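
As an illustration of how the credentials might be read at runtime, here is a small standard-library sketch. The variable names `SPOTIFY_CLIENT_ID` and `SPOTIFY_CLIENT_SECRET` and the helper names `load_env_file` and `get_credentials` are assumptions; use whatever keys `.env_example` actually defines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env-style file into os.environ."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded = {}
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    # Values already set in the real environment take precedence.
    for key, value in loaded.items():
        os.environ.setdefault(key, value)
    return loaded


def get_credentials():
    # Assumed variable names; match them to the keys in .env_example.
    client_id = os.environ.get("SPOTIFY_CLIENT_ID")
    client_secret = os.environ.get("SPOTIFY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Missing Spotify credentials; check your .env file")
    return client_id, client_secret
```

The `python-dotenv` package's `load_dotenv()` covers the same ground more robustly, if you prefer a dependency over a hand-rolled parser. Either way, keep the `.env` file out of version control.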
- `extract_script.py` - The main script. It calls the Spotify Web API, generates the main CSV file, and stores it in `./data/spotify_data_<current_year>.csv`. The results of all runs (automated or otherwise) are recorded in `auto_extract_log.txt` with timestamps.
- `extract_with_metadata.py` - This script searches for and organizes the genre and track duration, then adds them as two columns to the DataFrame generated by `extract_script`. The result is stored as `./data/spotify_data_with_metadata_<current_year>.csv`. It also generates the metadata files `./metadata/artist_metadata.json`, `./metadata/song_metadata.json` and `./metadata/missing_queries.json`.
- `spotify_logger.bat` - The batch file used with the Windows Task Scheduler to run `extract_script.py` every hour to check for recent history. This ensures that tracks you listen to are recorded well within the 50-track limit.
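
The JSON caches in `./metadata/` can be pictured with a sketch like the one below. It is not taken from `extract_with_metadata.py`: the function names and the cache layout (artist name mapped to a list of genres) are assumptions, and `lookup` is a hypothetical callable standing in for the actual Spotify search. The idea is to query the API only on a cache miss and to record artists with no result so they can be filled in by hand later.

```python
import json
from pathlib import Path


def load_cache(path):
    """Load a JSON cache file, or return an empty dict if it does not exist."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text(encoding="utf-8"))
    return {}


def save_cache(path, cache):
    """Write the cache back to disk, creating the folder if needed."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(cache, indent=2, ensure_ascii=False), encoding="utf-8")


def genres_for_artist(artist, cache, missing, lookup):
    """Return cached genres, calling lookup(artist) only on a cache miss."""
    if artist in cache:
        return cache[artist]
    genres = lookup(artist)  # hypothetical callable wrapping the Spotify search
    if genres:
        cache[artist] = genres
    elif artist not in missing:
        missing.append(artist)  # no result: leave for a manual update later
    return genres or []
```

With this shape, a manual fix amounts to adding an entry to the artist cache file, which the next run then picks up without another API call.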