saulsjam/lastfm-listening-analysis


Last.fm Listening Analysis

Project Goal

Analyze long-run listening behavior using personal Last.fm history (~146k scrobbles). The project emphasizes a conventional, reproducible ETL pipeline and disciplined time-based aggregation rather than novelty or optimization.


Data Source

Listening history is sourced from the Last.fm API via the user.getRecentTracks endpoint. Each scrobble represents a completed listening event. Data are retrieved via paginated API requests and stored locally as raw JSON responses.

Raw API data are not committed to this repository; all downstream analysis is performed on derived tabular datasets.
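
As an illustration of the paginated retrieval described above, here is a minimal sketch. It is not the repository's actual fetch script. The page size of 200, the `page_NNNN.json` naming scheme, and the helper names `build_url`, `save_page`, and `fetch_all` are assumptions made for this example, and it uses the standard library's `urllib` rather than `requests` so it runs without extra installs.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

API_ROOT = "https://ws.audioscrobbler.com/2.0/"  # public Last.fm API root


def build_url(api_key: str, username: str, page: int, limit: int = 200) -> str:
    """Build the request URL for one page of user.getRecentTracks."""
    params = {
        "method": "user.getrecenttracks",
        "user": username,
        "api_key": api_key,
        "format": "json",
        "limit": limit,  # assumed page size
        "page": page,
    }
    return f"{API_ROOT}?{urllib.parse.urlencode(params)}"


def save_page(body: bytes, page: int, raw_dir: Path) -> Path:
    """Write the response bytes to disk untouched, one file per page."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"page_{page:04d}.json"  # hypothetical naming scheme
    path.write_bytes(body)
    return path


def fetch_all(api_key: str, username: str, raw_dir: Path) -> int:
    """Fetch every page in order and return the number of pages written."""
    page, total_pages = 1, 1
    while page <= total_pages:
        url = build_url(api_key, username, page)
        with urllib.request.urlopen(url, timeout=30) as resp:
            body = resp.read()
        # The page count is read from the response's "@attr" block.
        total_pages = int(json.loads(body)["recenttracks"]["@attr"]["totalPages"])
        save_page(body, page, raw_dir)
        page += 1
    return page - 1
```

Writing the response bytes as received, rather than re-serializing the parsed JSON, keeps the raw files as close to the source as possible.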


Known API Characteristics

  • The Last.fm API can be flaky and intermittently returns server errors (500/502/503/504).
  • The endpoint represents “recent” listening activity, so total scrobble counts will change between runs as listening continues.
  • To preserve source fidelity, no cleaning or transformation is performed during data ingestion; raw responses are stored exactly as received.

Retry logic and staged processing are used to mitigate these issues.
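
One way such retry logic could look is sketched below, assuming exponential backoff and a fixed attempt limit. The function name `get_with_retry`, the default of five attempts, and the one-second base delay are illustrative choices, not values taken from the repository.

```python
import time
import urllib.error
import urllib.request

RETRYABLE = {500, 502, 503, 504}  # the server errors listed above


def get_with_retry(url, max_attempts=5, base_delay=1.0,
                   opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body, retrying transient server errors with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Re-raise on non-retryable status codes or once attempts run out.
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise
            # Wait 1s, 2s, 4s, ... (with the default base delay) before retrying.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `opener` and `sleep` parameters exist only so the function can be exercised without network access or real waiting.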


Analysis Pipeline

This repository is organized as a linear, reproducible pipeline:

  1. 01_fetch_lastfm.py
    Fetch paginated listening history from the Last.fm API and write one raw JSON file per page to disk. No transformation or filtering is performed at this stage.

  2. 02_flatten_lastfm.py
    Read all raw JSON page files, extract one row per scrobble, and write a single flattened interim dataset as CSV. Lineage fields (source_page, source_file) are included for traceability.

  3. 03_validate_interim.py
    Validate structural assumptions about the interim dataset (schema, required fields, timestamp parseability, duplicate detection). This script enforces pipeline contracts before any transformation is applied.

  4. 04_make_processed.py
    Read the validated interim dataset, derive explicit UTC timestamp and time-part fields (date, year, month, day-of-week, hour), and write a processed CSV for downstream analysis. No filtering, deduplication, or analytical aggregation is performed.

Running these scripts in order reproduces the interim and processed datasets used for analysis.
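
The flattening step (step 2) can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code. It assumes each track entry carries its play time as an epoch-seconds string under `date.uts` and that names and MBIDs sit under the `#text` and `mbid` keys. Entries without a `date` block get an empty `played_at_utc` here; how the actual script treats them is not stated. The helper names are hypothetical.

```python
import csv
from pathlib import Path

FIELDS = [
    "played_at_utc", "track_name", "artist_name", "album_name",
    "track_mbid", "artist_mbid", "album_mbid", "source_page", "source_file",
]


def flatten_page(payload: dict, source_page: int, source_file: str) -> list[dict]:
    """Turn one raw API page into flat rows, one per track entry."""
    tracks = payload.get("recenttracks", {}).get("track", [])
    if isinstance(tracks, dict):  # defensive: tolerate a single object instead of a list
        tracks = [tracks]
    rows = []
    for t in tracks:
        artist = t.get("artist", {})
        album = t.get("album", {})
        rows.append({
            # Assumption: keep the epoch-seconds string; empty if there is no date.
            "played_at_utc": t.get("date", {}).get("uts", ""),
            "track_name": t.get("name", ""),
            "artist_name": artist.get("#text", ""),
            "album_name": album.get("#text", ""),
            "track_mbid": t.get("mbid", ""),
            "artist_mbid": artist.get("mbid", ""),
            "album_mbid": album.get("mbid", ""),
            # Lineage fields for traceability back to the raw page.
            "source_page": source_page,
            "source_file": source_file,
        })
    return rows


def write_interim(rows: list[dict], out_path: Path) -> None:
    """Write all flattened rows to a single CSV with a fixed column order."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```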


Interim Output

  • Flattened scrobble dataset
    data/interim/lastfm_scrobbles_interim.csv

Schema:

  • played_at_utc
  • track_name
  • artist_name
  • album_name
  • track_mbid
  • artist_mbid
  • album_mbid
  • source_page
  • source_file
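
A validation pass over this schema might look like the sketch below. It assumes that `played_at_utc` holds an integer epoch value, that a non-empty timestamp, track name, and artist name are required, and that a duplicate is a repeat of the (timestamp, artist, track) triple. None of these rules are confirmed by the source. The sketch only reports problems and does not remove rows.

```python
from collections import Counter

REQUIRED_COLUMNS = [
    "played_at_utc", "track_name", "artist_name", "album_name",
    "track_mbid", "artist_mbid", "album_mbid", "source_page", "source_file",
]
NON_EMPTY = ["played_at_utc", "track_name", "artist_name"]  # assumed required values


def _parses_as_epoch(value: str) -> bool:
    """Return True if the value can be read as integer epoch seconds."""
    try:
        int(value)
    except ValueError:
        return False
    return True


def validate_rows(header: list[str], rows: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    for i, row in enumerate(rows, start=1):
        for col in NON_EMPTY:
            if not row[col]:
                problems.append(f"row {i}: empty {col}")
        if row["played_at_utc"] and not _parses_as_epoch(row["played_at_utc"]):
            problems.append(f"row {i}: unparseable played_at_utc")
    # Duplicate detection only: count repeats, never drop them.
    keys = Counter((r["played_at_utc"], r["artist_name"], r["track_name"]) for r in rows)
    dupes = sum(n - 1 for n in keys.values() if n > 1)
    if dupes:
        problems.append(f"{dupes} duplicate row(s)")
    return problems
```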

Processed Output

  • Time-enriched scrobble dataset
    data/processed/lastfm_scrobbles_processed.csv

Derived fields include:

  • UTC timestamp (played_at_ts_utc)
  • Calendar date (date_utc)
  • Year, month, day-of-week, and hour (UTC)

The processed dataset preserves all interim fields and adds only derived time features. No records are filtered or modified.
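
The derivation of the time fields can be sketched as follows, assuming `played_at_utc` is stored as epoch seconds. The field names `played_at_ts_utc` and `date_utc` come from the list above; `year_utc`, `month_utc`, `dow_utc`, and `hour_utc` are hypothetical names chosen for this example, as is the use of ISO weekday numbers (Monday = 1).

```python
from datetime import datetime, timezone


def add_time_fields(row: dict) -> dict:
    """Return a copy of the row with derived UTC time features appended."""
    ts = datetime.fromtimestamp(int(row["played_at_utc"]), tz=timezone.utc)
    return {
        **row,  # all interim fields are preserved unchanged
        "played_at_ts_utc": ts.isoformat(),
        "date_utc": ts.date().isoformat(),
        "year_utc": ts.year,
        "month_utc": ts.month,
        "dow_utc": ts.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "hour_utc": ts.hour,
    }
```

Passing an explicit `tz=timezone.utc` keeps the result independent of the machine's local time zone.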


Environment & Reproducibility

  • Python 3.11 (conda / Anaconda)
  • Key packages:
    • requests
    • python-dotenv
  • Secrets are managed via a local .env file (not committed)
  • .env.example documents required environment variables

To reproduce:

  1. Clone the repository
  2. Create a .env file with:
       LASTFM_API_KEY=...
       LASTFM_USERNAME=...
  3. Run:
       python src/01_fetch_lastfm.py
       python src/02_flatten_lastfm.py
       python src/03_validate_interim.py
       python src/04_make_processed.py
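
A script could check for the two required variables before making any request, as in the sketch below. The helper name `load_settings` is hypothetical. The sketch reads from `os.environ` only; with python-dotenv installed, calling `load_dotenv()` (from the `dotenv` module) beforehand would populate the environment from the local .env file.

```python
import os

REQUIRED_VARS = ("LASTFM_API_KEY", "LASTFM_USERNAME")


def load_settings(env=os.environ) -> dict:
    """Return the required settings, failing fast if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```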

Visualization Layer

A Tableau dashboard built on the processed dataset analyzes longitudinal listening behavior across three dimensions:

  • Volume — total scrobbles per year
  • Breadth — distinct artists per year
  • Intensity — average scrobbles per artist per year
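
The dashboard itself is built in Tableau, but the three measures can be restated in a few lines of Python to make their definitions concrete. This is a sketch over rows of the processed dataset; the year column name `year_utc` is an assumption, and `yearly_metrics` is a hypothetical helper.

```python
from collections import defaultdict


def yearly_metrics(rows: list[dict]) -> dict:
    """Compute volume, breadth, and intensity per year from processed rows."""
    scrobbles = defaultdict(int)
    artists = defaultdict(set)
    for row in rows:
        year = row["year_utc"]  # hypothetical column name
        scrobbles[year] += 1
        artists[year].add(row["artist_name"])
    return {
        year: {
            "volume": scrobbles[year],        # total scrobbles
            "breadth": len(artists[year]),    # distinct artists
            "intensity": scrobbles[year] / len(artists[year]),  # scrobbles per artist
        }
        for year in sorted(scrobbles)
    }
```

Because intensity is volume divided by breadth, it falls whenever breadth grows faster than volume.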

Key structural findings:

  • Listening volume increased materially after 2018.
  • Distinct artist breadth expanded sharply beginning around 2013.
  • Average scrobbles per artist declined as breadth increased.

[Figure: Tableau dashboard showing scrobbles per year and artist breadth vs intensity]

Tableau Public dashboard: Long-run listening behavior

About

Python ETL pipeline + Tableau dashboards using my Last.fm scrobble history.
