Last.fm Listening Analysis

Project Goal

Analyze long-run listening behavior using personal Last.fm history (~146k scrobbles). The project emphasizes a conventional, reproducible ETL pipeline and disciplined time-based aggregation rather than novelty or optimization.

Data Source

Listening history is sourced from the Last.fm API via the user.getRecentTracks endpoint. Each scrobble represents a completed listening event. Data are retrieved via paginated API requests and stored locally as raw JSON responses.

Raw API data are not committed to this repository; all downstream analysis is performed on derived tabular datasets.
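The paginated retrieval described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual 01_fetch_lastfm.py: the function names, the page_00001.json file naming, and the injected get_json callable are all assumptions made for the example. The endpoint URL, the method name, and the recenttracks/@attr/totalPages response fields follow the public Last.fm API.

```python
import json
from pathlib import Path

API_URL = "https://ws.audioscrobbler.com/2.0/"


def build_params(username, api_key, page, limit=200):
    """Query parameters for one page of user.getRecentTracks."""
    return {
        "method": "user.getrecenttracks",
        "user": username,
        "api_key": api_key,
        "format": "json",
        "limit": limit,
        "page": page,
    }


def fetch_all_pages(get_json, username, api_key, out_dir):
    """Fetch every page and write each raw response to its own JSON file.

    get_json(url, params) is a hypothetical callable that performs the HTTP
    request and returns the decoded JSON payload. Nothing is cleaned or
    filtered here; the payload is written exactly as received.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    page, total_pages = 1, 1
    while page <= total_pages:
        payload = get_json(API_URL, build_params(username, api_key, page))
        # The API reports the page count as a string inside "@attr".
        total_pages = int(payload["recenttracks"]["@attr"]["totalPages"])
        path = out_dir / f"page_{page:05d}.json"  # hypothetical naming scheme
        path.write_text(json.dumps(payload), encoding="utf-8")
        written.append(path)
        page += 1
    return written
```

With the requests package, get_json could be as simple as `lambda url, params: requests.get(url, params=params, timeout=30).json()`. Passing it in as an argument keeps the paging logic testable without network access.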

Known API Characteristics

The Last.fm API is unreliable at times and intermittently returns server errors (HTTP 500, 502, 503, 504).
The endpoint represents “recent” listening activity, so total scrobble counts will change between runs as listening continues.
To preserve source fidelity, no cleaning or transformation is performed during data ingestion.

Retry logic and staged processing are used to mitigate these issues.
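One common way to implement the retry logic mentioned above is exponential backoff on the listed status codes. The sketch below is an assumption about how that could look, not a copy of the project's code: call_with_retry, its parameters, and the (status, body) return convention are hypothetical.

```python
import time

# Server errors the README lists as intermittent.
RETRYABLE_STATUS = {500, 502, 503, 504}


def call_with_retry(request_once, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request_once() until it returns a non-retryable status.

    request_once is a hypothetical zero-argument callable returning
    (status_code, body). The delay doubles after each failed attempt:
    base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status, body = request_once()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"gave up after {max_attempts} attempts; last status was {status}"
    )
```

Because each page is written to disk as soon as it is fetched, a run that does give up can be resumed from the last page written rather than restarted from page 1.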

Analysis Pipeline

This repository is organized as a linear, reproducible pipeline:

01_fetch_lastfm.py
Fetch paginated listening history from the Last.fm API and write one raw JSON file per page to disk. No transformation or filtering is performed at this stage.
02_flatten_lastfm.py
Read all raw JSON page files, extract one row per scrobble, and write a single flattened interim dataset as CSV. Lineage fields (source_page, source_file) are included for traceability.
03_validate_interim.py
Validate structural assumptions about the interim dataset (schema, required fields, timestamp parseability, duplicate detection). This script enforces pipeline contracts before any transformation is applied.
04_make_processed.py
Read the validated interim dataset, derive explicit UTC timestamp and time-part fields (date, year, month, day-of-week, hour), and write a processed CSV for downstream analysis. No filtering, deduplication, or analytical aggregation is performed.

Running these scripts in order reproduces the interim and processed datasets used for analysis.
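As an illustration of the flattening step (02), the sketch below turns one raw page into rows using the interim column names. It assumes the non-extended user.getRecentTracks response shape, where artist and album names sit under "#text" and the play time under date.uts. It also assumes played_at_utc is stored as an ISO 8601 string, which the README does not specify. The function names are hypothetical.

```python
import csv
from datetime import datetime, timezone

INTERIM_FIELDS = [
    "played_at_utc", "track_name", "artist_name", "album_name",
    "track_mbid", "artist_mbid", "album_mbid", "source_page", "source_file",
]


def flatten_page(payload, source_page, source_file):
    """Return one dict per track entry in a raw page payload."""
    tracks = payload["recenttracks"]["track"]
    if isinstance(tracks, dict):  # defensive: tolerate a single object
        tracks = [tracks]
    rows = []
    for track in tracks:
        # Entries without a date (e.g. a track still playing) keep an empty
        # timestamp here so the validation step can flag them; this handling
        # is an assumption, not documented behavior of the pipeline.
        uts = track.get("date", {}).get("uts")
        played_at = (
            datetime.fromtimestamp(int(uts), tz=timezone.utc).isoformat()
            if uts else ""
        )
        artist = track.get("artist", {})
        album = track.get("album", {})
        rows.append({
            "played_at_utc": played_at,
            "track_name": track.get("name", ""),
            "artist_name": artist.get("#text", ""),
            "album_name": album.get("#text", ""),
            "track_mbid": track.get("mbid", ""),
            "artist_mbid": artist.get("mbid", ""),
            "album_mbid": album.get("mbid", ""),
            "source_page": source_page,
            "source_file": source_file,
        })
    return rows


def write_interim_csv(rows, path):
    """Write flattened rows to a CSV with the interim column order."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=INTERIM_FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```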

Interim Output

Flattened scrobble dataset
data/interim/lastfm_scrobbles_interim.csv

Schema:

played_at_utc
track_name
artist_name
album_name
track_mbid
artist_mbid
album_mbid
source_page
source_file
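The structural checks performed by 03_validate_interim.py could look something like the sketch below when applied to this schema. It is illustrative only. Which fields count as required, the ISO 8601 timestamp assumption, and the choice of (played_at_utc, track_name, artist_name) as the duplicate key are guesses made for the example, and validate_rows is a hypothetical name.

```python
from collections import Counter
from datetime import datetime

REQUIRED_COLUMNS = [
    "played_at_utc", "track_name", "artist_name", "album_name",
    "track_mbid", "artist_mbid", "album_mbid", "source_page", "source_file",
]
# Assumption: these three fields must be non-empty on every row.
REQUIRED_VALUES = ("played_at_utc", "track_name", "artist_name")


def validate_rows(rows, columns):
    """Return a list of human-readable problems; an empty list means pass."""
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    for index, row in enumerate(rows):
        for name in REQUIRED_VALUES:
            if not row.get(name):
                problems.append(f"row {index}: empty {name}")
        if row.get("played_at_utc"):
            try:
                datetime.fromisoformat(row["played_at_utc"])
            except ValueError:
                problems.append(f"row {index}: unparseable played_at_utc")

    # Duplicates are reported, not removed.
    keys = Counter(
        (row.get("played_at_utc"), row.get("track_name"), row.get("artist_name"))
        for row in rows
    )
    duplicates = sum(count - 1 for count in keys.values() if count > 1)
    if duplicates:
        problems.append(f"{duplicates} duplicate rows")
    return problems
```

A caller would typically read the CSV with csv.DictReader, pass reader.fieldnames as columns, and stop the pipeline if the returned list is not empty.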

Processed Output

Time-enriched scrobble dataset
data/processed/lastfm_scrobbles_processed.csv

Derived fields include:

UTC timestamp (played_at_ts_utc)
Calendar date (date_utc)
Year, month, day-of-week, and hour (UTC)

The processed dataset preserves all interim fields and adds only derived time features. No records are filtered or modified.
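A minimal sketch of how the derived time fields might be computed for a single row is shown below. The names played_at_ts_utc and date_utc come from the list above. The remaining column names (year_utc, month_utc, day_of_week_utc, hour_utc), the ISO weekday encoding, and the assumption that played_at_utc is an ISO 8601 string are hypothetical choices for this example.

```python
from datetime import datetime, timezone


def add_time_fields(row):
    """Return a copy of an interim row with derived UTC time fields added."""
    ts = datetime.fromisoformat(row["played_at_utc"])
    if ts.tzinfo is None:
        # Assumption: a timestamp without an offset is already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    else:
        ts = ts.astimezone(timezone.utc)

    enriched = dict(row)  # every interim field is preserved unchanged
    enriched.update({
        "played_at_ts_utc": ts.isoformat(),
        "date_utc": ts.date().isoformat(),
        "year_utc": ts.year,
        "month_utc": ts.month,
        "day_of_week_utc": ts.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "hour_utc": ts.hour,
    })
    return enriched
```

Returning a new dict rather than mutating the input mirrors the guarantee stated above: interim fields pass through untouched and only new columns are added.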

Environment & Reproducibility

Python 3.11 (conda / Anaconda)
Key packages:
- requests
- python-dotenv
Secrets are managed via a local .env file (not committed)
.env.example documents required environment variables

To reproduce:

Clone the repository
Create a .env file with:
    LASTFM_API_KEY=...
    LASTFM_USERNAME=...
Run:
    python src/01_fetch_lastfm.py
    python src/02_flatten_lastfm.py
    python src/03_validate_interim.py
    python src/04_make_processed.py
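For reference, a script could read and check the two required variables along the lines of the sketch below. The read_settings helper is hypothetical and is shown only to make the expected configuration explicit; the variable names are the ones listed above.

```python
import os

REQUIRED_VARS = ("LASTFM_API_KEY", "LASTFM_USERNAME")


def read_settings(env=None):
    """Return the required settings, failing early if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

In a script, calling load_dotenv() from python-dotenv before read_settings() would copy the values from the local .env file into the process environment.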

Visualization Layer

A Tableau dashboard built on the processed dataset analyzes longitudinal listening behavior across three dimensions:

Volume — total scrobbles per year
Breadth — distinct artists per year
Intensity — average scrobbles per artist per year
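The dashboard itself is built in Tableau, but the same three measures can be cross-checked in a few lines of Python against the processed CSV. The sketch below is a hypothetical equivalent, not part of the repository. It takes the year from the first four characters of date_utc and assumes that column holds an ISO date.

```python
from collections import defaultdict


def yearly_metrics(rows):
    """Compute volume, breadth, and intensity per year.

    volume    = total scrobbles in the year
    breadth   = distinct artists in the year
    intensity = volume / breadth (average scrobbles per artist)
    """
    volume = defaultdict(int)
    artists = defaultdict(set)
    for row in rows:
        year = int(row["date_utc"][:4])
        volume[year] += 1
        artists[year].add(row["artist_name"])

    return {
        year: {
            "volume": volume[year],
            "breadth": len(artists[year]),
            "intensity": volume[year] / len(artists[year]),
        }
        for year in sorted(volume)
    }
```

Because intensity is volume divided by breadth, it falls whenever breadth grows faster than volume.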

Key structural findings:

Listening volume increased materially after 2018.
Distinct artist breadth expanded sharply beginning around 2013.
Average scrobbles per artist declined as breadth increased.

Tableau Public dashboard: Long-run listening behavior
