Scrape Spotify albums by keywords and collect clean, structured album metadata you can use in catalogs, dashboards, or research workflows. It captures essential Spotify album details like artists, cover art, release dates, and playability so you can build reliable music datasets fast.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project searches Spotify albums by keyword and extracts a consistent set of album-level details into a structured dataset. It solves the problem of collecting album metadata at scale without manually browsing search pages, and it’s built for developers, analysts, and teams that need repeatable Spotify albums keyword scraping for data pipelines, content ops, and reporting.
- Searches album results for one or more keywords and iterates through result pages automatically
- Captures album entities with artist details, artwork, and release information in a normalized format
- Uses browser automation with request interception to collect data efficiently and consistently
- Includes randomized browser signals (user-agent + stealth) to reduce interruptions during long runs
- Streams results incrementally so partial runs still produce usable output
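The request-interception idea in the list above can be sketched as a response handler that ignores unrelated traffic and keeps only the parsed payloads it cares about. This is a minimal sketch, not the project's actual code: `createResponseCollector`, `matchUrl`, and `extractItems` are hypothetical names, and the handler only assumes a response object that exposes `url()` and `json()`, as the response objects in Puppeteer and Playwright do.

```javascript
// Sketch only: matchUrl and extractItems are hypothetical hooks supplied by the caller.
function createResponseCollector({ matchUrl, extractItems }) {
  const collected = [];

  // Intended for page.on("response", handler); any object with url() and json() works.
  async function handler(response) {
    if (!matchUrl(response.url())) return; // ignore unrelated traffic

    let payload;
    try {
      payload = await response.json();
    } catch {
      return; // non-JSON or unreadable body: skip it instead of failing the run
    }

    const items = extractItems(payload);
    if (Array.isArray(items)) collected.push(...items);
  }

  return { handler, collected };
}
```

Keeping the URL test and the payload walk as injected functions means a change in the search response only touches those two hooks, not the handler itself.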
| Feature | Description |
|---|---|
| Keyword album search | Scrape album results for one or more keywords with a simple input list. |
| Structured album metadata | Collect album identifiers, names, artists, images, release dates, and more in a consistent schema. |
| Request interception extraction | Reads data directly from relevant responses for more stable parsing than DOM-only approaches. |
| Pagination via offsets | Continues fetching additional pages until the end of results or maxItems is reached. |
| Incremental dataset pushing | Pushes batches as they are found so you don’t lose progress on partial runs. |
| Stealth + randomized user-agent | Mimics typical browser traits to reduce blocking and improve session stability. |
| Tunable timeouts | Longer navigation and handler timeouts for slow networks and heavy pages. |
| Multi-keyword runs | Processes multiple keywords in a single run and tags each result with its source keyword. |
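Offset pagination, the `maxItems` cap, incremental pushing, and keyword tagging from the table above fit together roughly as follows. This is a hedged sketch under stated assumptions: `collectAlbums`, `fetchPage`, `pushBatch`, and `pageSize` are hypothetical names, and the real runner may structure the loop differently.

```javascript
// Sketch only: fetchPage and pushBatch are hypothetical callbacks supplied by the caller.
async function collectAlbums({ keyword, maxItems = Infinity, pageSize = 50, fetchPage, pushBatch }) {
  let offset = 0;
  let total = 0;

  while (total < maxItems) {
    const page = await fetchPage(keyword, offset, pageSize);
    if (!page || page.length === 0) break; // end of results for this keyword

    // Trim the final batch so the run never exceeds maxItems, and tag each item.
    const batch = page
      .slice(0, maxItems - total)
      .map((album) => ({ ...album, keyword }));

    await pushBatch(batch); // push right away so a partial run keeps its progress
    total += batch.length;
    offset += page.length; // advance by what the source actually returned
  }

  return total;
}
```

Because each batch is pushed before the next page is requested, an interruption loses at most the page currently in flight.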
| Field Name | Field Description |
|---|---|
| uri | Spotify album URI identifier for the album entity. |
| name | Album title as shown in search results. |
| albumUrl | Direct URL to the album page built from the album URI. |
| artists | List of contributing artists (names, URIs, and related artist identifiers when available). |
| images | Cover art image set (commonly multiple sizes) for thumbnails and previews. |
| releaseDate | Album release date (when available) for timeline and freshness analysis. |
| releaseDatePrecision | Precision of the release date (day/month/year) when provided. |
| playability | Whether the album is playable in the current context/region/session. |
| label | Album label/publisher when included in the data payload. |
| totalTracks | Total number of tracks for the album (when available). |
| keyword | The originating keyword used to discover this album result. |
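To illustrate how a record with the fields above might be assembled, here is a minimal sketch. The shape of the `raw` input is an assumption (real payloads may nest these values differently), and `albumUrlFromUri` and `toAlbumRecord` are hypothetical helper names, not functions confirmed to exist in this project.

```javascript
// Sketch only: the layout of `raw` is assumed, not taken from a real payload.

// Build the public album page URL from a "spotify:album:<id>" URI.
function albumUrlFromUri(uri) {
  const match = /^spotify:album:([A-Za-z0-9]+)$/.exec(uri || "");
  return match ? `https://open.spotify.com/album/${match[1]}` : null;
}

// Map a raw album object onto the output schema, using null for missing optional fields.
function toAlbumRecord(raw, keyword) {
  return {
    uri: raw.uri ?? null,
    name: raw.name ?? null,
    albumUrl: albumUrlFromUri(raw.uri),
    artists: (raw.artists ?? []).map((a) => ({ name: a.name, uri: a.uri })),
    images: raw.images ?? [],
    releaseDate: raw.releaseDate ?? null,
    releaseDatePrecision: raw.releaseDatePrecision ?? null,
    playability: raw.playability ?? null,
    label: raw.label ?? null,
    totalTracks: raw.totalTracks ?? null,
    keyword,
  };
}
```

Filling absent optional fields with `null` rather than omitting them keeps every record on the same set of keys, which simplifies downstream tabular exports.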
[
{
"uri": "spotify:album:1ATL5GLyefJaxhQzSPVrLX",
"name": "Fine Line",
"albumUrl": "https://open.spotify.com/album/1ATL5GLyefJaxhQzSPVrLX",
"artists": [
{
"name": "Harry Styles",
"uri": "spotify:artist:6KImCVD70vtIoJWnq6nGn3"
}
],
"images": [
{
"url": "https://i.scdn.co/image/ab67616d0000b273....",
"width": 640,
"height": 640
},
{
"url": "https://i.scdn.co/image/ab67616d00001e02....",
"width": 300,
"height": 300
}
],
"releaseDate": "2019-12-13",
"releaseDatePrecision": "day",
"totalTracks": 12,
"playability": {
"playable": true
},
"keyword": "fine line"
}
]
Spotify Albums Scraper/
├── src/
│ ├── main.js
│ ├── scraper/
│ │ ├── SpotifyAlbumsScraper.js
│ │ ├── interceptors.js
│ │ ├── processors.js
│ │ └── cookies.js
│ ├── utils/
│ │ ├── delays.js
│ │ ├── logger.js
│ │ └── validators.js
│ └── config/
│ ├── defaults.json
│ └── selectors.json
├── input/
│ ├── schema.json
│ └── example.input.json
├── test/
│ ├── fixtures/
│ │ └── searchAlbums.response.sample.json
│ └── unit/
│ └── processors.test.js
├── .env.example
├── .gitignore
├── package.json
├── package-lock.json
└── README.md
- Music marketers use it to collect album metadata for keyword themes, so they can plan campaigns around releases and catalog clusters.
- Data analysts use it to build searchable album datasets, so they can track release patterns and compare artist output over time.
- Playlist curators use it to discover albums by genre/keyword terms, so they can refresh collections with consistent metadata.
- Developers use it to feed album data into apps and dashboards, so they can power browsing, recommendations, and catalog pages.
- Researchers use it to compile structured music metadata, so they can run studies without manual data entry.
1) What inputs does this project expect?
Provide a keywords array (one or more search terms). Optionally set maxItems to cap the number of albums collected per run. Results are tagged with the originating keyword for easy grouping.
2) Why do results sometimes stop before reaching maxItems?
If the search reaches the end of available album results for a keyword (or no new items appear after multiple polling cycles), the run will conclude for that keyword. This prevents infinite loops when the search page has no more data.
3) Can I run multiple keywords in one job?
Yes. The runner processes keywords sequentially and collects results per keyword. This is useful when building a multi-topic dataset in a single execution.
4) How do I reduce timeouts or improve stability on slower networks?
Increase the navigation and handler timeouts, and keep concurrency low if you add parallel processing later. In constrained environments, make sure headless Chromium has enough memory and avoid running many browser processes at once.
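Two of the answers above, input handling and the early stop when no new items appear, can be sketched in a few lines. Treat this as an illustration under assumptions: `normalizeInput` and `createStallDetector` are hypothetical names, the default of three idle cycles is an arbitrary choice, and the project's own validation and polling logic may differ.

```javascript
// Sketch only: names and defaults here are illustrative assumptions.

// Validate the run input: a non-empty keywords array and an optional positive maxItems.
function normalizeInput(input) {
  const raw = Array.isArray(input?.keywords) ? input.keywords : [];
  const keywords = raw
    .filter((k) => typeof k === "string")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  if (keywords.length === 0) {
    throw new Error("Provide a `keywords` array with at least one search term.");
  }

  const { maxItems } = input;
  if (maxItems !== undefined && (!Number.isInteger(maxItems) || maxItems <= 0)) {
    throw new Error("`maxItems` must be a positive integer when set.");
  }
  return { keywords, maxItems: maxItems ?? Infinity };
}

// Track polling cycles and signal a stop after several cycles without new albums.
function createStallDetector(maxIdleCycles = 3) {
  const seen = new Set();
  let idle = 0;

  return {
    // Record one polling cycle and return only albums not seen before.
    observe(albums) {
      const fresh = [];
      for (const album of albums) {
        if (seen.has(album.uri)) continue;
        seen.add(album.uri);
        fresh.push(album);
      }
      idle = fresh.length === 0 ? idle + 1 : 0;
      return fresh;
    },
    shouldStop() {
      return idle >= maxIdleCycles;
    },
  };
}
```

Resetting the idle counter whenever a cycle yields something new means a single slow response does not end the keyword early; only a sustained run of empty cycles does.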
Primary Metric: ~120–220 album items/minute on a typical server-grade connection when responses load consistently via intercepted search queries.
Reliability Metric: 92–97% successful keyword runs across mixed workloads (short + long keywords), with most failures attributable to temporary network stalls or search response changes.
Efficiency Metric: Steady memory footprint for long runs by pushing incremental batches and avoiding heavy DOM parsing for every item; typical headless usage remains stable when running one keyword at a time.
Quality Metric: 95%+ field completeness for core metadata (album name, URI, artists, images, keyword), with optional fields (label, totalTracks, playability details) varying by album and region context.
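The source does not define how field completeness is measured, so the following is only one plausible reading: the share of core-field slots that hold a non-empty value across all records. `fieldCompleteness`, `isFilled`, and `CORE_FIELDS` are hypothetical names introduced for this sketch, which you could run against your own output to check the figure on your data.

```javascript
// Sketch only: this definition of "completeness" is an assumption, not the project's stated method.
const CORE_FIELDS = ["name", "uri", "artists", "images", "keyword"];

// A value counts as filled when it is not null/undefined, not a blank string, and not an empty array.
function isFilled(value) {
  if (value === null || value === undefined) return false;
  if (typeof value === "string") return value.trim() !== "";
  if (Array.isArray(value)) return value.length > 0;
  return true;
}

// Fraction (0..1) of field slots that are filled across all records.
function fieldCompleteness(records, fields = CORE_FIELDS) {
  if (records.length === 0 || fields.length === 0) return 0;
  let filled = 0;
  for (const record of records) {
    for (const field of fields) {
      if (isFilled(record[field])) filled += 1;
    }
  }
  return filled / (records.length * fields.length);
}
```

Passing a different `fields` list, for example `["label", "totalTracks"]`, gives the same measure for the optional fields.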
