Skip to content

surakifalenye/bandcamp-crawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 

Repository files navigation

Bandcamp Crawler

Bandcamp Crawler lets you explore, analyze, and export rich metadata from Bandcamp pages, including artists, albums, tracks, and search results. It turns the public catalog into structured data you can pipe into dashboards, research workflows, or music-discovery tools. Designed for music analysts, indie label teams, and developers building features around Bandcamp content.

Bitbash Banner

Telegram   WhatsApp   Gmail   Website

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for bandcamp-crawler you've just found your team — Let’s Chat. 👆👆

Introduction

Bandcamp Crawler is a command-line and scriptable tool that navigates Bandcamp pages and converts them into structured JSON records. It supports search pages, artist discographies, albums, and individual tracks, making it easy to analyze discographies, track performance, and catalog metadata at scale.

It is ideal for:

  • Music data scientists and analysts who want high-quality catalog data.
  • Indie labels and artist managers tracking releases and tags across catalogs.
  • Developers integrating Bandcamp metadata into apps, dashboards, or recommendation engines.

Bandcamp Catalog Intelligence

  • Supports multiple entry points including search, artist music pages, album pages, track pages, and discovery feeds.
  • Traverses pagination on search and discovery views to capture more results with configurable limits.
  • Extracts detailed album metadata including tags, tracklists, artwork, and artist information.
  • Captures track-level attributes such as duration, position, album links, and artist details.
  • Provides flexible input flags to control whether albums, tracks, or artists are followed and stored as individual records.

Features

Feature Description
Multi-entry crawling Start from search, artist, album, track, or discover pages and let the crawler resolve all supported entities.
Discography extraction Collect complete artist discographies including albums, tags, and associated metadata.
Track-level insights Extract track titles, positions, durations, album references, and artist information.
Configurable depth Use boolean flags to decide whether to follow albums from search, tracks from albums, or albums from tracks.
Pagination control Limit how many search or discover pages are traversed with a simple numeric setting.
Proxy-ready networking Plug in your own proxy configuration for safer, more reliable large-scale runs.
Verbose debugging mode Enable debug logging to inspect crawling flow, parsed entities, and edge cases.
Export-friendly output Save results as structured JSON that can be converted to CSV, Excel, or imported into your own database or analytics stack.

What Data This Scraper Extracts

Field Name Field Description
dataType Type of record scraped (e.g., search, album, track, artist).
title Human-readable title of the entity (album title, track title, artist name, etc.).
url Canonical URL of the scraped entity on Bandcamp.
image.url URL of the primary artwork or thumbnail image associated with the entity.
pagination.page Current page number in a search or discovery result set.
pagination.pages Total number of available pages for the search query.
pagination.urls.first URL of the first page in the search result set.
pagination.urls.last URL of the last page in the search result set.
pagination.urls.next URL of the next page, if another page exists.
results[] Array of search results (artists, albums, tracks), each with its own dataType, title, url, and image.
artist.name Name of the album or track’s artist.
artist.url Canonical URL of the artist profile.
tags[] List of tags describing genres, locations, or themes for an album.
tags[].title Display label of the tag (e.g., metal, rock, Los Angeles).
tags[].url URL link to the corresponding tag page on Bandcamp.
tracklist[] Collection of tracks belonging to an album.
tracklist[].title Title of the track in the album.
tracklist[].url URL to the track’s page.
tracklist[].position Numeric position of the track within the album.
tracklist[].duration Duration of the track in mm:ss format.
album.title Title of the album when scraping a track entity.
album.url URL of the album containing the track.
duration Duration of the track (when scraping a track entity).
position Track number within the album (when scraping a track entity).
images[] List of image variants associated with an album (e.g., different sizes).
images[].url URL of a specific album cover variant.

Example Output

Example:

[
  {
    "dataType": "search",
    "pagination": {
      "page": 1,
      "pages": 4,
      "urls": {
        "first": "https://bandcamp.com/search?q=five+finger+death+punch&page=1",
        "last": "https://bandcamp.com/search?page=4&q=five%20finger%20death%20punch",
        "next": "https://bandcamp.com/search?page=2&q=five%20finger%20death%20punch"
      }
    },
    "results": [
      {
        "dataType": "artist",
        "title": "Five Finger Death Punch",
        "url": "https://fivefingerdeathpunch.bandcamp.com?from=search&search_item_id=3335222211",
        "image": {
          "url": "https://f4.bcbits.com/img/0027318719_23.jpg"
        }
      },
      {
        "dataType": "album",
        "title": "AfterLife",
        "url": "https://fivefingerdeathpunch.bandcamp.com/album/afterlife?from=search&search_item_id=730568840",
        "image": {
          "url": "https://f4.bcbits.com/img/a3711245885_7.jpg"
        }
      }
    ]
  },
  {
    "dataType": "album",
    "title": "N.A.T.I.O.N.",
    "url": "https://badwolves.bandcamp.com/album/n-a-t-i-o-n",
    "artist": {
      "name": "Bad Wolves",
      "url": "https://badwolves.bandcamp.com"
    },
    "tags": [
      { "title": "metal", "url": "https://bandcamp.com/tag/metal?from=tralbum&artist=924521020" },
      { "title": "rock", "url": "https://bandcamp.com/tag/rock?from=tralbum&artist=924521020" },
      { "title": "Los Angeles", "url": "https://bandcamp.com/tag/los-angeles?from=tralbum&artist=924521020" }
    ],
    "tracklist": [
      {
        "title": "I'll Be There",
        "url": "https://badwolves.bandcamp.com/track/ill-be-there-1",
        "position": 1,
        "duration": "04:02"
      },
      {
        "title": "No Messiah",
        "url": "https://badwolves.bandcamp.com/track/no-messiah",
        "position": 2,
        "duration": "04:20"
      }
    ],
    "images": [
      { "url": "https://f4.bcbits.com/img/a0888598634_16.jpg" },
      { "url": "https://f4.bcbits.com/img/a0888598634_10.jpg" }
    ]
  },
  {
    "dataType": "track",
    "title": "In The Dark",
    "url": "https://inflamesofficial.bandcamp.com/track/in-the-dark",
    "album": {
      "title": "Foregone",
      "url": "https://inflamesofficial.bandcamp.com/album/foregone"
    },
    "artist": {
      "name": "In Flames",
      "url": "https://inflamesofficial.bandcamp.com"
    },
    "duration": "04:17",
    "position": 9
  }
]

Directory Structure Tree

Bandcamp Crawler/
├── src/
│   ├── index.js
│   ├── cli.js
│   ├── config/
│   │   ├── defaults.js
│   │   └── schema.json
│   ├── crawlers/
│   │   ├── searchCrawler.js
│   │   ├── artistCrawler.js
│   │   ├── albumCrawler.js
│   │   └── trackCrawler.js
│   ├── parsers/
│   │   ├── searchParser.js
│   │   ├── albumParser.js
│   │   └── trackParser.js
│   ├── services/
│   │   ├── httpClient.js
│   │   ├── proxyManager.js
│   │   └── logger.js
│   └── utils/
│       ├── htmlHelpers.js
│       ├── urlNormalizer.js
│       └── pagination.js
├── config/
│   ├── example.input.json
│   └── proxy.example.json
├── data/
│   ├── samples/
│   │   ├── search-sample.json
│   │   ├── album-sample.json
│   │   └── track-sample.json
│   └── exports/
│       └── README.md
├── tests/
│   ├── searchCrawler.test.js
│   ├── albumParser.test.js
│   └── trackParser.test.js
├── package.json
├── package-lock.json
├── README.md
└── LICENSE

Use Cases

  • Music data analysts use it to collect large-scale album, track, and artist metadata, so they can build dashboards and run catalog analytics without manual data entry.
  • Indie labels and managers use it to monitor their artists’ discographies and tags, so they can track genre positioning, discoverability, and catalog completeness.
  • Playlist and recommendation app developers use it to ingest structured Bandcamp metadata, so they can power search, filter, and recommendation features in their apps.
  • Market researchers use it to study genre trends, location tags, and release patterns, so they can identify emerging scenes and niches in the Bandcamp ecosystem.
  • Archivists and collectors use it to build personal or institutional catalogs of albums and tracks, so they can maintain curated offline or mirrored datasets for long-term reference.

FAQs

Q: What kinds of URLs can I start from? A: You can start from search pages, artist music pages, album pages, track pages, and discovery pages. The crawler automatically detects what type of page it is and structures the output accordingly.

Q: How do I limit how deep the crawler goes? A: Use the configuration flags to set maxPagesToSearch and boolean options such as fetching albums from search results or tracks from album pages. This lets you control both pagination and relationship-following behavior.

Q: Can I customize networking and proxy settings? A: Yes. The crawler accepts a proxy configuration object where you can enable or disable proxy usage and plug in your own proxy endpoints, giving you flexibility in how requests are routed.

Q: In what formats can I export the data? A: Data is produced as structured JSON records which you can easily transform into CSV, Excel, or load into your own databases, warehouses, or BI tools using standard conversion utilities or custom scripts.


Performance Benchmarks and Results

Primary Metric: In typical usage, the crawler processes around 80–120 pages per minute when fetching search and discovery results with moderate pagination, while still collecting associated album and track metadata.

Reliability Metric: With sensible rate limits and optional proxy usage, it commonly achieves a 95%+ successful request rate across long-running sessions spanning hundreds of pages.

Efficiency Metric: On a mid-range machine, a full run that includes search, album, and track traversal remains memory-efficient, routinely handling thousands of entities without exceeding a few hundred megabytes of RAM.

Quality Metric: Field completeness for core attributes (title, URL, artist, basic tags, track positions, and durations) typically exceeds 98%, ensuring the resulting dataset is robust enough for analytics, cataloging, and integration into downstream systems.

Book a Call Watch on YouTube

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★

Releases

No releases published

Packages

 
 
 

Contributors