itincknell/musicbrainz-query-server

MusicBrainz Query Server

A Spark SQL–based query engine for analyzing MusicBrainz data, served via gRPC. It is designed to handle concurrent top-ten artist queries against a local PostgreSQL mirror of the MusicBrainz database (see the MusicBrainz Docker mirror project).

Features

  • Spark SQL engine for large-scale querying of MusicBrainz data
  • gRPC API for remote, language-agnostic access to query results
  • Docker and Docker Compose setup for local development

Repository Layout

.
├── client/                       # gRPC client code and examples
├── proto/                        # Protocol buffers definitions
├── query_engine/                 # Spark SQL application logic
├── server/                       # gRPC server implementation
├── jars/                         # Compiled dependencies and artifacts
├── Dockerfile                    # Container image definition
├── docker-compose.yml            # Development compose configuration
├── requirements.txt              # Python service dependencies
└── setup_musicbrainz_lite.sh     # Optional local MusicBrainz mirror setup

Quick Start

Prerequisites

  • Python 3
  • Java (JDK 8/11/17 depending on your Spark/PySpark version)
  • PySpark (installed via pip/requirements.txt)
  • PostgreSQL with a MusicBrainz mirror (local container or local install)
  • PostgreSQL JDBC driver jar

Setup

  1. Create a local MusicBrainz PostgreSQL mirror (use setup_musicbrainz_lite.sh or external tools).
  2. Build the Spark query engine artifacts and the gRPC server code.
  3. Configure the database connection in the server settings.
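
As a rough illustration of step 3, the sketch below builds Spark JDBC options for the PostgreSQL mirror from environment variables. The variable names (MB_DB_HOST, MB_DB_PORT, MB_DB_NAME, MB_DB_USER, MB_DB_PASSWORD) and the default values are assumptions made for this example, not settings defined by this repository. Check the server settings for the names it actually reads.

```python
import os
from typing import Dict, Mapping


def jdbc_options(table: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build options for Spark's JDBC data source.

    All MB_DB_* variable names and defaults are hypothetical placeholders.
    """
    host = env.get("MB_DB_HOST", "localhost")
    port = env.get("MB_DB_PORT", "5432")  # PostgreSQL's default port
    database = env.get("MB_DB_NAME", "musicbrainz_db")  # placeholder name
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": env.get("MB_DB_USER", "musicbrainz"),  # placeholder user
        "password": env.get("MB_DB_PASSWORD", ""),
        "driver": "org.postgresql.Driver",  # class shipped in the PostgreSQL JDBC jar
    }


def load_table(spark, table: str):
    """Read one table into a DataFrame using an existing SparkSession."""
    return spark.read.format("jdbc").options(**jdbc_options(table)).load()
```

For `load_table` to work, the PostgreSQL JDBC driver jar must be on Spark's classpath, for example through the `spark.jars` setting when the session is created.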

Run with Docker Compose

docker compose up --build

Query via gRPC

Use the provided client stubs (in client/) for sending queries to the server. See proto/ for RPC definitions.
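
Because the server is meant to handle concurrent queries, a client may want to issue several requests at once. The sketch below shows one way to do that with a thread pool. `TopArtistsQuery` and `run_concurrently` are hypothetical names introduced for this example. The `call` argument stands in for whatever stub method the generated code in client/ exposes, since the actual RPC and message names are defined in proto/ and are not assumed here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class TopArtistsQuery:
    """Hypothetical request parameters; the real message is defined in proto/."""
    genre: str
    start_year: int
    end_year: int


def run_concurrently(
    call: Callable[[TopArtistsQuery], T],
    queries: Sequence[TopArtistsQuery],
    max_workers: int = 4,
) -> List[T]:
    """Invoke `call` once per query on a thread pool.

    Results are returned in the same order as `queries`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call, queries))
```

In practice, `call` would be a small function that converts a `TopArtistsQuery` into the generated request message and invokes the stub method over a channel such as `grpc.insecure_channel("localhost:50051")`. The address here is only an example.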

Development Notes

  • Query logic lives in query_engine/
  • Protocol definitions in proto/ drive both the server and client interfaces
  • Docker provides a consistent environment for local testing
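
To give a feel for the kind of query the engine runs, here is a minimal sketch of a parameterized top-ten query in Spark SQL. It is not the code in query_engine/. The view name `artist_genre_year` and its columns are hypothetical and do not reflect the real MusicBrainz schema, and the `args` parameter of `spark.sql` is assumed to be available (PySpark 3.4 or later).

```python
from typing import Any, Dict

# Hypothetical temp view and column names, used only for illustration.
TOP_ARTISTS_SQL = """
SELECT artist_name, SUM(release_count) AS total_releases
FROM artist_genre_year
WHERE genre = :genre
  AND release_year BETWEEN :start_year AND :end_year
GROUP BY artist_name
ORDER BY total_releases DESC, artist_name
LIMIT 10
"""


def top_artists_args(genre: str, start_year: int, end_year: int) -> Dict[str, Any]:
    """Validate the inputs and return the named parameters for the query."""
    if not genre.strip():
        raise ValueError("genre must be non-empty")
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    return {"genre": genre.strip(), "start_year": start_year, "end_year": end_year}


def run_top_artists(spark, genre: str, start_year: int, end_year: int):
    """Run the query on an existing SparkSession and return the result rows."""
    args = top_artists_args(genre, start_year, end_year)
    return spark.sql(TOP_ARTISTS_SQL, args=args).collect()
```

Passing user input as named parameters rather than formatting it into the SQL string avoids quoting problems and injection through the genre value.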

About

A Spark SQL-based query engine for analyzing MusicBrainz data, served via gRPC. Serves lists of top artists within a genre and year range from a local PostgreSQL mirror.
