Commit e9c73f4

jschloman and claude committed
Initial commit
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
0 parents  commit e9c73f4

129 files changed

Lines changed: 16772 additions & 0 deletions

.env.example

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# API Keys
DISCOGS_TOKEN=your_discogs_user_token
ANTHROPIC_API_KEY=your_key # if using Claude
GOOGLE_API_KEY=your_key # if using Gemini

# Paths
LIBRARY_ROOT=/path/to/music
WORKING_DIR=./tagger_workdir

# Tuning
TAGGER_WORKERS=4
DISCOGS_FUZZY_THRESHOLD=85 # 0–100; minimum match confidence
ID3_VERSION=2.3 # 2.3 or 2.4
LLM_PROVIDER=claude # claude | gemini

# Behavior flags
RETRY_NOT_FOUND=false
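
The variables in `.env.example` could be read and validated at startup roughly as follows. This is a minimal standard-library sketch, not the project's loader: the `Settings` dataclass and `load_settings` function are hypothetical names, and the fallback defaults are simply assumed from the example values above. The pre-commit configuration in this commit lists `pydantic-settings`, so the real code may well take a different approach.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:  # hypothetical container, not the project's actual class
    discogs_token: str
    library_root: str
    working_dir: str
    workers: int
    fuzzy_threshold: int
    id3_version: str
    llm_provider: str
    retry_not_found: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the .env.example variables from a mapping and validate them."""
    # Defaults below are assumptions copied from the example file.
    threshold = int(env.get("DISCOGS_FUZZY_THRESHOLD", "85"))
    if not 0 <= threshold <= 100:
        raise ValueError("DISCOGS_FUZZY_THRESHOLD must be between 0 and 100")
    id3_version = env.get("ID3_VERSION", "2.3")
    if id3_version not in {"2.3", "2.4"}:
        raise ValueError("ID3_VERSION must be 2.3 or 2.4")
    provider = env.get("LLM_PROVIDER", "claude")
    if provider not in {"claude", "gemini"}:
        raise ValueError("LLM_PROVIDER must be claude or gemini")
    return Settings(
        discogs_token=env["DISCOGS_TOKEN"],  # required; KeyError if missing
        library_root=env["LIBRARY_ROOT"],  # required; KeyError if missing
        working_dir=env.get("WORKING_DIR", "./tagger_workdir"),
        workers=int(env.get("TAGGER_WORKERS", "4")),
        fuzzy_threshold=threshold,
        id3_version=id3_version,
        llm_provider=provider,
        retry_not_found=env.get("RETRY_NOT_FOUND", "false").lower() == "true",
    )
```

Passing the environment as a plain mapping keeps the function easy to exercise in tests without touching the real process environment.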

.githooks/pre-push

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
#!/usr/bin/env bash
2+
# pre-push hook — mirrors the CI quality gate.
3+
# Runs automatically before every `git push`.
4+
#
5+
# One-time setup (per clone):
6+
# git config core.hooksPath .githooks
7+
#
8+
# To skip in an emergency:
9+
# git push --no-verify
10+
11+
set -euo pipefail
12+
13+
echo "==> pre-push: running CI checks …"
14+
15+
# Resolve a working Python interpreter.
16+
# Priority: active venv > py launcher (Windows) > python > python3.
17+
find_python() {
18+
if [[ -n "${VIRTUAL_ENV:-}" ]]; then
19+
# Use the venv Python directly
20+
if [[ -x "$VIRTUAL_ENV/Scripts/python.exe" ]]; then
21+
echo "$VIRTUAL_ENV/Scripts/python.exe"; return
22+
fi
23+
if [[ -x "$VIRTUAL_ENV/bin/python" ]]; then
24+
echo "$VIRTUAL_ENV/bin/python"; return
25+
fi
26+
fi
27+
for candidate in "py -3" python python3; do
28+
# shellcheck disable=SC2086
29+
if $candidate -c "import sys" 2>/dev/null; then
30+
echo $candidate; return
31+
fi
32+
done
33+
echo "ERROR: no working Python interpreter found." >&2
34+
exit 1
35+
}
36+
37+
PYTHON=$(find_python)
38+
39+
run() {
40+
echo ""
41+
echo "--- $*"
42+
# Split PYTHON in case it is "py -3" (two words)
43+
# shellcheck disable=SC2086
44+
$PYTHON -m "$@"
45+
}
46+
47+
run ruff check .
48+
run ruff format --check .
49+
run mypy tagger/
50+
run pytest tests/unit tests/integration -m "not slow" -q
51+
52+
echo ""
53+
echo "==> pre-push: all checks passed."

.github/workflows/ci.yml

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
name: CI
2+
3+
on:
4+
push:
5+
branches: [main, feat/*, fix/*, chore/*, refactor/*]
6+
pull_request:
7+
branches: [main]
8+
9+
jobs:
10+
lint-pr-title:
11+
if: github.event_name == 'pull_request'
12+
runs-on: ubuntu-latest
13+
steps:
14+
- name: Check PR title follows conventional commits
15+
env:
16+
PR_TITLE: ${{ github.event.pull_request.title }}
17+
run: |
18+
echo "PR title: $PR_TITLE"
19+
if ! echo "$PR_TITLE" | grep -qP '^(feat|fix|perf|refactor|docs|style|test|chore|ci|build)(\(.+\))?(!)?: .+'; then
20+
echo "ERROR: PR title must follow Conventional Commits format."
21+
echo "Examples: 'feat: add plugin registry', 'fix(writer): handle permission error', 'feat!: breaking api change'"
22+
echo "See https://www.conventionalcommits.org"
23+
exit 1
24+
fi
25+
26+
quality:
27+
runs-on: ubuntu-latest
28+
steps:
29+
- uses: actions/checkout@v4
30+
31+
- uses: actions/setup-python@v5
32+
with:
33+
python-version: "3.11"
34+
cache: "pip"
35+
36+
- name: Install dependencies
37+
run: |
38+
python -m pip install --upgrade pip
39+
pip install -e ".[dev]"
40+
41+
- name: Run ruff check
42+
run: ruff check .
43+
44+
- name: Run ruff format check
45+
run: ruff format --check .
46+
47+
- name: Run mypy
48+
run: mypy tagger/
49+
50+
- name: Run tests with coverage
51+
run: pytest tests/unit tests/integration -m "not slow"
52+
env:
53+
CI: "1"
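
To try PR titles against the `lint-pr-title` rule locally, the same regular expression can be exercised in Python. The sketch below reuses the pattern from the workflow's `grep -qP` call unchanged; the function name `is_conventional_title` is hypothetical and not part of the repository.

```python
import re

# Same pattern as the grep -P expression in the lint-pr-title job.
PR_TITLE_RE = re.compile(
    r"^(feat|fix|perf|refactor|docs|style|test|chore|ci|build)(\(.+\))?(!)?: .+"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title would pass the CI title check."""
    return PR_TITLE_RE.search(title) is not None


# The three examples printed by the workflow's error message all pass.
for title in (
    "feat: add plugin registry",
    "fix(writer): handle permission error",
    "feat!: breaking api change",
):
    assert is_conventional_title(title)
```

A title without a recognised type prefix, such as `Initial commit`, is rejected. The check only applies to pull request titles, so it does not affect the message of this commit itself.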

.github/workflows/release.yml

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
1+
name: Release
2+
3+
on:
4+
pull_request:
5+
branches: [main]
6+
types: [closed]
7+
8+
jobs:
9+
release:
10+
if: github.event.pull_request.merged == true
11+
runs-on: ubuntu-latest
12+
concurrency: release
13+
permissions:
14+
contents: write
15+
16+
steps:
17+
- uses: actions/checkout@v4
18+
with:
19+
fetch-depth: 0
20+
ref: main
21+
22+
- name: Generate timestamp tag
23+
id: version
24+
run: |
25+
BASE="v$(date -u +%Y.%m.%d)"
26+
TAG="$BASE"
27+
N=1
28+
while git ls-remote --tags origin "$TAG" | grep -q "$TAG"; do
29+
TAG="${BASE}.${N}"
30+
N=$((N + 1))
31+
done
32+
echo "tag=$TAG" >> $GITHUB_OUTPUT
33+
34+
- name: Create tag and GitHub release
35+
env:
36+
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
37+
run: |
38+
TAG="${{ steps.version.outputs.tag }}"
39+
git tag "$TAG"
40+
git push origin "$TAG"
41+
gh release create "$TAG" --title "$TAG" --generate-notes
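
The tag-selection loop in the "Generate timestamp tag" step can be restated in Python to make its behaviour easier to reason about and test. This is only an illustrative sketch: `next_release_tag` is a hypothetical helper, and a plain membership test on a collection of existing tag names stands in for the workflow's `git ls-remote` lookup.

```python
from datetime import datetime, timezone
from typing import Collection, Optional


def next_release_tag(
    existing: Collection[str], now: Optional[datetime] = None
) -> str:
    """Return vYYYY.MM.DD, or vYYYY.MM.DD.N for the first free N >= 1."""
    now = now or datetime.now(timezone.utc)  # the workflow uses `date -u`
    base = now.strftime("v%Y.%m.%d")
    tag, n = base, 1
    # Append .1, .2, ... until the tag no longer collides with an existing one.
    while tag in existing:
        tag = f"{base}.{n}"
        n += 1
    return tag


day = datetime(2026, 4, 24, tzinfo=timezone.utc)
print(next_release_tag(set(), day))              # v2026.04.24
print(next_release_tag({"v2026.04.24"}, day))    # v2026.04.24.1
```

The first release merged on a given UTC day gets the bare date tag, and each further merge that day gets the next numeric suffix.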

.gitignore

Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,66 @@
1+
.venv/
2+
venv/
3+
.env
4+
5+
*.csv
6+
*.log
7+
8+
# Python
9+
__pycache__/
10+
*.py[cod]
11+
*$py.class
12+
*.so
13+
.Python
14+
build/
15+
develop-eggs/
16+
dist/
17+
downloads/
18+
eggs/
19+
.eggs/
20+
lib/
21+
lib64/
22+
parts/
23+
sdist/
24+
var/
25+
wheels/
26+
share/python-wheels/
27+
*.egg-info/
28+
.installed.cfg
29+
*.egg
30+
MANIFEST
31+
32+
# Pytest / Coverage
33+
.pytest_cache/
34+
.coverage
35+
htmlcov/
36+
.tox/
37+
.nosetests/
38+
nosetests.xml
39+
coverage.xml
40+
*.cover
41+
.mypy_cache/
42+
.dmypy.json
43+
dmypy.json
44+
.hypothesis/
45+
46+
# Vim
47+
*~
48+
*.un~
49+
.netrwhist
50+
51+
# OS
52+
.DS_Store
53+
Thumbs.db
54+
55+
# Project — intermediate output files and internal-only scripts
56+
tagger_workdir/
57+
scripts/
58+
*.db
59+
*.json
60+
*.txt
61+
*.ps1
62+
config.yaml
63+
recent_folders.py
64+
65+
# Claude Code
66+
.claude/

.pre-commit-config.yaml

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
repos:
2+
- repo: https://github.com/astral-sh/ruff-pre-commit
3+
rev: v0.4.4
4+
hooks:
5+
- id: ruff
6+
args: [--fix]
7+
- id: ruff-format
8+
9+
- repo: https://github.com/pre-commit/mirrors-mypy
10+
rev: v1.10.0
11+
hooks:
12+
- id: mypy
13+
args: [tagger/]
14+
additional_dependencies:
15+
- pydantic>=2.7
16+
- pydantic-settings>=2.2
17+
- types-beautifulsoup4

CHANGELOG.md

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
# Changelog
2+
3+
All notable changes to this project will be documented in this file.
4+
5+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7+
8+
## [Unreleased]
9+
10+
## [0.1.0] - 2026-04-24
11+
12+
### Added
13+
14+
- Parallel enrichment pipeline: scans MP3 folders, fetches Discogs / Wikipedia / MusicBrainz
15+
metadata, and writes enriched ID3 tags via `mutagen`. Controlled by `--workers`.
16+
- Five-strategy track-matching algorithm that handles multi-disc albums, non-numeric Discogs
17+
positions (vinyl A/B sides, roman numerals), and fuzzy ID3 title matching.
18+
- ID3 tag writing with full metadata: title, artist, album artist, year, genre, grouping
19+
(pipe-delimited `Origin:City | Gender:... | Subgenre:... | Label:... | link:...`),
20+
disc number, and embedded album art.
21+
- Manual-review CSV workflow: albums that cannot be matched automatically are exported to a
22+
CSV; after a human supplies Discogs URLs the `process-manual` command applies them.
23+
- `scan-integrity` command: walks the library and reports ID3 tag / folder-name mismatches
24+
(album artist mismatch, inconsistent tags, all-untitled albums, etc.).
25+
- `retry-not-found` command: re-runs Discogs enrichment on all `not_found` albums without
26+
a subprocess round-trip.
27+
- `prefill-master-urls` command: pre-fills the `user_discogs_url` column in a manual-review
28+
CSV using two search passes, reducing reviewer lookup work.
29+
- `enrich-missing` command: re-scans artist directories for albums referenced in a reviewed
30+
CSV that are missing from the database.
31+
- Claude and Gemini LLM clients for collective / affiliation detection, written as injectable
32+
`Protocol` implementations so either backend can be swapped at runtime.
33+
- SQLite repository layer with versioned migrations and WAL-mode concurrency.
34+
- Structured logging via `structlog` (JSON in CI, pretty console in development).
35+
- Pydantic v2 models at every external data boundary (Discogs API, LLM JSON output, DB rows).
36+
- `tenacity`-based retry decorator with server-provided `Retry-After` awareness and
37+
exponential back-off.
38+
- Token-bucket rate limiter used across all Discogs API calls.
39+
- Windows SMB share fallback: when `mutagen`'s in-place save fails with `EINVAL`, the writer
40+
copies to a local temp file and moves it back to avoid cross-device shutil issues.
41+
- Four Claude Code skills (`/scan-integrity`, `/retry-not-found`, `/prefill-master-urls`,
42+
`/enrich-missing`) matching the four operational CLI commands.
43+
44+
### Fixed
45+
46+
- Apostrophe / contraction handling in title-case normalisation so that "She's Gone" is not
47+
upper-cased to "She'S Gone" (#79).
48+
- Multi-disc track-title shifting: disc-2 files now map directly to Discogs position "N-M"
49+
rather than falling through to an incorrect positional index.
50+
- Artist-name sanity check rejects Discogs matches where folder artist and release artist have
51+
< 40% token-set similarity (e.g. Cat Stevens vs. Astrud Gilberto class of false positives).
52+
- Discogs release artist used as track-artist fallback when per-track artist list is empty.
53+
54+
[Unreleased]: https://github.com/jschloman/mp3-enricher/compare/v0.1.0...HEAD
55+
[0.1.0]: https://github.com/jschloman/mp3-enricher/releases/tag/v0.1.0
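
The apostrophe entry under "Fixed" describes a well-known pitfall: Python's built-in `str.title()` treats an apostrophe as a word boundary, so `"she's gone".title()` yields `"She'S Gone"`. The sketch below shows one common workaround, a regex that treats an apostrophe plus trailing letters as part of the same word. It is an assumption about the kind of fix involved, not the project's actual code for #79; `title_case` is a hypothetical name, and the pattern only covers ASCII letters.

```python
import re

# A word is a run of ASCII letters, optionally continued by an apostrophe
# (straight or curly) and more letters, e.g. "she's" or "don't".
_WORD_RE = re.compile(r"[A-Za-z]+(?:['\u2019][A-Za-z]+)*")


def title_case(text: str) -> str:
    """Capitalise each word without upper-casing letters after an apostrophe."""
    return _WORD_RE.sub(lambda m: m.group(0).capitalize(), text)


print("she's gone".title())      # She'S Gone  (the behaviour being avoided)
print(title_case("she's gone"))  # She's Gone
```

Titles containing accented or other non-ASCII letters would need a broader pattern than the one assumed here.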
