Skip to content

Lightweight optimizations, latent bug fixes, and circuit-breaker wiring#50

Open
betmoar wants to merge 8 commits into
mainfrom
feat/code-review-optimization-LQhu2
Open

Lightweight optimizations, latent bug fixes, and circuit-breaker wiring#50
betmoar wants to merge 8 commits into
mainfrom
feat/code-review-optimization-LQhu2

Conversation

@betmoar

@betmoar betmoar commented Jun 4, 2026

Copy link
Copy Markdown
Owner

Summary

A deep-dive code review (core orchestration, providers, rate limiter, cache, downloaders, exporters) followed by a focused, low-risk refactor. Changes split into three themes across three commits.

376 tests pass (8 new), ruff check + ruff format clean, no new vulture findings.

perf: lightweight cleanups (behavior-preserving)

  • O(n) track dedupTrackMatcher.get_unique_tracks replaced an O(n²) linear-scan + list.remove + double-sort with a single dict pass + one sort. Tie-breaking (earliest-wins among equal confidence, highest-confidence selection) preserved.
  • Cache size reuseSizeStrategy.update_metadata no longer re-serializes the value via json.dumps on every access; it reuses the size computed at set-time.
  • Shazam cooldown hoisted out of the per-segment hot path into __init__.
  • MIN_SEGMENT_FILE_SIZE constant instead of the literal 1000; precompiled exporter sanitization regexes; removed unused get_ydl_opts(); parallelized Spotify get_track_details with asyncio.gather.

fix: two latent crash-on-use bugs

  • ACRCloud factoryACRCloudProvider() was constructed with no args, raising TypeError the moment acrcloud was selected. Now reads TRACKLISTIFY_ACR_ACCESS_KEY / _ACCESS_SECRET (and optional _HOST) from the environment (the documented design) and raises a clear ProviderError when missing.
  • Spotify downloader_set_metadata called asyncio.run() while already inside the running download() event loop, raising RuntimeError whenever a track had cover art. Made _set_metadata async and awaits the cover fetch.

feat: circuit-breaker wiring + bounded metrics

  • The circuit breaker was implemented but never called in production (only from tests), so circuit_state never opened. The identification loop now reports each provider request outcome (a None/no-match result still counts as success; an exception is a failure), so repeated provider failures trip the breaker and stop hammering it. Remains config-gated via circuit_breaker_enabled (default threshold 5, 60s reset).
  • Bounded rate_limit_windows to MAX_RATE_LIMIT_WINDOWS so long-running processes don't grow the metrics list without bound.

Tests

  • New tests/test_provider_factory.py — ACRCloud credential wiring (missing/partial creds raise, env creds construct, host override, caching, unknown provider).
  • New tests/test_identification_circuit_breaker.py — breaker trips after repeated failures (and stops calling the provider); record_result resets the failure streak on success.

Notes / out of scope

  • Offloading the synchronous mutagen probe to a thread was considered but reverted: split_audio is synchronous and making it async would ripple into callers/tests — too much regression surface for an optional item.
  • Parallelizing the main per-segment identification loop was deliberately excluded (interacts with rate limiter, Shazam cooldown, ordering, and circuit-breaker state).
  • Pre-existing ruff format drift in unrelated files (cli.py, test_error_handling.py, test_logger.py) was left untouched.

claude added 8 commits May 29, 2026 17:22
- track: O(n) get_unique_tracks via dict instead of O(n^2) scan + list.remove
  (preserves earliest-wins tie-break and highest-confidence selection)
- cache: reuse the size computed at set-time in SizeStrategy.update_metadata
  instead of re-serializing the value on every access
- core/base: use MIN_SEGMENT_FILE_SIZE constant instead of literal 1000
- shazam: resolve the inter-request cooldown once in __init__ rather than
  re-reading config on every segment
- exporters: precompile filename-sanitization regexes at module level
- ytdlp: remove unused get_ydl_opts() (opts are built inline in download)
- spotify provider: fetch track + audio-features concurrently in get_track_details
- factory: ACRCloudProvider() was constructed with no args (TypeError on use).
  Read TRACKLISTIFY_ACR_ACCESS_KEY / _ACCESS_SECRET (and optional _HOST) from
  the environment and raise a clear ProviderError when they are missing.
- downloaders/spotify: _set_metadata called asyncio.run() while already inside
  the running download() event loop (RuntimeError whenever cover art exists).
  Make _set_metadata async, await the cover fetch, and await the call site.
- add tests covering the ACRCloud credential wiring.
…trics

- rate_limiter: add public record_result() and bound rate_limit_windows to
  MAX_RATE_LIMIT_WINDOWS so long runs don't grow the metrics list unbounded
- identification: report each provider request outcome to the rate limiter
  (a None/no-match result still counts as success; an exception is a failure),
  so repeated provider failures trip the circuit breaker and stop hammering it
- add integration test for the trip behavior and failure-streak reset

The circuit breaker was previously implemented but never wired into the
request path, so circuit_state never opened in production. It remains
config-gated via circuit_breaker_enabled (default threshold 5, 60s reset).
- core/base: extract _build_identification_manager() helper, removing the
  duplicated IdentificationManager construction in __init__ and process_input
- providers/acrcloud: collapse the redundant nested try/except around response
  parsing into a single JSONDecodeError handler (the outer handler re-wrapped
  the inner ProviderError); behavior preserved
- add ACRCloud identify_track tests (success, no-result code, 401, 429,
  invalid JSON) for the now-reachable provider
…tocol

The identification loop and base protocol call identify_track(audio_segment),
but ACRCloud expected raw bytes — so it could never run through the main loop.
Normalize the input: read bytes from an AudioSegment's file_path (using its
start_time), while still accepting raw bytes for direct/back-compat use.

Add a test covering the segment path.
…ercion

- initialize original_title/uploader/duration/_output_formats in __init__ so
  they always exist, removing the defensive getattr fallbacks at their reads
- replace the two duplicated duration try/except blocks with a single
  _coerce_float() helper
- preserve the _output_formats -> config.output_format fallback via `or`

Behavior-preserving: both process_input branches already assigned these
attributes before use, so the getattr defaults never triggered in practice.
Document the changes delivered in this PR (performance, correctness fixes,
circuit-breaker wiring, refactors, dependency maintenance) and a prioritized,
risk-classified backlog of follow-up improvements with file references,
rationale, and test strategy.
Copilot AI review requested due to automatic review settings June 4, 2026 00:00

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR applies a set of targeted, low-risk optimizations and latent bug fixes across the identification pipeline (providers, factory wiring, rate limiting/circuit breaker integration), along with test coverage and documentation to lock in the intended behaviors.

Changes:

  • Fixes two crash-on-use paths (ACRCloud provider construction via factory/env creds; Spotify downloader cover-art fetch within an existing event loop).
  • Wires circuit-breaker outcome reporting into the identification loop and bounds rate-limiter metrics growth.
  • Includes performance-minded refactors (O(n) track dedup, cached size reuse, hoisted Shazam cooldown, precompiled exporter regexes, concurrent Spotify detail fetch) plus new tests and an improvement-plan doc.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/test_providers_acrcloud.py Adds response-parsing/error-mapping tests for ACRCloudProvider (no network).
tests/test_provider_factory.py Verifies ACRCloud credential env wiring, host override, and factory caching/error cases.
tests/test_identification_circuit_breaker.py Ensures the identification loop reports request outcomes so the breaker can trip/reset.
src/tracklistify/utils/rate_limiter.py Adds bounded rate_limit_windows recording and a public record_result() API for breaker updates.
src/tracklistify/utils/identification.py Records provider request outcomes (success vs exception) for circuit breaker behavior.
src/tracklistify/utils/constants.py Introduces MAX_RATE_LIMIT_WINDOWS constant.
src/tracklistify/providers/spotify.py Parallelizes independent track/audio-feature lookups with asyncio.gather.
src/tracklistify/providers/shazam.py Hoists cooldown resolution into __init__ to avoid per-segment config parsing.
src/tracklistify/providers/factory.py Fixes ACRCloud provider construction by reading required env vars and raising a clear ProviderError.
src/tracklistify/providers/acrcloud.py Aligns identify_track with the AudioSegment protocol (still accepts raw bytes) and simplifies JSON error handling.
src/tracklistify/exporters/tracklist.py Precompiles filename-sanitization regexes to avoid per-call compilation overhead.
src/tracklistify/downloaders/ytdlp.py Removes unused yt-dlp options helper and now-unused import.
src/tracklistify/downloaders/spotify.py Makes _set_metadata async and awaits cover fetch to avoid asyncio.run() inside a running loop.
src/tracklistify/core/track.py Replaces O(n²) track dedup with an O(n) dict-based approach while preserving tie-break behavior.
src/tracklistify/core/base.py Refactors app orchestration: centralizes manager construction, removes getattr fallbacks, adds float coercion helper, uses segment-size constant.
src/tracklistify/cache/invalidation.py Reuses cached entry size when present to avoid repeated JSON serialization work.
docs/IMPROVEMENT_PLAN.md Adds a technical improvement plan documenting shipped changes and a prioritized follow-up backlog.
docs/CHANGELOG.md Documents the addition of the improvement plan.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants