Lightweight optimizations, latent bug fixes, and circuit-breaker wiring#50
Open
betmoar wants to merge 8 commits into
- track: O(n) get_unique_tracks via dict instead of O(n^2) scan + list.remove (preserves earliest-wins tie-break and highest-confidence selection)
- cache: reuse the size computed at set-time in SizeStrategy.update_metadata instead of re-serializing the value on every access
- core/base: use MIN_SEGMENT_FILE_SIZE constant instead of literal 1000
- shazam: resolve the inter-request cooldown once in __init__ rather than re-reading config on every segment
- exporters: precompile filename-sanitization regexes at module level
- ytdlp: remove unused get_ydl_opts() (opts are built inline in download)
- spotify provider: fetch track + audio-features concurrently in get_track_details
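A minimal sketch of the dict-based dedup idea from the track item above. The `Track` fields and the dedup key are assumptions for illustration, not the project's actual `TrackMatcher` code; "earliest-wins" is interpreted here as the lowest `time_in_mix` among equal-confidence duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    # Hypothetical fields; the real Track class may differ.
    song_name: str
    artist: str
    confidence: float
    time_in_mix: float  # seconds from the start of the mix


def get_unique_tracks(tracks):
    """Keep one track per (song, artist) in a single pass, then sort once."""
    best = {}
    for track in tracks:
        key = (track.song_name.casefold(), track.artist.casefold())
        current = best.get(key)
        if current is None:
            best[key] = track
        elif track.confidence > current.confidence:
            # Higher confidence always replaces the stored entry.
            best[key] = track
        elif (track.confidence == current.confidence
              and track.time_in_mix < current.time_in_mix):
            # Equal confidence: the earliest occurrence wins.
            best[key] = track
    return sorted(best.values(), key=lambda t: t.time_in_mix)
```

Each input track costs one dictionary lookup, so the pass is O(n), followed by a single O(k log k) sort over the k unique tracks.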
- factory: ACRCloudProvider() was constructed with no args (TypeError on use). Read TRACKLISTIFY_ACR_ACCESS_KEY / _ACCESS_SECRET (and optional _HOST) from the environment and raise a clear ProviderError when they are missing.
- downloaders/spotify: _set_metadata called asyncio.run() while already inside the running download() event loop (RuntimeError whenever cover art exists). Make _set_metadata async, await the cover fetch, and await the call site.
- add tests covering the ACRCloud credential wiring.
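The `asyncio.run()` failure mode in the downloaders/spotify item can be reproduced in isolation. This is a sketch under assumptions: `fetch_cover`, `set_metadata_sync`, `set_metadata`, and the `use_fixed` switch are hypothetical stand-ins, not the downloader's real functions.

```python
import asyncio


async def fetch_cover(url):
    # Stand-in for the real network fetch.
    await asyncio.sleep(0)
    return b"cover-bytes"


def set_metadata_sync(url):
    # Old shape: starts a new event loop from synchronous code.
    coro = fetch_cover(url)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "never awaited" warning in this demo
        raise


async def set_metadata(url):
    # Fixed shape: stay on the running loop and await the fetch.
    return await fetch_cover(url)


async def download(url, use_fixed=True):
    if use_fixed:
        return await set_metadata(url)
    # Calling asyncio.run() here raises RuntimeError, because
    # download() is itself already running inside an event loop.
    return set_metadata_sync(url)
```

Running `asyncio.run(download("u", use_fixed=False))` raises `RuntimeError`, while the default path returns the cover bytes.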
…trics
- rate_limiter: add public record_result() and bound rate_limit_windows to MAX_RATE_LIMIT_WINDOWS so long runs don't grow the metrics list unbounded
- identification: report each provider request outcome to the rate limiter (a None/no-match result still counts as success; an exception is a failure), so repeated provider failures trip the circuit breaker and stop hammering it
- add integration test for the trip behavior and failure-streak reset

The circuit breaker was previously implemented but never wired into the request path, so circuit_state never opened in production. It remains config-gated via circuit_breaker_enabled (default threshold 5, 60s reset).
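A simplified, synchronous sketch of the wiring described above: every provider call reports its outcome, a no-match counts as success, and an exception counts as a failure. The class, the state names, the `identify_with_breaker` helper, and the value of `MAX_RATE_LIMIT_WINDOWS` are all assumptions for illustration; only `record_result()`, `rate_limit_windows`, `circuit_state`, and the threshold/reset defaults come from the commit message.

```python
import time
from collections import deque

MAX_RATE_LIMIT_WINDOWS = 100  # assumed value; the real constant may differ


class CircuitBreakerLimiter:
    def __init__(self, threshold=5, reset_timeout=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.failures = 0
        self.opened_at = None
        # A bounded deque drops the oldest entry once it is full.
        self.rate_limit_windows = deque(maxlen=MAX_RATE_LIMIT_WINDOWS)

    @property
    def circuit_state(self):
        if self.opened_at is None:
            return "closed"
        if self._clock() - self.opened_at >= self.reset_timeout:
            return "half_open"
        return "open"

    def record_result(self, success):
        if success:
            # Any success resets the failure streak and closes the circuit.
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self._clock()


def identify_with_breaker(limiter, provider_call, segment):
    if limiter.circuit_state == "open":
        return None  # skip the provider while the breaker is open
    try:
        result = provider_call(segment)
    except Exception:
        limiter.record_result(False)
        return None
    limiter.record_result(True)  # a None/no-match result is still a success
    return result
```

Injecting the clock keeps the reset timeout testable without sleeping.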
- core/base: extract _build_identification_manager() helper, removing the duplicated IdentificationManager construction in __init__ and process_input
- providers/acrcloud: collapse the redundant nested try/except around response parsing into a single JSONDecodeError handler (the outer handler re-wrapped the inner ProviderError); behavior preserved
- add ACRCloud identify_track tests (success, no-result code, 401, 429, invalid JSON) for the now-reachable provider
…tocol
The identification loop and base protocol call identify_track(audio_segment), but ACRCloud expected raw bytes, so it could never run through the main loop. Normalize the input: read bytes from an AudioSegment's file_path (using its start_time), while still accepting raw bytes for direct/back-compat use. Add a test covering the segment path.
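One way the input normalization described above could look. The `AudioSegment` dataclass and the `normalize_audio_input` helper are hypothetical; in particular, returning `start_time` as an offset alongside the bytes is an assumption about how the provider uses it.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple, Union


@dataclass
class AudioSegment:
    # Hypothetical minimal shape; the real class has more fields.
    file_path: str
    start_time: float = 0.0


def normalize_audio_input(
    audio: Union[bytes, bytearray, AudioSegment],
) -> Tuple[bytes, float]:
    """Return (audio bytes, start offset in seconds) for either input kind."""
    if isinstance(audio, (bytes, bytearray)):
        # Back-compat path: raw bytes carry no position information.
        return bytes(audio), 0.0
    # Protocol path: load the segment file written by the splitter.
    return Path(audio.file_path).read_bytes(), float(audio.start_time)
```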
…ercion
- initialize original_title/uploader/duration/_output_formats in __init__ so they always exist, removing the defensive getattr fallbacks at their reads
- replace the two duplicated duration try/except blocks with a single _coerce_float() helper
- preserve the _output_formats -> config.output_format fallback via `or`

Behavior-preserving: both process_input branches already assigned these attributes before use, so the getattr defaults never triggered in practice.
Document the changes delivered in this PR (performance, correctness fixes, circuit-breaker wiring, refactors, dependency maintenance) and a prioritized, risk-classified backlog of follow-up improvements with file references, rationale, and test strategy.
Pull request overview
This PR applies a set of targeted, low-risk optimizations and latent bug fixes across the identification pipeline (providers, factory wiring, rate limiting/circuit breaker integration), along with test coverage and documentation to lock in the intended behaviors.
Changes:
- Fixes two crash-on-use paths (ACRCloud provider construction via factory/env creds; Spotify downloader cover-art fetch within an existing event loop).
- Wires circuit-breaker outcome reporting into the identification loop and bounds rate-limiter metrics growth.
- Includes performance-minded refactors (O(n) track dedup, cached size reuse, hoisted Shazam cooldown, precompiled exporter regexes, concurrent Spotify detail fetch) plus new tests and an improvement-plan doc.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_providers_acrcloud.py | Adds response-parsing/error-mapping tests for ACRCloudProvider (no network). |
| tests/test_provider_factory.py | Verifies ACRCloud credential env wiring, host override, and factory caching/error cases. |
| tests/test_identification_circuit_breaker.py | Ensures the identification loop reports request outcomes so the breaker can trip/reset. |
| src/tracklistify/utils/rate_limiter.py | Adds bounded rate_limit_windows recording and a public record_result() API for breaker updates. |
| src/tracklistify/utils/identification.py | Records provider request outcomes (success vs exception) for circuit breaker behavior. |
| src/tracklistify/utils/constants.py | Introduces MAX_RATE_LIMIT_WINDOWS constant. |
| src/tracklistify/providers/spotify.py | Parallelizes independent track/audio-feature lookups with asyncio.gather. |
| src/tracklistify/providers/shazam.py | Hoists cooldown resolution into __init__ to avoid per-segment config parsing. |
| src/tracklistify/providers/factory.py | Fixes ACRCloud provider construction by reading required env vars and raising a clear ProviderError. |
| src/tracklistify/providers/acrcloud.py | Aligns identify_track with the AudioSegment protocol (still accepts raw bytes) and simplifies JSON error handling. |
| src/tracklistify/exporters/tracklist.py | Precompiles filename-sanitization regexes to avoid per-call compilation overhead. |
| src/tracklistify/downloaders/ytdlp.py | Removes unused yt-dlp options helper and now-unused import. |
| src/tracklistify/downloaders/spotify.py | Makes _set_metadata async and awaits cover fetch to avoid asyncio.run() inside a running loop. |
| src/tracklistify/core/track.py | Replaces O(n²) track dedup with an O(n) dict-based approach while preserving tie-break behavior. |
| src/tracklistify/core/base.py | Refactors app orchestration: centralizes manager construction, removes getattr fallbacks, adds float coercion helper, uses segment-size constant. |
| src/tracklistify/cache/invalidation.py | Reuses cached entry size when present to avoid repeated JSON serialization work. |
| docs/IMPROVEMENT_PLAN.md | Adds a technical improvement plan documenting shipped changes and a prioritized follow-up backlog. |
| docs/CHANGELOG.md | Documents the addition of the improvement plan. |
Summary
A deep-dive code review (core orchestration, providers, rate limiter, cache, downloaders, exporters) followed by a focused, low-risk refactor. Changes split into three themes across three commits.
376 tests pass (8 new), `ruff check` + `ruff format` clean, no new `vulture` findings.

perf: lightweight cleanups (behavior-preserving)
- `TrackMatcher.get_unique_tracks` replaced an O(n²) linear-scan + `list.remove` + double-sort with a single dict pass + one sort. Tie-breaking (earliest-wins among equal confidence, highest-confidence selection) preserved.
- `SizeStrategy.update_metadata` no longer re-serializes the value via `json.dumps` on every access; it reuses the size computed at set-time.
- The Shazam inter-request cooldown is resolved once in `__init__`.
- `MIN_SEGMENT_FILE_SIZE` constant instead of the literal `1000`; precompiled exporter sanitization regexes; removed unused `get_ydl_opts()`; parallelized Spotify `get_track_details` with `asyncio.gather`.

fix: two latent crash-on-use bugs
- `ACRCloudProvider()` was constructed with no args, raising `TypeError` the moment `acrcloud` was selected. Now reads `TRACKLISTIFY_ACR_ACCESS_KEY` / `_ACCESS_SECRET` (and optional `_HOST`) from the environment (the documented design) and raises a clear `ProviderError` when missing.
- `_set_metadata` called `asyncio.run()` while already inside the running `download()` event loop, raising `RuntimeError` whenever a track had cover art. Made `_set_metadata` async and `await`s the cover fetch.

feat: circuit-breaker wiring + bounded metrics
- The circuit breaker was implemented but never wired into the request path, so `circuit_state` never opened. The identification loop now reports each provider request outcome (a `None`/no-match result still counts as success; an exception is a failure), so repeated provider failures trip the breaker and stop hammering it. Remains config-gated via `circuit_breaker_enabled` (default threshold 5, 60s reset).
- Bounded `rate_limit_windows` to `MAX_RATE_LIMIT_WINDOWS` so long-running processes don't grow the metrics list without bound.

Tests
- `tests/test_provider_factory.py`: ACRCloud credential wiring (missing/partial creds raise, env creds construct, host override, caching, unknown provider).
- `tests/test_identification_circuit_breaker.py`: breaker trips after repeated failures (and stops calling the provider); `record_result` resets the failure streak on success.

Notes / out of scope
- Moving the `mutagen` probe to a thread was considered but reverted: `split_audio` is synchronous and making it async would ripple into callers/tests, which is too much regression surface for an optional item.
- `ruff format` drift in unrelated files (`cli.py`, `test_error_handling.py`, `test_logger.py`) was left untouched.
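For reference, the ACRCloud credential wiring listed among the fixes above could be sketched as follows. The helper name, the returned dict shape, and the stand-in `ProviderError` class are hypothetical, and the full variable names `TRACKLISTIFY_ACR_ACCESS_SECRET` and `TRACKLISTIFY_ACR_HOST` are an assumed expansion of the abbreviated `_ACCESS_SECRET` / `_HOST` in the description.

```python
import os


class ProviderError(Exception):
    """Stand-in for the project's ProviderError."""


def read_acrcloud_credentials(environ=None):
    env = os.environ if environ is None else environ
    required = {
        "access_key": "TRACKLISTIFY_ACR_ACCESS_KEY",
        "access_secret": "TRACKLISTIFY_ACR_ACCESS_SECRET",  # assumed name
    }
    # Treat unset and empty values the same way.
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise ProviderError(
            "ACRCloud credentials missing: " + ", ".join(missing)
        )
    creds = {key: env[name] for key, name in required.items()}
    host = env.get("TRACKLISTIFY_ACR_HOST")  # optional override, assumed name
    if host:
        creds["host"] = host
    return creds
```

Failing at construction time with the names of the missing variables is easier to act on than a `TypeError` raised later at first use.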