chore: migrate to mediavocab 1.0#18
Conversation
audiobooker.converters.audiobook_to_release() has been implicit- importing mediavocab since it was added; this commit makes the relationship explicit: - mediavocab>=0.1.0 in [project] dependencies (was unlisted). - pytest declared as a `test` optional extra so `pip install audiobooker[test]` works without surprises. - CHANGELOG entry. 115 tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Populate Release.release_date from AudioBook.year as IsoDate-compatible YYYY. - Stamp Release.license = "public_domain" for LibriVox content so the Release.parsed_license property resolves to an open License. - README + docs: mention mediavocab as a hard runtime dep, drop stale requests-cache references (the PR replaced CachedSession with a plain requests.Session), and show a search → typed Release example. - New examples/mediavocab_release.py demonstrating parsed_license filtering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LibriVox now emits one AudioBook per book (instead of one per section) with full per-section Chapter offsets/titles, deduplicated reader cast, librivox_id external id, content_genres, codec/bitrate, audio_language, and total runtime from the API. LoyalBooks similarly aggregates RSS entries into chapters of a single book. The audiobook_to_release converter now maps: - AudioBook.chapters → Release.chapters (offset, end, title) - AudioBook.narrators[] → Work.credits (PERFORMER, deduped) - AudioBook.genres → Work.content_genres - AudioBook.external_ids → Work.external_ids / Release.external_ids - AudioBook.codec/bitrate → Release.codec/bitrate - AudioBook.language → Release.audio_language - LoyalBooks → license="public_domain" base.AudioBook gains chapters, narrators, genres, codec, bitrate, external_ids fields; AudioBookChapter is exported. The example script now shows off the rich Release output end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ruff F/B sweep: add __all__ exports, drop unused imports, remove F841 - fix GoldenAudioBooks scraper: domain moved to goldenaudiobooks.com (sitemap_index.xml); _iter_post_sitemaps was filtering leaf URLs by index-only substring and yielded nothing - harden Librivox _api_get: treat HTTP 500 / decode errors as empty result so combined queries (e.g. title=^X & author=Y) don't raise
- Use `as X` re-export aliases on all public imports in __init__.py to satisfy ruff F401 (unused-import on intentional re-exports) - Reconcile narrator/narrators when both supplied: narrators list is authoritative, narrator singular always mirrors narrators[0] - genres=list(genres) in librivox scrapper to avoid sharing the same mutable list object between tags and genres - Remove hardcoded bitrate="128" from LibriVox (serves both 64 and 128 kbps) - Simplify stable_id() guard in converters (always returns non-empty string) - Fix stale "cached session" wording in docs/api.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the reference pattern for cassette-backed parser tests: - test/conftest.py — vcr_config + vcr_cassette_dir fixtures - test/test_librivox_vcr.py — one test per public Librivox method - test/cassettes/ — recorded YAML cassettes (offline replay) - .github/workflows/nightly-live.yml — daily re-record vs live API - pyproject.toml — vcrpy + pytest-vcr in [test] extra Replays in 0.7 s offline; nightly job catches upstream drift within 24h. Pattern to be replicated across remaining scrapers and companion repos. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prevents shared mutable reference between tags and genres fields so mutations to one list do not silently affect the other. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Allow per-instance Session injection on AudioBookSource subclasses, add audiobooker.transport.default_session() honouring the AUDIOBOOKER_TRANSPORT=curl_cffi env var, and expose a [stealth] extra that pulls in curl_cffi for TLS-fingerprint impersonation. Class-level default session is preserved so existing callers keep working. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s base Adds 100+ tests covering audiobook_to_release conversion, the Click CLI surface (search/index/cache subcommands), the parallel search orchestrator with timeout and dedup, AudioBook/BookAuthor/Narrator equality and narrator sync, utils HTTP/sitemap/score edge cases, and AudioBookSource base methods. Raises overall package coverage from 53% to 76%. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…be scraper Adds offline tests for the cache module's download progress/failure paths, play with cached and uncached files, _open_file across platforms, the argparse main(), and the _find_book index/Librivox fallback chain. Adds tests for the index CLI (build/update/stats/follow/unfollow/list/search), IndexedSource delegation, _resolve_sources name normalisation, and the _followed_as_sources YouTube instantiation path. Adds offline tests for youtube.py covering _length_to_seconds, _parse_name, _parse_video_item (both old videoRenderer and new lockupViewModel layouts), _parse_playlist_video, extract_yt_metadata (authors/narrator/year/hashtags/ suffix stripping), _video_to_book metadata-resolution rules, _YtSourceMixin search methods, iterate_all on channel/playlist sources (min_runtime and title_blacklist filters), the pre-configured TheCybrarian / HorrorBabble / TheDustyTome subclasses, and the tutubo-backed _iter_channel_videos / _iter_playlist_videos parsers with mocked initial_data. Raises overall package coverage from 76% to 94%. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds offline tests for AudioAnarchy / DarkerProjects / GoldenAudioBooks / HPTalesAudioBooks / StephenKingAudioBooks / Librivox / LoyalBooks covering the get_soup-returns-None branches, parse failures (missing h1/content, no streams ParseErrorException, Harry Potter / Stephen Fry narrator branch), LoyalBooks calc_runtime invalid input, from_rss with no chapters and with duplicate authors, search_by_tag genre matching with absolute/relative hrefs, iterate_popular and iterate_all RSS exception handling, and Librivox iterate_all on empty API responses. Raises overall package coverage from 94% to 96%, exceeding the 95% target. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…out a subcommand Previously fell through to args.query lookup which raised AttributeError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… cassettes Without `install_extras: test`, vcrpy/pytest-vcr are absent from the coverage environment; pytest.mark.vcr is treated as unknown and tests make unguarded live HTTP calls that return None, causing all test_loyalbooks_vcr assertions to fail. Also tighten narrator/narrators reconciliation in AudioBook.__post_init__ so that when both fields are supplied and diverge the list stays consistent (narrators list is authoritative, singular is set to [0]). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The reusable coverage workflow installs install_extras as a raw pip argument; 'test' was being treated as a package name rather than the project's [test] optional-dependency group. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents indefinite hangs on network stalls, addressing CodeRabbit feedback on PR #14. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, doc links AI-Generated Change: - Model: claude-sonnet-4-6 - Intent: replace wall-of-text README with install/quickstart/tables/links structure - Impact: rewrote README.md; removed marketing prose; added [stealth]/[youtube]/[test] extras explained; added transport, mediavocab, CLI reference sections; links to /docs and /examples - Verified via: manual review against source code (transport.py, converters.py, cli.py, search.py) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s/README.md AI-Generated Change: - Model: claude-sonnet-4-6 - Intent: fill documentation gaps after mediavocab integration and transport refactor - Impact: created docs/converters.md (audiobook_to_release field mapping with citations); created docs/transport.md (default_session, AUDIOBOOKER_TRANSPORT, per-instance injection); created docs/getting-started.md (install, first search, key objects); updated docs/README.md to list all pages and renumbered examples table - Verified via: manual review against converters.py, transport.py, scrappers/__init__.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AI-Generated Change: - Model: claude-sonnet-4-6 - Intent: replace scattered ad-hoc examples with a learnable numbered sequence - Impact: added examples/01_quickstart.py through 10_cli_pipeline.py; each file has a docstring, runs as-is, and prints meaningful output; covers LibriVox search, unified search, author filter, cache download/play, mediavocab conversion, custom session/curl_cffi, YouTube sources, advanced index with IndexedSource, and a shell-callable pipeline - Verified via: manual review against base.py, cache.py, converters.py, index.py, transport.py, scrappers/youtube.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- bump pin: mediavocab>=1.0.0
- rename PlaybackModality → PlaybackType, infer_modality →
infer_playback_type, Signals.modality / provider ClassVar →
playback_type
- adapt to 1.0 API tightenings where present: Work.extra /
Entity.extra now Dict[str, str]; Entity(ORGANISATION) requires
org_kind; Work country split into production_country /
publication_country / broadcaster_country; Release.variant_kind
removed (Work-only); VariantKind.{REGIONAL,DELUXE,BOOTLEG,…} moved
to ReleasePackaging; Stream.media_type → Stream.kind; programme
formats (news / talk_show / sports / documentary / concert) no
longer in genre constants — use literal strings or ProgrammeFormat.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR integrates mediavocab release conversion, refactors HTTP session management from requests-cache to a pluggable transport layer (with optional curl_cffi stealth support), restructures LibriVox and LoyalBooks scrapers to emit one AudioBook per book/feed with chapter aggregation, and expands the AudioBook dataclass with narrators, chapters, genres, and metadata fields. The changes include comprehensive documentation updates, 10 new example scripts, and extensive test coverage including VCR cassettes. ChangesCore Data Model & Types
HTTP Session & Transport Layer
Mediavocab Converter
Scraper Architecture Changes
CLI & Cache
Documentation
Example Scripts
Test Suite
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
|
There was a problem hiding this comment.
Actionable comments posted: 12
🧹 Nitpick comments (10)
test/test_utils_extra.py (1)
25-31: ⚡ Quick winConsider testing SSL-specific exception types.
The test name suggests SSL-specific fallback, but the mock uses a generic
Exception("ssl"). Ifget_html()is intended to retry only SSL-related errors withverify=False, consider using an actual SSL exception type (e.g.,requests.exceptions.SSLError) to ensure the test accurately reflects the intended behavior and doesn't pass if the implementation changes to catch all exceptions indiscriminately.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/test_utils_extra.py` around lines 25 - 31, The test test_falls_back_to_verify_false currently raises a generic Exception("ssl") which doesn't ensure get_html only retries on SSL errors; update the mocked side_effect to raise the specific requests.exceptions.SSLError (import it in the test) for the first call on AudioBookSource.session.get and keep the second response as ok, so the assertion self.assertEqual(get_html("http://x"), "<html2>") verifies retry behavior only on SSLError and not on all exceptions.audiobooker/transport.py (1)
35-38: ⚡ Quick winConsider narrower exception handling or logging for User-Agent setup.
The bare
except Exception: passon line 37 silently suppresses any error when setting the User-Agent header for curl_cffi sessions. While this defensive approach ensures the function always returns a session, it could hide genuine issues like API incompatibilities or attribute errors. Consider either:
- Narrowing to specific exceptions (e.g.,
AttributeError,KeyError)- Adding minimal logging (e.g.,
logging.debug) so failures can be diagnosed🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@audiobooker/transport.py` around lines 35 - 38, Replace the broad silent except with targeted handling: catch only expected errors (e.g., AttributeError, KeyError) around session.headers.update({"User-Agent": random_user_agent()}) and emit a minimal debug log on failure (use logging.getLogger(__name__).debug with context and the exception) so legitimate issues are visible while preserving the fallback behavior; keep the call to random_user_agent() and the update inside the try block and do not change the function's return behavior.audiobooker/scrappers/hpaudiotales.py (1)
11-11: 💤 Low valueConsider more specific type hint for
session.The type hint
objectis very generic. While it accommodates bothrequests.Sessionandcurl_cffi.requests.Session(which don't share a common base), consider usingOptional[requests.Session]or aProtocolif you want to document the expected interface more clearly. That said,objectworks fine for duck-typed compatibility.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@audiobooker/scrappers/hpaudiotales.py` at line 11, The module-level variable session is currently typed as the generic object; change it to a more specific type to document the expected interface — for example use Optional[requests.Session] by importing Optional from typing and requests (replace "session: object = None" with "session: Optional[requests.Session] = None"), or if you must support both requests.Session and curl_cffi.requests.Session define a small Protocol (e.g., SessionProtocol with request/get/post attributes) and type session as Optional[SessionProtocol]; update imports accordingly.examples/09_advanced_index.py (1)
13-13: 💤 Low valueConsider using
tempfile.mkdtemp()for cross-platform compatibility.The hardcoded
/tmp/audiobooker_demo.dbpath works on Unix-like systems but won't work on Windows. For a more portable example, consider using Python'stempfilemodule:import tempfile DB_PATH = os.path.join(tempfile.gettempdir(), "audiobooker_demo.db")That said, for a quick demo script this is acceptable as-is.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/09_advanced_index.py` at line 13, The hardcoded DB_PATH constant uses "/tmp/audiobooker_demo.db" which is not cross-platform; update the DB_PATH initialization (symbol: DB_PATH) to build the path using Python's tempfile utilities (e.g., tempfile.gettempdir() or tempfile.mkdtemp()) combined with os.path.join so the example works on Windows and Unix-like systems and remains a portable demo.docs/README.md (1)
9-9: 💤 Low valueOptional: Use official "YouTube" capitalization.
The official brand name is "YouTube" (capital T), not "Youtube".
📝 Proposed fix
-| [youtube.md](youtube.md) | `YoutubeChannelSource`, `YoutubePlaylistSource`, `TheCybrarian`, `HorrorBabble` | +| [youtube.md](youtube.md) | YouTube sources: `YoutubeChannelSource`, `YoutubePlaylistSource`, `TheCybrarian`, `HorrorBabble` |Also apply to line 27:
-| `08_youtube_audiobooks.py` | YouTube sources (requires `[youtube]` extra) | +| `08_youtube_audiobooks.py` | YouTube sources (requires `[youtube]` extra) |🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/README.md` at line 9, Update the branding to use the official "YouTube" capitalization: change the link text and any occurrences of the word "Youtube" in the README to "YouTube" (e.g., the markdown link text [youtube.md] should become [YouTube.md] or simply display "YouTube" while keeping the file path if required) and update symbol names shown in the list such as `YoutubeChannelSource` and `YoutubePlaylistSource` to `YouTubeChannelSource` and `YouTubePlaylistSource`; also make the same capitalization correction for the related mention on the other occurrence referenced (the one around the same block where `TheCybrarian` and `HorrorBabble` are listed).examples/07_custom_session.py (1)
25-35: ⚡ Quick winRemove the misleading
except ImportErrorblock; detect fallback from the resolved session backend instead.The
default_session()function silently handlescurl_cffiimport failures internally and always returns a session (eithercurl_cffi.requests.Sessionorrequests.Session). Theexcept ImportErrorblock in the example will never execute. The example already has the correct fallback detection viatype(session_b).__module__—use that exclusively instead of the unreachable exception handler.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/07_custom_session.py` around lines 25 - 35, Remove the unreachable try/except around default_session(); default_session() always returns a session object, so delete the ImportError handler and rely on inspecting type(session_b).__module__ (the existing backend detection) to determine whether curl_cffi or requests is in use; leave the creation of session_b via default_session(), the backend variable computation, and the Librivox(session=session_b) usage intact so the example shows the actual fallback based on the resolved session backend.test/test_stephenkingaudiobooks_vcr.py (1)
27-31: 💤 Low valueConsider adding streams assertion for consistency.
The
test_iterate_all_yields_typed_audiobookstest (line 23) asserts bothtitleandstreams, but this test only checkstitle. If streams are expected to be present for title search results, adding that assertion would improve test coverage and consistency.Suggested addition
assert isinstance(book, AudioBook) assert book.title + assert book.streams, "expected at least one stream URL"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/test_stephenkingaudiobooks_vcr.py` around lines 27 - 31, The test_search_by_title test only asserts book.title but should also assert that streams are present like the other test; update the test_search_by_title function to retrieve the first result from StephenKingAudioBooks().search_by_title("Carrie") (as it already does), then add an assertion that the returned object is an AudioBook and that book.streams is truthy (or has expected type/length) to mirror test_iterate_all_yields_typed_audiobooks and ensure streams are validated for title searches.test/test_hpaudiotales_vcr.py (2)
26-30: 💤 Low valueConsider adding assertion message for consistency.
Line 30 asserts
book.streamswithout a message, while line 23 includes the helpful message"expected at least one stream URL". Adding the same message here would improve consistency and test output clarity.Suggested consistency fix
assert book.title - assert book.streams + assert book.streams, "expected at least one stream URL"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/test_hpaudiotales_vcr.py` around lines 26 - 30, Add a consistent assertion message to the existing check of book.streams in the test_iterate_popular_yields_typed_audiobooks function: replace the bare assert book.streams with an assertion that includes the same helpful message used earlier (e.g., "expected at least one stream URL") so failures report a clear, consistent message referencing book.streams.
13-16: ⚡ Quick winConsider moving
_firsthelper toconftest.py.This helper function is duplicated across multiple VCR test modules (
test_darkerprojects_vcr.py,test_stephenkingaudiobooks_vcr.py). Moving it totest/conftest.pyas a shared fixture or utility function would eliminate duplication and improve maintainability.♻️ Proposed consolidation in conftest.py
Add to
test/conftest.py:def first_or_none(iterable): """Return the first item from an iterable, or None if empty.""" for item in iterable: return item return NoneThen import in test files:
+from conftest import first_or_none -def _first(it): - for x in it: - return x - return None def test_iterate_all_yields_typed_audiobooks(): - book = _first(HPTalesAudioBooks().iterate_all()) + book = first_or_none(HPTalesAudioBooks().iterate_all())🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/test_hpaudiotales_vcr.py` around lines 13 - 16, The small helper _first is duplicated across VCR tests; extract it into a shared test utility (e.g., conftest) and rename to a clearer name like first_or_none with the same behavior (return first item from an iterable or None). Update callers in test_hpaudiotales_vcr.py and the other VCR test modules to import and use first_or_none instead of defining their own _first functions to remove duplication and centralize the utility.README.md (1)
141-162: 💤 Low valueConsider adding a language identifier to the CLI reference code block.
The fenced code block starting at line 141 lacks a language identifier, which would improve syntax highlighting in rendered markdown.
📝 Suggested fix
-``` +```bash audiobooker search <query>Alternatively, use
```textif you prefer no syntax highlighting.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 141 - 162, The README code block that documents the CLI (starting with the line containing "audiobooker search <query>") is missing a fenced-code language tag; update the opening fence from ``` to ```bash (or ```text if you prefer no highlighting) so the CLI examples (e.g., "audiobooker search <query>", "audiobooker index build", "audiobooker cache download <query>") are rendered with appropriate syntax highlighting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@audiobooker/converters.py`:
- Line 66: The key construction key = (narrator.first_name.lower(),
narrator.last_name.lower()) can raise AttributeError if narrator.first_name or
narrator.last_name is None; update the code that builds this key (the key
variable in converters.py) to safely coerce None to an empty string (or fallback
string) before calling .lower() — e.g., use ( (narrator.first_name or
"").lower(), (narrator.last_name or "").lower() ) or use getattr(narrator,
"first_name", "")/getattr(narrator, "last_name", "") — ensuring the rest of the
code that uses key continues to expect lowercase strings.
- Around line 28-35: The helper functions _author_ref and _narrator_ref build
names with f-strings that can yield "None" when first_name or last_name is None;
update both functions to construct the display name by joining only non-empty,
non-None parts (e.g., collect [first_name, last_name] and join parts that are
truthy, then .strip()), and keep creating EntityRef(name=<joined_name>,
kind=EntityKind.PERSON); this ensures names omit "None" and handle missing
fields gracefully.
In `@audiobooker/scrappers/librivox.py`:
- Around line 53-66: _section_streams currently lets feedparser.parse fetch the
RSS URL itself, bypassing the injected requests session (sess) and its
proxies/certs/transport; change it to fetch the RSS using sess (e.g., sess.get
with appropriate headers, timeout and error handling), check response.ok (raise
or return empty on bad status), then pass the raw response content/text into
feedparser.parse (instead of the URL) so parsing uses the session's transport
and headers; keep using sess.headers.get("User-Agent") when setting request
headers and preserve the existing request_headers like "Connection": "close".
In `@docs/converters.md`:
- Line 3: Remove hard-coded line numbers in docs/converters.md and replace them
with symbol-based references or links to anchored locations; e.g., instead of
"audiobook_to_release() — audiobooker/converters.py:46" change to "see
audiobook_to_release() in audiobooker/converters.py" or link to an anchor for
the function. Update all occurrences that reference specific lines (the entries
for audiobook_to_release, and the ones at lines 44, 53, 59) to use function
names or stable anchors so the references remain correct as the code changes.
In `@examples/05_play_with_cache.py`:
- Around line 12-20: The example should guard before calling play(book,
stream=0): check the retrieved book's streams (e.g., book.streams or whichever
attribute holds playable URLs) and if it's empty or has no playable URL, print a
friendly message like "No streams available for this book." and skip the
play(...) call; otherwise proceed to print Cached and launch the player with
play(book, stream=0). Ensure you reference the same variables used in the
snippet (Librivox().search_by_title, book, is_cached, play) so the conditional
integrates in place of the unconditional play(...) call.
In `@examples/08_youtube_audiobooks.py`:
- Line 12: Update the typo in the user-facing error string printed in
examples/08_youtube_audiobooks.py: change the print(...) message that currently
says "tutubo not installed. Run: pip install audiobooker[youtube]" to correctly
spell "youtube" so it reads "youtube not installed. Run: pip install
audiobooker[youtube]"; locate and edit the print statement to fix the typo.
In `@pyproject.toml`:
- Line 20: The pyproject.toml dependency constraint for mediavocab is invalid
(currently "mediavocab>=1.0.0"); update the requirement to a satisfiable version
such as "mediavocab>=0.1.1" (or pin to an exact compatible version) by replacing
the existing mediavocab entry so the project can install the package from PyPI.
In `@test/test_cli.py`:
- Around line 116-119: The test mutates MagicMock class metadata by assigning to
src.__class__.__name__, which can leak across tests; instead create a dedicated
dummy class (e.g., class FakeSrc: pass) and set the mock instance's __class__ to
that class (src.__class__ = FakeSrc) or construct the double as a Mock of that
type (use Mock(spec=FakeSrc) or create_autospec) so you don't change MagicMock
global state; update both places where src.__class__.__name__ is set and keep
the patch to audiobooker.index._default_sources returning [src].
- Around line 240-242: TestCacheCmds.setUp creates a temporary directory
(self.tmp) but never removes it; add a corresponding tearDown method on the
TestCacheCmds test class that safely removes the temp directory (use
shutil.rmtree and guard with os.path.exists(self.tmp)) to prevent leaking temp
dirs between test runs; ensure tearDown handles exceptions (ignore if already
removed) and clears self.tmp so repeated runs/parallel tests don't collide.
In `@test/test_index_extra.py`:
- Around line 125-132: The test currently exits the patch.dict context before
calling idx._followed_as_sources(), so the "YouTube module unavailable" path
isn't exercised; move the call to idx._followed_as_sources() (the assertion
self.assertEqual(idx._followed_as_sources(), [])) inside the with
patch.dict(sys.modules, {"audiobooker.scrappers.youtube": None}): block (remove
the stray pass and any unnecessary importlib/mock lines) so the function is
invoked while the youtube module is patched to None, ensuring the
unavailable-module branch is tested.
In `@test/test_scrappers_edges.py`:
- Around line 224-235: The test currently lacks an assertion—update
test_search_by_tag_with_match so it asserts that the RSS parsing path was
invoked for the matched link: patch audiobooker.scrappers.loyalbooks.from_rss as
a mock (instead of return_value=iter([])) and after calling
list(LoyalBooks().search_by_tag("Horror")) assert that from_rss was called once
with the expected absolute book URL (e.g.
"http://www.loyalbooks.com/book/dracula") or otherwise assert the returned
iterable contains the expected book entry; refer to LoyalBooks.search_by_tag,
get_soup and from_rss to locate where to add the mock assertion.
In `@test/test_search.py`:
- Around line 74-79: The test_search_filters_min_score currently uses a fixture
that exactly matches the query so it never exercises the score-filtering path;
update the FakeSource fixture in test_search_filters_min_score to return a title
that will score below the min threshold (e.g., a clearly unrelated title instead
of "Dracula") and call search with an explicit min_score (or rely on the
function's default) so the returned low-score item is filtered out, then assert
that no result has that title; locate the test function name
test_search_filters_min_score, the FakeSource/_b fixture, and the
search/search_by_title call to make the change.
---
Nitpick comments:
In `@audiobooker/scrappers/hpaudiotales.py`:
- Line 11: The module-level variable session is currently typed as the generic
object; change it to a more specific type to document the expected interface —
for example use Optional[requests.Session] by importing Optional from typing and
requests (replace "session: object = None" with "session:
Optional[requests.Session] = None"), or if you must support both
requests.Session and curl_cffi.requests.Session define a small Protocol (e.g.,
SessionProtocol with request/get/post attributes) and type session as
Optional[SessionProtocol]; update imports accordingly.
In `@audiobooker/transport.py`:
- Around line 35-38: Replace the broad silent except with targeted handling:
catch only expected errors (e.g., AttributeError, KeyError) around
session.headers.update({"User-Agent": random_user_agent()}) and emit a minimal
debug log on failure (use logging.getLogger(__name__).debug with context and the
exception) so legitimate issues are visible while preserving the fallback
behavior; keep the call to random_user_agent() and the update inside the try
block and do not change the function's return behavior.
In `@docs/README.md`:
- Line 9: Update the branding to use the official "YouTube" capitalization:
change the link text and any occurrences of the word "Youtube" in the README to
"YouTube" (e.g., the markdown link text [youtube.md] should become [YouTube.md]
or simply display "YouTube" while keeping the file path if required) and update
symbol names shown in the list such as `YoutubeChannelSource` and
`YoutubePlaylistSource` to `YouTubeChannelSource` and `YouTubePlaylistSource`;
also make the same capitalization correction for the related mention on the
other occurrence referenced (the one around the same block where `TheCybrarian`
and `HorrorBabble` are listed).
In `@examples/07_custom_session.py`:
- Around line 25-35: Remove the unreachable try/except around default_session();
default_session() always returns a session object, so delete the ImportError
handler and rely on inspecting type(session_b).__module__ (the existing backend
detection) to determine whether curl_cffi or requests is in use; leave the
creation of session_b via default_session(), the backend variable computation,
and the Librivox(session=session_b) usage intact so the example shows the actual
fallback based on the resolved session backend.
In `@examples/09_advanced_index.py`:
- Line 13: The hardcoded DB_PATH constant uses "/tmp/audiobooker_demo.db" which
is not cross-platform; update the DB_PATH initialization (symbol: DB_PATH) to
build the path using Python's tempfile utilities (e.g., tempfile.gettempdir() or
tempfile.mkdtemp()) combined with os.path.join so the example works on Windows
and Unix-like systems and remains a portable demo.
In `@README.md`:
- Around line 141-162: The README code block that documents the CLI (starting
with the line containing "audiobooker search <query>") is missing a fenced-code
language tag; update the opening fence from ``` to ```bash (or ```text if you
prefer no highlighting) so the CLI examples (e.g., "audiobooker search <query>",
"audiobooker index build", "audiobooker cache download <query>") are rendered
with appropriate syntax highlighting.
In `@test/test_hpaudiotales_vcr.py`:
- Around line 26-30: Add a consistent assertion message to the existing check of
book.streams in the test_iterate_popular_yields_typed_audiobooks function:
replace the bare assert book.streams with an assertion that includes the same
helpful message used earlier (e.g., "expected at least one stream URL") so
failures report a clear, consistent message referencing book.streams.
- Around line 13-16: The small helper _first is duplicated across VCR tests;
extract it into a shared test utility (e.g., conftest) and rename to a clearer
name like first_or_none with the same behavior (return first item from an
iterable or None). Update callers in test_hpaudiotales_vcr.py and the other VCR
test modules to import and use first_or_none instead of defining their own
_first functions to remove duplication and centralize the utility.
In `@test/test_stephenkingaudiobooks_vcr.py`:
- Around line 27-31: The test_search_by_title test only asserts book.title but
should also assert that streams are present like the other test; update the
test_search_by_title function to retrieve the first result from
StephenKingAudioBooks().search_by_title("Carrie") (as it already does), then add
an assertion that the returned object is an AudioBook and that book.streams is
truthy (or has expected type/length) to mirror
test_iterate_all_yields_typed_audiobooks and ensure streams are validated for
title searches.
In `@test/test_utils_extra.py`:
- Around line 25-31: The test test_falls_back_to_verify_false currently raises a
generic Exception("ssl") which doesn't ensure get_html only retries on SSL
errors; update the mocked side_effect to raise the specific
requests.exceptions.SSLError (import it in the test) for the first call on
AudioBookSource.session.get and keep the second response as ok, so the assertion
self.assertEqual(get_html("http://x"), "<html2>") verifies retry behavior only
on SSLError and not on all exceptions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b743c13d-46ef-4c07-982d-b9c60b479068
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (79)
.github/workflows/coverage.yml.github/workflows/nightly-live.ymlCHANGELOG.mdREADME.mdaudiobooker/__init__.pyaudiobooker/base.pyaudiobooker/cache.pyaudiobooker/cli.pyaudiobooker/converters.pyaudiobooker/scrappers/__init__.pyaudiobooker/scrappers/darkerprojects.pyaudiobooker/scrappers/goldenaudiobooks.pyaudiobooker/scrappers/hpaudiotales.pyaudiobooker/scrappers/librivox.pyaudiobooker/scrappers/loyalbooks.pyaudiobooker/scrappers/youtube.pyaudiobooker/search.pyaudiobooker/transport.pyaudiobooker/utils.pydocs/README.mddocs/api.mddocs/converters.mddocs/getting-started.mddocs/sources.mddocs/transport.mdexamples/01_quickstart.pyexamples/02_search_all_sources.pyexamples/03_filter_by_author.pyexamples/04_download_and_cache.pyexamples/05_play_with_cache.pyexamples/06_convert_to_mediavocab.pyexamples/07_custom_session.pyexamples/08_youtube_audiobooks.pyexamples/09_advanced_index.pyexamples/10_cli_pipeline.pyexamples/mediavocab_release.pypyproject.tomlrequirements.txttest/cassettes/test_audioanarchy_vcr/test_iterate_all_yields_typed_audiobooks.yamltest/cassettes/test_audioanarchy_vcr/test_iterate_popular_yields_typed_audiobooks.yamltest/cassettes/test_darkerprojects_vcr/test_iterate_all_yields_typed_audiobooks.yamltest/cassettes/test_darkerprojects_vcr/test_iterate_popular_yields_typed_audiobooks.yamltest/cassettes/test_goldenaudiobooks_vcr/test_iterate_all_yields_typed_audiobooks.yamltest/cassettes/test_goldenaudiobooks_vcr/test_iterate_popular_yields_typed_audiobooks.yamltest/cassettes/test_hpaudiotales_vcr/test_iterate_all_yields_typed_audiobooks.yamltest/cassettes/test_hpaudiotales_vcr/test_iterate_popular_yields_typed_audiobooks.yamltest/cassettes/test_librivox_vcr/test_iterate_all_yields_typed_audiobooks.yamltest/cassettes/test_librivox_vcr/test_search_by_author.yamltest/cassettes/test_librivox_vcr/test_search_by_narrator.yamltest/cassettes/test_librivox_vcr/test_search_by_tag.yamltest/cassettes/test_librivox_vcr/test_search_by_title.yamltest/cassettes/test_loyalbooks_vcr/test_iterate_all_yields_typed_audiobooks.yamltest/cassettes/test_loyalbooks_vcr/test_iterate_popular_yields_typed_audiobooks.yamltest/cassettes/test_loyalbooks_vcr/test_search_by_author.yamltest/cassettes/test_loyalbooks_vcr/test_search_by_tag.yamltest/cassettes/test_loyalbooks_vcr/test_search_by_title.yamltest/cassettes/test_stephenkingaudiobooks_vcr/test_iterate_all_yields_typed_audiobooks.yamltest/cassettes/test_stephenkingaudiobooks_vcr/test_search.yamltest/cassettes/test_stephenkingaudiobooks_vcr/test_search_by_author.yamltest/cassettes/test_stephenkingaudiobooks_vcr/test_search_by_title.yamltest/conftest.pytest/test_audioanarchy_vcr.pytest/test_base_extra.pytest/test_cache_extra.pytest/test_cli.pytest/test_converters.pytest/test_darkerprojects_vcr.pytest/test_goldenaudiobooks_vcr.pytest/test_hpaudiotales_vcr.pytest/test_index_extra.pytest/test_librivox_vcr.pytest/test_loyalbooks_vcr.pytest/test_scrappers_base.pytest/test_scrappers_edges.pytest/test_search.pytest/test_stephenkingaudiobooks_vcr.pytest/test_transport.pytest/test_utils_extra.pytest/test_youtube.py
💤 Files with no reviewable changes (2)
- requirements.txt
- audiobooker/scrappers/youtube.py
| def _author_ref(author: BookAuthor) -> EntityRef: | ||
| name = f"{author.first_name} {author.last_name}".strip() | ||
| return EntityRef(name=name, kind=EntityKind.PERSON) | ||
|
|
||
|
|
||
| def _narrator_ref(narrator: AudiobookNarrator) -> EntityRef: | ||
| name = f"{narrator.first_name} {narrator.last_name}".strip() | ||
| return EntityRef(name=name, kind=EntityKind.PERSON) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Handle None values in name fields.
If first_name or last_name is None, the f-string will produce "None Smith" or "John None". Consider filtering out None values or using empty strings as fallback.
🛡️ Proposed fix
def _author_ref(author: BookAuthor) -> EntityRef:
- name = f"{author.first_name} {author.last_name}".strip()
+ name = f"{author.first_name or ''} {author.last_name or ''}".strip()
return EntityRef(name=name, kind=EntityKind.PERSON)
def _narrator_ref(narrator: AudiobookNarrator) -> EntityRef:
- name = f"{narrator.first_name} {narrator.last_name}".strip()
+ name = f"{narrator.first_name or ''} {narrator.last_name or ''}".strip()
return EntityRef(name=name, kind=EntityKind.PERSON)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _author_ref(author: BookAuthor) -> EntityRef: | |
| name = f"{author.first_name} {author.last_name}".strip() | |
| return EntityRef(name=name, kind=EntityKind.PERSON) | |
| def _narrator_ref(narrator: AudiobookNarrator) -> EntityRef: | |
| name = f"{narrator.first_name} {narrator.last_name}".strip() | |
| return EntityRef(name=name, kind=EntityKind.PERSON) | |
| def _author_ref(author: BookAuthor) -> EntityRef: | |
| name = f"{author.first_name or ''} {author.last_name or ''}".strip() | |
| return EntityRef(name=name, kind=EntityKind.PERSON) | |
| def _narrator_ref(narrator: AudiobookNarrator) -> EntityRef: | |
| name = f"{narrator.first_name or ''} {narrator.last_name or ''}".strip() | |
| return EntityRef(name=name, kind=EntityKind.PERSON) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@audiobooker/converters.py` around lines 28 - 35, The helper functions
_author_ref and _narrator_ref build names with f-strings that can yield "None"
when first_name or last_name is None; update both functions to construct the
display name by joining only non-empty, non-None parts (e.g., collect
[first_name, last_name] and join parts that are truthy, then .strip()), and keep
creating EntityRef(name=<joined_name>, kind=EntityKind.PERSON); this ensures
names omit "None" and handle missing fields gracefully.
| for narrator in narrators: | ||
| if not narrator: | ||
| continue | ||
| key = (narrator.first_name.lower(), narrator.last_name.lower()) |
There was a problem hiding this comment.
Potential AttributeError when narrator names are None.
If narrator.first_name or narrator.last_name is None, calling .lower() will raise an AttributeError.
🐛 Proposed fix
- key = (narrator.first_name.lower(), narrator.last_name.lower())
+ key = (
+ (narrator.first_name or "").lower(),
+ (narrator.last_name or "").lower(),
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| key = (narrator.first_name.lower(), narrator.last_name.lower()) | |
| key = ( | |
| (narrator.first_name or "").lower(), | |
| (narrator.last_name or "").lower(), | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@audiobooker/converters.py` at line 66, The key construction key =
(narrator.first_name.lower(), narrator.last_name.lower()) can raise
AttributeError if narrator.first_name or narrator.last_name is None; update the
code that builds this key (the key variable in converters.py) to safely coerce
None to an empty string (or fallback string) before calling .lower() — e.g., use
( (narrator.first_name or "").lower(), (narrator.last_name or "").lower() ) or
use getattr(narrator, "first_name", "")/getattr(narrator, "last_name", "") —
ensuring the rest of the code that uses key continues to expect lowercase
strings.
| def _section_streams(rss_url: str, session=None) -> list: | ||
| # NOTE: ``feedparser.parse`` uses ``urllib`` internally and does NOT | ||
| # accept a ``requests.Session``. Injected sessions therefore do not | ||
| # apply to RSS fetches — only the User-Agent header is forwarded. | ||
| sess = session if session is not None else AudioBookSource.session | ||
| try: | ||
| ua = sess.headers.get("User-Agent") | ||
| except Exception: | ||
| ua = None | ||
| rss = feedparser.parse( | ||
| rss_url, | ||
| agent=ua, | ||
| request_headers={"Connection": "close"}, | ||
| ) |
There was a problem hiding this comment.
❓ Verification inconclusive
Does Python feedparser.parse accept raw bytes or a response body so an RSS feed can be fetched with requests/curl_cffi first and then parsed locally?
Yes—feedparser.parse accepts the fetched response body content locally, and it can be bytes. - The feedparser.parse first argument (url_file_stream_or_string) can be a URL, a local filename, or a raw string containing feed data, and the implementation docs explicitly note that both byte and text strings are accepted [1]. - The public docs also state feedparser can parse “a raw string containing feed data in any format” [2]. Practical implication for your workflow: 1) Use requests/curl_cffi to fetch the RSS/Atom body. 2) Pass the response content into feedparser.parse. Example: import feedparser import requests resp = requests.get(url, timeout=10) # resp.content is bytes; resp.text is str feed = feedparser.parse(resp.content) # or: feed = feedparser.parse(resp.text) If you already have HTTP response headers you may also pass them via response_headers (so feedparser can interpret encoding / content-location for relative URLs) [1]. One caution: if you use a file-like wrapper, prefer BytesIO for bytes (the docs mention wrapping untrusted strings in StringIO/BytesIO) [1], and there are reports that passing a text StringIO where bytes are expected can break parsing [3].
Citations:
- 1: https://github.com/kurtmckee/feedparser/blob/develop/feedparser/api.py
- 2: https://feedparser.readthedocs.io/en/stable/introduction.html
- 3:
io.StringIOnot working inparse()kurtmckee/feedparser#427
Script executed:
cat -n audiobooker/scrappers/librivox.py | sed -n '45,75p'Repository: TigreGotico/audiobooker
Repository: TigreGotico/audiobooker
Exit code: 0
stdout:
45 )
46 if resp.status_code >= 400:
47 return {}
48 return resp.json()
49 except Exception:
50 return {}
51
52
53 def _section_streams(rss_url: str, session=None) -> list:
54 # NOTE: ``feedparser.parse`` uses ``urllib`` internally and does NOT
55 # accept a ``requests.Session``. Injected sessions therefore do not
56 # apply to RSS fetches — only the User-Agent header is forwarded.
57 sess = session if session is not None else AudioBookSource.session
58 try:
59 ua = sess.headers.get("User-Agent")
60 except Exception:
61 ua = None
62 rss = feedparser.parse(
63 rss_url,
64 agent=ua,
65 request_headers={"Connection": "close"},
66 )
67 return [
68 e["media_content"][0]["url"]
69 for e in rss.get("entries", [])
70 if e.get("media_content")
71 ]
72
73
74 def _build_book(k: dict, session=None) -> AudioBook:
75 """Convert one LibriVox API record into a single typed ``AudioBook``."""
Script executed:
rg -n "_section_streams" audiobooker/scrappers/librivox.py -A 2 -B 2Repository: TigreGotico/audiobooker
Repository: TigreGotico/audiobooker
Exit code: 0
stdout:
51-
52-
53:def _section_streams(rss_url: str, session=None) -> list:
54- # NOTE: ``feedparser.parse`` uses ``urllib`` internally and does NOT
55- # accept a ``requests.Session``. Injected sessions therefore do not
--
74-def _build_book(k: dict, session=None) -> AudioBook:
75- """Convert one LibriVox API record into a single typed ``AudioBook``."""
76: rss_streams = _section_streams(k["url_rss"], session=session)
77-
78- chapters: list = []
Use the injected session for RSS fetches.
API calls honor session, but _section_streams() still lets feedparser make its own network request. This bypasses proxies, cookies, custom TLS, and the new curl_cffi transport. The web_search confirmed feedparser.parse accepts raw bytes/text content, so you can fetch the RSS with the injected session first and then parse locally.
Suggested fix
def _section_streams(rss_url: str, session=None) -> list:
- # NOTE: ``feedparser.parse`` uses ``urllib`` internally and does NOT
- # accept a ``requests.Session``. Injected sessions therefore do not
- # apply to RSS fetches — only the User-Agent header is forwarded.
sess = session if session is not None else AudioBookSource.session
try:
- ua = sess.headers.get("User-Agent")
+ resp = sess.get(rss_url, timeout=30)
+ resp.raise_for_status()
except Exception:
- ua = None
- rss = feedparser.parse(
- rss_url,
- agent=ua,
- request_headers={"Connection": "close"},
- )
+ return []
+ rss = feedparser.parse(resp.content)
return [
e["media_content"][0]["url"]
for e in rss.get("entries", [])
if e.get("media_content")
]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _section_streams(rss_url: str, session=None) -> list: | |
| # NOTE: ``feedparser.parse`` uses ``urllib`` internally and does NOT | |
| # accept a ``requests.Session``. Injected sessions therefore do not | |
| # apply to RSS fetches — only the User-Agent header is forwarded. | |
| sess = session if session is not None else AudioBookSource.session | |
| try: | |
| ua = sess.headers.get("User-Agent") | |
| except Exception: | |
| ua = None | |
| rss = feedparser.parse( | |
| rss_url, | |
| agent=ua, | |
| request_headers={"Connection": "close"}, | |
| ) | |
| def _section_streams(rss_url: str, session=None) -> list: | |
| sess = session if session is not None else AudioBookSource.session | |
| try: | |
| resp = sess.get(rss_url, timeout=30) | |
| resp.raise_for_status() | |
| except Exception: | |
| return [] | |
| rss = feedparser.parse(resp.content) | |
| return [ | |
| e["media_content"][0]["url"] | |
| for e in rss.get("entries", []) | |
| if e.get("media_content") | |
| ] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@audiobooker/scrappers/librivox.py` around lines 53 - 66, _section_streams
currently lets feedparser.parse fetch the RSS URL itself, bypassing the injected
requests session (sess) and its proxies/certs/transport; change it to fetch the
RSS using sess (e.g., sess.get with appropriate headers, timeout and error
handling), check response.ok (raise or return empty on bad status), then pass
the raw response content/text into feedparser.parse (instead of the URL) so
parsing uses the session's transport and headers; keep using
sess.headers.get("User-Agent") when setting request headers and preserve the
existing request_headers like "Connection": "close".
| @@ -0,0 +1,76 @@ | |||
| # mediavocab Converters | |||
|
|
|||
| `audiobook_to_release()` — `audiobooker/converters.py:46` | |||
There was a problem hiding this comment.
Hard-coded line numbers will become stale.
Lines 3, 44, 53, and 59 reference specific line numbers in audiobooker/converters.py. As the code evolves, these line numbers will drift and become inaccurate, potentially confusing readers.
Consider using function/symbol references instead (e.g., "see audiobook_to_release() in audiobooker/converters.py") or linking to anchored documentation that auto-updates.
Suggested approach
-`audiobook_to_release()` — `audiobooker/converters.py:46`
+`audiobook_to_release()` — see `audiobooker/converters.py`
-## Credits — `audiobooker/converters.py:55`
+## Credits
-## License detection — `audiobooker/converters.py:96`
+## License detection
-## Chapter conversion — `audiobooker/converters.py:31`
+## Chapter conversion🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/converters.md` at line 3, Remove hard-coded line numbers in
docs/converters.md and replace them with symbol-based references or links to
anchored locations; e.g., instead of "audiobook_to_release() —
audiobooker/converters.py:46" change to "see audiobook_to_release() in
audiobooker/converters.py" or link to an anchor for the function. Update all
occurrences that reference specific lines (the entries for audiobook_to_release,
and the ones at lines 44, 53, 59) to use function names or stable anchors so the
references remain correct as the code changes.
| book = next(Librivox().search_by_title("The Yellow Wallpaper"), None) | ||
| if book is None: | ||
| print("Book not found.") | ||
| else: | ||
| print(f"Title: {book.title!r}") | ||
| print(f"Cached: {is_cached(book)}") | ||
| print("Playing stream 0 (downloads if needed)...") | ||
| play(book, stream=0) | ||
| print("Player launched.") |
There was a problem hiding this comment.
Guard the example against empty streams.
This calls play(book, stream=0) unconditionally. If the lookup returns metadata without a playable URL, the example raises instead of printing a friendly message.
Suggested fix
book = next(Librivox().search_by_title("The Yellow Wallpaper"), None)
if book is None:
print("Book not found.")
+elif not book.streams:
+ print(f"Title: {book.title!r}")
+ print("No playable streams were found.")
else:
print(f"Title: {book.title!r}")
print(f"Cached: {is_cached(book)}")
print("Playing stream 0 (downloads if needed)...")
play(book, stream=0)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| book = next(Librivox().search_by_title("The Yellow Wallpaper"), None) | |
| if book is None: | |
| print("Book not found.") | |
| else: | |
| print(f"Title: {book.title!r}") | |
| print(f"Cached: {is_cached(book)}") | |
| print("Playing stream 0 (downloads if needed)...") | |
| play(book, stream=0) | |
| print("Player launched.") | |
| book = next(Librivox().search_by_title("The Yellow Wallpaper"), None) | |
| if book is None: | |
| print("Book not found.") | |
| elif not book.streams: | |
| print(f"Title: {book.title!r}") | |
| print("No playable streams were found.") | |
| else: | |
| print(f"Title: {book.title!r}") | |
| print(f"Cached: {is_cached(book)}") | |
| print("Playing stream 0 (downloads if needed)...") | |
| play(book, stream=0) | |
| print("Player launched.") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/05_play_with_cache.py` around lines 12 - 20, The example should
guard before calling play(book, stream=0): check the retrieved book's streams
(e.g., book.streams or whichever attribute holds playable URLs) and if it's
empty or has no playable URL, print a friendly message like "No streams
available for this book." and skip the play(...) call; otherwise proceed to
print Cached and launch the player with play(book, stream=0). Ensure you
reference the same variables used in the snippet (Librivox().search_by_title,
book, is_cached, play) so the conditional integrates in place of the
unconditional play(...) call.
| src = MagicMock() | ||
| src.iterate_all.return_value = iter([_book("Built")]) | ||
| src.__class__.__name__ = "FakeSrc" | ||
| with patch("audiobooker.index._default_sources", return_value=[src]): |
There was a problem hiding this comment.
Avoid mutating MagicMock class metadata globally
At Line 118 and Line 127, src.__class__.__name__ = "FakeSrc" mutates class-level state on MagicMock, which can leak into unrelated tests and make failures order-dependent.
Suggested safe test-double pattern
- src = MagicMock()
- src.iterate_all.return_value = iter([_book("Built")])
- src.__class__.__name__ = "FakeSrc"
+ class FakeSrc:
+ pass
+ src = FakeSrc()
+ src.iterate_all = MagicMock(return_value=iter([_book("Built")]))- src = MagicMock()
- src.iterate_all.return_value = iter([_book("Up")])
- src.__class__.__name__ = "FakeSrc"
+ class FakeSrc:
+ pass
+ src = FakeSrc()
+ src.iterate_all = MagicMock(return_value=iter([_book("Up")]))Also applies to: 125-129
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/test_cli.py` around lines 116 - 119, The test mutates MagicMock class
metadata by assigning to src.__class__.__name__, which can leak across tests;
instead create a dedicated dummy class (e.g., class FakeSrc: pass) and set the
mock instance's __class__ to that class (src.__class__ = FakeSrc) or construct
the double as a Mock of that type (use Mock(spec=FakeSrc) or create_autospec) so
you don't change MagicMock global state; update both places where
src.__class__.__name__ is set and keep the patch to
audiobooker.index._default_sources returning [src].
| def setUp(self): | ||
| self.tmp = tempfile.mkdtemp() | ||
|
|
There was a problem hiding this comment.
Clean up temporary cache dirs created in tests
TestCacheCmds.setUp creates a temp directory at Line 241, but the class has no teardown cleanup. This can leak files across runs and slow CI workers over time.
Minimal cleanup addition
class TestCacheCmds(unittest.TestCase):
def setUp(self):
self.tmp = tempfile.mkdtemp()
+
+ def tearDown(self):
+ import shutil
+ shutil.rmtree(self.tmp, ignore_errors=True)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def setUp(self): | |
| self.tmp = tempfile.mkdtemp() | |
| def setUp(self): | |
| self.tmp = tempfile.mkdtemp() | |
| def tearDown(self): | |
| import shutil | |
| shutil.rmtree(self.tmp, ignore_errors=True) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/test_cli.py` around lines 240 - 242, TestCacheCmds.setUp creates a
temporary directory (self.tmp) but never removes it; add a corresponding
tearDown method on the TestCacheCmds test class that safely removes the temp
directory (use shutil.rmtree and guard with os.path.exists(self.tmp)) to prevent
leaking temp dirs between test runs; ensure tearDown handles exceptions (ignore
if already removed) and clears self.tmp so repeated runs/parallel tests don't
collide.
| with patch.dict(sys.modules, | ||
| {"audiobooker.scrappers.youtube": None}): | ||
| # Force the import to fail | ||
| import importlib | ||
| # Easier: directly mock the function | ||
| pass | ||
| # Empty list either way | ||
| self.assertEqual(idx._followed_as_sources(), []) |
There was a problem hiding this comment.
Move the assertion inside the patched import scope.
idx._followed_as_sources() runs after patch.dict(...) exits, so this test does not actually cover the “YouTube module unavailable” path.
Suggested fix
- with patch.dict(sys.modules,
- {"audiobooker.scrappers.youtube": None}):
- # Force the import to fail
- import importlib
- # Easier: directly mock the function
- pass
- # Empty list either way
- self.assertEqual(idx._followed_as_sources(), [])
+ with patch.dict(sys.modules,
+ {"audiobooker.scrappers.youtube": None}):
+ self.assertEqual(idx._followed_as_sources(), [])📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| with patch.dict(sys.modules, | |
| {"audiobooker.scrappers.youtube": None}): | |
| # Force the import to fail | |
| import importlib | |
| # Easier: directly mock the function | |
| pass | |
| # Empty list either way | |
| self.assertEqual(idx._followed_as_sources(), []) | |
| with patch.dict(sys.modules, | |
| {"audiobooker.scrappers.youtube": None}): | |
| self.assertEqual(idx._followed_as_sources(), []) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/test_index_extra.py` around lines 125 - 132, The test currently exits
the patch.dict context before calling idx._followed_as_sources(), so the
"YouTube module unavailable" path isn't exercised; move the call to
idx._followed_as_sources() (the assertion
self.assertEqual(idx._followed_as_sources(), [])) inside the with
patch.dict(sys.modules, {"audiobooker.scrappers.youtube": None}): block (remove
the stray pass and any unnecessary importlib/mock lines) so the function is
invoked while the youtube module is patched to None, ensuring the
unavailable-module branch is tested.
| def test_search_by_tag_with_match(self): | ||
| # 'Horror' matches 'Horror_and_Supernatural_Fiction' | ||
| with patch("audiobooker.scrappers.loyalbooks.get_soup", | ||
| return_value=_soup( | ||
| '<a href="/book/dracula"></a>' | ||
| '<a href="http://www.loyalbooks.com/book/dracula"></a>' | ||
| '<a href="/other"></a>' | ||
| )), \ | ||
| patch("audiobooker.scrappers.loyalbooks.from_rss", | ||
| return_value=iter([])): | ||
| list(LoyalBooks().search_by_tag("Horror")) | ||
|
|
There was a problem hiding this comment.
test_search_by_tag_with_match currently has no verifiable assertion.
Right now it passes as long as no exception is raised, so it won’t catch regressions in the “matched tag → RSS parsing invoked” path.
Suggested test tightening
def test_search_by_tag_with_match(self):
# 'Horror' matches 'Horror_and_Supernatural_Fiction'
- with patch("audiobooker.scrappers.loyalbooks.get_soup",
- return_value=_soup(
- '<a href="/book/dracula"></a>'
- '<a href="http://www.loyalbooks.com/book/dracula"></a>'
- '<a href="/other"></a>'
- )), \
- patch("audiobooker.scrappers.loyalbooks.from_rss",
- return_value=iter([])):
- list(LoyalBooks().search_by_tag("Horror"))
+ with patch("audiobooker.scrappers.loyalbooks.get_soup",
+ return_value=_soup(
+ '<a href="/book/dracula"></a>'
+ '<a href="http://www.loyalbooks.com/book/dracula"></a>'
+ '<a href="/other"></a>'
+ )), \
+ patch("audiobooker.scrappers.loyalbooks.from_rss",
+ return_value=iter([])) as from_rss_mock:
+ list(LoyalBooks().search_by_tag("Horror"))
+ self.assertGreaterEqual(from_rss_mock.call_count, 1)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_search_by_tag_with_match(self): | |
| # 'Horror' matches 'Horror_and_Supernatural_Fiction' | |
| with patch("audiobooker.scrappers.loyalbooks.get_soup", | |
| return_value=_soup( | |
| '<a href="/book/dracula"></a>' | |
| '<a href="http://www.loyalbooks.com/book/dracula"></a>' | |
| '<a href="/other"></a>' | |
| )), \ | |
| patch("audiobooker.scrappers.loyalbooks.from_rss", | |
| return_value=iter([])): | |
| list(LoyalBooks().search_by_tag("Horror")) | |
| def test_search_by_tag_with_match(self): | |
| # 'Horror' matches 'Horror_and_Supernatural_Fiction' | |
| with patch("audiobooker.scrappers.loyalbooks.get_soup", | |
| return_value=_soup( | |
| '<a href="/book/dracula"></a>' | |
| '<a href="http://www.loyalbooks.com/book/dracula"></a>' | |
| '<a href="/other"></a>' | |
| )), \ | |
| patch("audiobooker.scrappers.loyalbooks.from_rss", | |
| return_value=iter([])) as from_rss_mock: | |
| list(LoyalBooks().search_by_tag("Horror")) | |
| self.assertGreaterEqual(from_rss_mock.call_count, 1) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/test_scrappers_edges.py` around lines 224 - 235, The test currently
lacks an assertion—update test_search_by_tag_with_match so it asserts that the
RSS parsing path was invoked for the matched link: patch
audiobooker.scrappers.loyalbooks.from_rss as a mock (instead of
return_value=iter([])) and after calling
list(LoyalBooks().search_by_tag("Horror")) assert that from_rss was called once
with the expected absolute book URL (e.g.
"http://www.loyalbooks.com/book/dracula") or otherwise assert the returned
iterable contains the expected book entry; refer to LoyalBooks.search_by_tag,
get_soup and from_rss to locate where to add the mock assertion.
| def test_search_filters_min_score(self): | ||
| # Yield title that's nothing like the query | ||
| src = FakeSource([_b("Dracula")]) | ||
| # Request through `search` will go to `search_by_title` (FakeSource.search) | ||
| results = list(search("Dracula", sources=[src])) | ||
| self.assertTrue(any(r.title == "Dracula" for r in results)) |
There was a problem hiding this comment.
test_search_filters_min_score never hits the filter path.
The fixture on Line 76 is an exact title match for the query on Line 78, so this still passes if score filtering is removed. Please make the returned title intentionally fall below the threshold and assert it is excluded.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/test_search.py` around lines 74 - 79, The test_search_filters_min_score
currently uses a fixture that exactly matches the query so it never exercises
the score-filtering path; update the FakeSource fixture in
test_search_filters_min_score to return a title that will score below the min
threshold (e.g., a clearly unrelated title instead of "Dracula") and call search
with an explicit min_score (or rely on the function's default) so the returned
low-score item is filtered out, then assert that no result has that title;
locate the test function name test_search_filters_min_score, the FakeSource/_b
fixture, and the search/search_by_title call to make the change.
Summary
mediavocab>=1.0.0pinPlaybackModality→PlaybackType,infer_modality→infer_playback_type,Signals.modality→Signals.playback_type, providermodalityClassVar →playback_type)Work.extra/Entity.extranowDict[str, str];Entity(ORGANISATION)requiresorg_kind;Work.countrysplit intoproduction_country/publication_country/broadcaster_country;Release.variant_kindremoved (Work-only);VariantKind.{REGIONAL,DELUXE,BOOTLEG,…}moved toReleasePackaging;Stream.media_type→Stream.kind; programme-format genre constants removed — use literal strings orProgrammeFormat.Test plan
pytest -x -qpasses locally against mediavocab 1.0.0🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation