Skip to content

chore: migrate to mediavocab 1.0#18

Open
JarbasAl wants to merge 27 commits into
devfrom
feat/mediavocab-integration
Open

chore: migrate to mediavocab 1.0#18
JarbasAl wants to merge 27 commits into
devfrom
feat/mediavocab-integration

Conversation

@JarbasAl

@JarbasAl JarbasAl commented May 11, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Bumps mediavocab>=1.0.0 pin
  • Renames the modality axis to playback_type (PlaybackModalityPlaybackType, infer_modalityinfer_playback_type, Signals.modalitySignals.playback_type, provider modality ClassVar → playback_type)
  • Adapts to 1.0 API tightenings: Work.extra / Entity.extra now Dict[str, str]; Entity(ORGANISATION) requires org_kind; Work.country split into production_country / publication_country / broadcaster_country; Release.variant_kind removed (Work-only); VariantKind.{REGIONAL,DELUXE,BOOTLEG,…} moved to ReleasePackaging; Stream.media_typeStream.kind; programme-format genre constants removed — use literal strings or ProgrammeFormat.

Test plan

  • pytest -x -q passes locally against mediavocab 1.0.0

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features

  • Chapter-level metadata support (offset, runtime, stream URLs)
  • Mediavocab integration for typed release conversion
  • Pluggable HTTP transport with curl_cffi support
  • Offline BookIndex for local search and caching
  • Enhanced AudioBook model with genres, narrators, codec, bitrate, and external IDs

Bug Fixes

  • Fixed GoldenAudioBooks sitemap parsing
  • Improved LibriVox API error resilience

Documentation

  • Completely redesigned README
  • Added comprehensive getting-started guide and API documentation

Review Change Stack

JarbasAl and others added 27 commits May 6, 2026 22:11
audiobooker.converters.audiobook_to_release() has been implicit-
importing mediavocab since it was added; this commit makes the
relationship explicit:

- mediavocab>=0.1.0 in [project] dependencies (was unlisted).
- pytest declared as a `test` optional extra so `pip install
  audiobooker[test]` works without surprises.
- CHANGELOG entry.

115 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Populate Release.release_date from AudioBook.year as IsoDate-compatible YYYY.
- Stamp Release.license = "public_domain" for LibriVox content so the
  Release.parsed_license property resolves to an open License.
- README + docs: mention mediavocab as a hard runtime dep, drop stale
  requests-cache references (the PR replaced CachedSession with a plain
  requests.Session), and show a search → typed Release example.
- New examples/mediavocab_release.py demonstrating parsed_license filtering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LibriVox now emits one AudioBook per book (instead of one per section) with
full per-section Chapter offsets/titles, deduplicated reader cast, librivox_id
external id, content_genres, codec/bitrate, audio_language, and total runtime
from the API. LoyalBooks similarly aggregates RSS entries into chapters of a
single book.

The audiobook_to_release converter now maps:
  - AudioBook.chapters         → Release.chapters (offset, end, title)
  - AudioBook.narrators[]      → Work.credits (PERFORMER, deduped)
  - AudioBook.genres           → Work.content_genres
  - AudioBook.external_ids     → Work.external_ids / Release.external_ids
  - AudioBook.codec/bitrate    → Release.codec/bitrate
  - AudioBook.language         → Release.audio_language
  - LoyalBooks                 → license="public_domain"

base.AudioBook gains chapters, narrators, genres, codec, bitrate,
external_ids fields; AudioBookChapter is exported. The example script now
shows off the rich Release output end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ruff F/B sweep: add __all__ exports, drop unused imports, remove F841
- fix GoldenAudioBooks scraper: domain moved to goldenaudiobooks.com
  (sitemap_index.xml); _iter_post_sitemaps was filtering leaf URLs by
  index-only substring and yielded nothing
- harden Librivox _api_get: treat HTTP 500 / decode errors as empty
  result so combined queries (e.g. title=^X & author=Y) don't raise
- Use `as X` re-export aliases on all public imports in __init__.py to
  satisfy ruff F401 (unused-import on intentional re-exports)
- Reconcile narrator/narrators when both supplied: narrators list is
  authoritative, narrator singular always mirrors narrators[0]
- genres=list(genres) in librivox scrapper to avoid sharing the same
  mutable list object between tags and genres
- Remove hardcoded bitrate="128" from LibriVox (serves both 64 and 128 kbps)
- Simplify stable_id() guard in converters (always returns non-empty string)
- Fix stale "cached session" wording in docs/api.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the reference pattern for cassette-backed parser tests:
- test/conftest.py — vcr_config + vcr_cassette_dir fixtures
- test/test_librivox_vcr.py — one test per public Librivox method
- test/cassettes/ — recorded YAML cassettes (offline replay)
- .github/workflows/nightly-live.yml — daily re-record vs live API
- pyproject.toml — vcrpy + pytest-vcr in [test] extra

Replays in 0.7 s offline; nightly job catches upstream drift within 24h.
Pattern to be replicated across remaining scrapers and companion repos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prevents shared mutable reference between tags and genres fields so
mutations to one list do not silently affect the other.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Allow per-instance Session injection on AudioBookSource subclasses,
add audiobooker.transport.default_session() honouring the
AUDIOBOOKER_TRANSPORT=curl_cffi env var, and expose a [stealth] extra
that pulls in curl_cffi for TLS-fingerprint impersonation. Class-level
default session is preserved so existing callers keep working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s base

Adds 100+ tests covering audiobook_to_release conversion, the Click CLI surface
(search/index/cache subcommands), the parallel search orchestrator with timeout
and dedup, AudioBook/BookAuthor/Narrator equality and narrator sync, utils
HTTP/sitemap/score edge cases, and AudioBookSource base methods.

Raises overall package coverage from 53% to 76%.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…be scraper

Adds offline tests for the cache module's download progress/failure paths,
play with cached and uncached files, _open_file across platforms, the
argparse main(), and the _find_book index/Librivox fallback chain.

Adds tests for the index CLI (build/update/stats/follow/unfollow/list/search),
IndexedSource delegation, _resolve_sources name normalisation, and the
_followed_as_sources YouTube instantiation path.

Adds offline tests for youtube.py covering _length_to_seconds, _parse_name,
_parse_video_item (both old videoRenderer and new lockupViewModel layouts),
_parse_playlist_video, extract_yt_metadata (authors/narrator/year/hashtags/
suffix stripping), _video_to_book metadata-resolution rules, _YtSourceMixin
search methods, iterate_all on channel/playlist sources (min_runtime and
title_blacklist filters), the pre-configured TheCybrarian / HorrorBabble /
TheDustyTome subclasses, and the tutubo-backed _iter_channel_videos /
_iter_playlist_videos parsers with mocked initial_data.

Raises overall package coverage from 76% to 94%.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds offline tests for AudioAnarchy / DarkerProjects / GoldenAudioBooks /
HPTalesAudioBooks / StephenKingAudioBooks / Librivox / LoyalBooks covering
the get_soup-returns-None branches, parse failures (missing h1/content,
no streams ParseErrorException, Harry Potter / Stephen Fry narrator branch),
LoyalBooks calc_runtime invalid input, from_rss with no chapters and with
duplicate authors, search_by_tag genre matching with absolute/relative hrefs,
iterate_popular and iterate_all RSS exception handling, and Librivox
iterate_all on empty API responses.

Raises overall package coverage from 94% to 96%, exceeding the 95% target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…out a subcommand

Previously fell through to args.query lookup which raised AttributeError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… cassettes

Without `install_extras: test`, vcrpy/pytest-vcr are absent from the
coverage environment; pytest.mark.vcr is treated as unknown and tests
make unguarded live HTTP calls that return None, causing all
test_loyalbooks_vcr assertions to fail.

Also tighten narrator/narrators reconciliation in AudioBook.__post_init__
so that when both fields are supplied and diverge the list stays
consistent (narrators list is authoritative, singular is set to [0]).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The reusable coverage workflow installs install_extras as a raw pip
argument; 'test' was being treated as a package name rather than the
project's [test] optional-dependency group.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents indefinite hangs on network stalls, addressing CodeRabbit
feedback on PR #14.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, doc links

AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: replace wall-of-text README with install/quickstart/tables/links structure
- Impact: rewrote README.md; removed marketing prose; added [stealth]/[youtube]/[test] extras explained; added transport, mediavocab, CLI reference sections; links to /docs and /examples
- Verified via: manual review against source code (transport.py, converters.py, cli.py, search.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s/README.md

AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: fill documentation gaps after mediavocab integration and transport refactor
- Impact: created docs/converters.md (audiobook_to_release field mapping with citations); created docs/transport.md (default_session, AUDIOBOOKER_TRANSPORT, per-instance injection); created docs/getting-started.md (install, first search, key objects); updated docs/README.md to list all pages and renumbered examples table
- Verified via: manual review against converters.py, transport.py, scrappers/__init__.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: replace scattered ad-hoc examples with a learnable numbered sequence
- Impact: added examples/01_quickstart.py through 10_cli_pipeline.py; each file has a docstring, runs as-is, and prints meaningful output; covers LibriVox search, unified search, author filter, cache download/play, mediavocab conversion, custom session/curl_cffi, YouTube sources, advanced index with IndexedSource, and a shell-callable pipeline
- Verified via: manual review against base.py, cache.py, converters.py, index.py, transport.py, scrappers/youtube.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- bump pin: mediavocab>=1.0.0
- rename PlaybackModality → PlaybackType, infer_modality →
  infer_playback_type, Signals.modality / provider ClassVar →
  playback_type
- adapt to 1.0 API tightenings where present: Work.extra /
  Entity.extra now Dict[str, str]; Entity(ORGANISATION) requires
  org_kind; Work country split into production_country /
  publication_country / broadcaster_country; Release.variant_kind
  removed (Work-only); VariantKind.{REGIONAL,DELUXE,BOOTLEG,…} moved
  to ReleasePackaging; Stream.media_type → Stream.kind; programme
  formats (news / talk_show / sports / documentary / concert) no
  longer in genre constants — use literal strings or ProgrammeFormat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 11, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR integrates mediavocab release conversion, refactors HTTP session management from requests-cache to a pluggable transport layer (with optional curl_cffi stealth support), restructures LibriVox and LoyalBooks scrapers to emit one AudioBook per book/feed with chapter aggregation, and expands the AudioBook dataclass with narrators, chapters, genres, and metadata fields. The changes include comprehensive documentation updates, 10 new example scripts, and extensive test coverage including VCR cassettes.

Changes

Core Data Model & Types

Layer / File(s) Summary
AudioBook & AudioBookChapter Dataclasses
audiobooker/base.py
New AudioBookChapter dataclass with offset, runtime, stream, image. AudioBook extended with narrators (plural list), chapters, genres, year, runtime, source, score, codec, bitrate, external_ids. __post_init__ reconciles singular narrator with narrators list.
Package Exports
audiobooker/__init__.py
Reformatted imports to multi-line explicit aliases, preserving same exported public API including new AudioBookChapter.

HTTP Session & Transport Layer

Layer / File(s) Summary
Pluggable Transport Module
audiobooker/transport.py
New default_session() function checks AUDIOBOOKER_TRANSPORT env var. Returns curl_cffi.requests.Session when set and available; falls back to requests.Session with randomized User-Agent. Silently handles ImportError when curl_cffi unavailable.
AudioBookSource Session Refactor
audiobooker/scrappers/__init__.py
Removes requests_cache.CachedSession, replaces with module-level _default_session (requests.Session with randomized User-Agent). Adds __init__(session=None) constructor for per-instance injection.
Utilities & Imports
audiobooker/search.py, audiobooker/utils.py
Removes unused _METHOD_WEIGHTS import. Minor exception binding cleanup in get_html.

Mediavocab Converter

Layer / File(s) Summary
AudioBook to Release Conversion
audiobooker/converters.py
New audiobook_to_release() converts AudioBook to mediavocab.Release. Includes helpers for author/narrator entity refs, chapter mapping with end offset computation. Populates work/release fields: title, media type, year, runtime, language, genres, external IDs, credits, chapters. Derives public-domain license from source name matching.

Scraper Architecture Changes

Layer / File(s) Summary
Session Injection in Scrapers
audiobooker/scrappers/hpaudiotales.py, audiobooker/scrappers/librivox.py, audiobooker/scrappers/loyalbooks.py
HPTalesAudioBook gains optional session field. Scrapers' iterate methods pass session=self.session to constructors. LibriVox and LoyalBooks methods accept session param for injection.
LibriVox Per-Book Architecture
audiobooker/scrappers/librivox.py
Major refactor: emits one AudioBook per book (not per-section). New _api_get handles session injection and errors gracefully. _section_streams fetches RSS URLs. _build_book creates chapters with per-section offset/runtime, aggregates streams and de-duplicates narrators, sets codec=mp3, bitrate="", external_ids. Removes old per-section _parse_res logic.
LoyalBooks RSS Feed-as-Book
audiobooker/scrappers/loyalbooks.py
Refactored from_rss to yield one AudioBook per feed (not per-entry). Entries map to chapters with offset/runtime aggregation. Authors de-duplicated across entries. Total runtime summed from chapters. Sets genres, chapters, codec=mp3, title now just feed title.
Scraper Metadata Updates
audiobooker/scrappers/darkerprojects.py, audiobooker/scrappers/goldenaudiobooks.py, audiobooker/scrappers/hpaudiotales.py, audiobooker/scrappers/youtube.py
DarkerProjects adds genres=["Audio Drama"]. HPTalesAudioBook adds genres=["Fantasy"]. GoldenAudioBooks updates sitemap constants and adds _iter_book_urls filter. YouTube removes unused import.

CLI & Cache

Layer / File(s) Summary
Cache & CLI Command Handling
audiobooker/cache.py, audiobooker/cli.py
cache.main() now checks args.cmd is not None, exits code 2 if no subcommand. cli.py removes unused _default_sources import.

Documentation

Layer / File(s) Summary
Getting Started & API Docs
docs/getting-started.md, docs/api.md
New getting-started.md with installation, first search, key objects, search methods, single-source, mediavocab conversion, CLI examples. api.md clarifies plain requests.Session behavior and adds HTTP transport section with default_session() documentation.
Converters & Transport Documentation
docs/converters.md, docs/transport.md
New converters.md documents field mapping, credit roles, license rules, chapter conversion. New transport.md describes pluggable transport, AUDIOBOOKER_TRANSPORT, session injection, curl_cffi stealth backend.
Sources & Index Documentation Updates
docs/sources.md, docs/README.md
sources.md replaces requests-cache section with plain session guidance. README.md updated doc/example index.
Main README Rewrite
README.md
Rewritten emphasizing multi-source search, unified models, cache, local index, mediavocab integration. Installation covers optional extras. Quick start example, condensed sources table, Python API section, HTTP transport, local index, and CLI reference added.
Changelog & Dependencies
CHANGELOG.md, pyproject.toml, requirements.txt, .github/workflows/
CHANGELOG.md adds 0.8.0 section documenting mediavocab integration and per-book architecture. pyproject.toml adds mediavocab>=1.0.0, test extra (pytest/vcrpy/pytest-vcr), stealth extra (curl-cffi). requirements.txt cleared. coverage.yml passes .[test] extras. New nightly-live.yml workflow for VCR cassette re-recording against live endpoints.

Example Scripts

Layer / File(s) Summary
Basic Examples
examples/01_quickstart.py through examples/05_play_with_cache.py
Quickstart (Librivox title search), unified search, author filtering with language, short book download, playback with cache. All include metadata printing.
Advanced Examples
examples/06_convert_to_mediavocab.py through examples/10_cli_pipeline.py, examples/mediavocab_release.py
Mediavocab conversion with format helpers, custom sessions (requests + curl_cffi), YouTube sources, advanced index with typo tolerance and YouTube follow, CLI pipeline (search→download→print), mediavocab release metadata/credits/chapters.

Test Suite

Layer / File(s) Summary
Test Infrastructure & Fixtures
test/conftest.py, test/cassettes/*/
vcr_config() and vcr_cassette_dir() pytest fixtures. VCR cassette YAMLs for AudioAnarchy, DarkerProjects, GoldenAudioBooks, HPTales, LoyalBooks, StephenKing iterate/search operations.
Converters & Base Model Tests
test/test_converters.py, test/test_base_extra.py
audiobook_to_release field mapping (work/release, license by source, narrator dedup, chapter end, external IDs, genres). normalize_language, author/narrator equality/hash, narrator reconciliation, has_live_streams reachability.
Transport & Utils Tests
test/test_transport.py, test/test_utils_extra.py
Injected sessions, default_session() with/without curl_cffi, AUDIOBOOKER_TRANSPORT env var, backward compatibility. random_user_agent, get_html fallback, get_soup, check_url_availability, sitemap iteration, score_book.
Cache & CLI Tests
test/test_cache_extra.py, test/test_cli.py
Download (skip existing, error handling, cleanup partials), play (open cached, download missing), platform-specific file opening, listing/clearing. CLI: _print_book, help/version, search filters, index commands (stats/build/update/search/follow), cache commands.
Scraper Unit & VCR Tests
test/test_scrappers_base.py, test/test_scrappers_edges.py, test/test_*_vcr.py files
AudioBookSource search methods (matching, dedup, field behavior). Edge-case offline coverage (missing HTML, empty results, error handling). VCR-backed: AudioAnarchy, DarkerProjects, GoldenAudioBooks, HPTales, LibriVox (chapters/narrators/tags), LoyalBooks (chapters), StephenKing.
Search & Index Tests
test/test_search.py, test/test_index_extra.py
Unified search routing, dedup, max_per_source, timeout/exception resilience. Index: iteration/search/filtering, _resolve_sources, YouTube follow, CLI commands, FTS fallback, repr/len/context-manager.
YouTube Scraper Tests
test/test_youtube.py
_length_to_seconds, _parse_name, _parse_video_item, _parse_playlist_video, metadata extraction, _video_to_book, source search methods, channel/playlist iteration, preconfigured sources.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • TigreGotico/audiobooker#14: Both PRs make the same core changes: adding audiobook_to_release converters, expanding AudioBook with chapters/narrators, refactoring AudioBookSource session to requests.Session, and corresponding docs/tests.
  • TigreGotico/audiobooker#12: Both PRs modify cache/download/play utilities and BookIndex duration-filtering (CLI flags/tests/examples), indicating shared changes to audiobooker.cache and index search APIs.
  • TigreGotico/audiobooker#7: Both PRs modify the CI coverage workflow (.github/workflows/coverage.yml), with the earlier PR adding the workflow and this PR updating the install_extras input.

🐰 Chorus of chapters, now one per book,
Sessions plug in, no cache in sight,
Mediavocab's release takes flight,
From many sources to one Release object—
The scrapers refactored, docs retrofit! 📚✨

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/mediavocab-integration
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch feat/mediavocab-integration

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Nitpick comments (10)
test/test_utils_extra.py (1)

25-31: ⚡ Quick win

Consider testing SSL-specific exception types.

The test name suggests SSL-specific fallback, but the mock uses a generic Exception("ssl"). If get_html() is intended to retry only SSL-related errors with verify=False, consider using an actual SSL exception type (e.g., requests.exceptions.SSLError) to ensure the test accurately reflects the intended behavior and doesn't pass if the implementation changes to catch all exceptions indiscriminately.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_utils_extra.py` around lines 25 - 31, The test
test_falls_back_to_verify_false currently raises a generic Exception("ssl")
which doesn't ensure get_html only retries on SSL errors; update the mocked
side_effect to raise the specific requests.exceptions.SSLError (import it in the
test) for the first call on AudioBookSource.session.get and keep the second
response as ok, so the assertion self.assertEqual(get_html("http://x"),
"<html2>") verifies retry behavior only on SSLError and not on all exceptions.
audiobooker/transport.py (1)

35-38: ⚡ Quick win

Consider narrower exception handling or logging for User-Agent setup.

The bare except Exception: pass on line 37 silently suppresses any error when setting the User-Agent header for curl_cffi sessions. While this defensive approach ensures the function always returns a session, it could hide genuine issues like API incompatibilities or attribute errors. Consider either:

  • Narrowing to specific exceptions (e.g., AttributeError, KeyError)
  • Adding minimal logging (e.g., logging.debug) so failures can be diagnosed
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@audiobooker/transport.py` around lines 35 - 38, Replace the broad silent
except with targeted handling: catch only expected errors (e.g., AttributeError,
KeyError) around session.headers.update({"User-Agent": random_user_agent()}) and
emit a minimal debug log on failure (use logging.getLogger(__name__).debug with
context and the exception) so legitimate issues are visible while preserving the
fallback behavior; keep the call to random_user_agent() and the update inside
the try block and do not change the function's return behavior.
audiobooker/scrappers/hpaudiotales.py (1)

11-11: 💤 Low value

Consider more specific type hint for session.

The type hint object is very generic. While it accommodates both requests.Session and curl_cffi.requests.Session (which don't share a common base), consider using Optional[requests.Session] or a Protocol if you want to document the expected interface more clearly. That said, object works fine for duck-typed compatibility.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@audiobooker/scrappers/hpaudiotales.py` at line 11, The module-level variable
session is currently typed as the generic object; change it to a more specific
type to document the expected interface — for example use
Optional[requests.Session] by importing Optional from typing and requests
(replace "session: object = None" with "session: Optional[requests.Session] =
None"), or if you must support both requests.Session and
curl_cffi.requests.Session define a small Protocol (e.g., SessionProtocol with
request/get/post attributes) and type session as Optional[SessionProtocol];
update imports accordingly.
examples/09_advanced_index.py (1)

13-13: 💤 Low value

Consider using tempfile.mkdtemp() for cross-platform compatibility.

The hardcoded /tmp/audiobooker_demo.db path works on Unix-like systems but won't work on Windows. For a more portable example, consider using Python's tempfile module:

import tempfile
DB_PATH = os.path.join(tempfile.gettempdir(), "audiobooker_demo.db")

That said, for a quick demo script this is acceptable as-is.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/09_advanced_index.py` at line 13, The hardcoded DB_PATH constant
uses "/tmp/audiobooker_demo.db" which is not cross-platform; update the DB_PATH
initialization (symbol: DB_PATH) to build the path using Python's tempfile
utilities (e.g., tempfile.gettempdir() or tempfile.mkdtemp()) combined with
os.path.join so the example works on Windows and Unix-like systems and remains a
portable demo.
docs/README.md (1)

9-9: 💤 Low value

Optional: Use official "YouTube" capitalization.

The official brand name is "YouTube" (capital T), not "Youtube".

📝 Proposed fix
-| [youtube.md](youtube.md) | `YoutubeChannelSource`, `YoutubePlaylistSource`, `TheCybrarian`, `HorrorBabble` |
+| [youtube.md](youtube.md) | YouTube sources: `YoutubeChannelSource`, `YoutubePlaylistSource`, `TheCybrarian`, `HorrorBabble` |

Also apply to line 27:

-| `08_youtube_audiobooks.py` | YouTube sources (requires `[youtube]` extra) |
+| `08_youtube_audiobooks.py` | YouTube sources (requires `[youtube]` extra) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/README.md` at line 9, Update the branding to use the official "YouTube"
capitalization: change the link text and any occurrences of the word "Youtube"
in the README to "YouTube" (e.g., the markdown link text [youtube.md] should
become [YouTube.md] or simply display "YouTube" while keeping the file path if
required) and update symbol names shown in the list such as
`YoutubeChannelSource` and `YoutubePlaylistSource` to `YouTubeChannelSource` and
`YouTubePlaylistSource`; also make the same capitalization correction for the
related mention on the other occurrence referenced (the one around the same
block where `TheCybrarian` and `HorrorBabble` are listed).
examples/07_custom_session.py (1)

25-35: ⚡ Quick win

Remove the misleading except ImportError block; detect fallback from the resolved session backend instead.

The default_session() function silently handles curl_cffi import failures internally and always returns a session (either curl_cffi.requests.Session or requests.Session). The except ImportError block in the example will never execute. The example already has the correct fallback detection via type(session_b).__module__—use that exclusively instead of the unreachable exception handler.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/07_custom_session.py` around lines 25 - 35, Remove the unreachable
try/except around default_session(); default_session() always returns a session
object, so delete the ImportError handler and rely on inspecting
type(session_b).__module__ (the existing backend detection) to determine whether
curl_cffi or requests is in use; leave the creation of session_b via
default_session(), the backend variable computation, and the
Librivox(session=session_b) usage intact so the example shows the actual
fallback based on the resolved session backend.
test/test_stephenkingaudiobooks_vcr.py (1)

27-31: 💤 Low value

Consider adding streams assertion for consistency.

The test_iterate_all_yields_typed_audiobooks test (line 23) asserts both title and streams, but this test only checks title. If streams are expected to be present for title search results, adding that assertion would improve test coverage and consistency.

Suggested addition
     assert isinstance(book, AudioBook)
     assert book.title
+    assert book.streams, "expected at least one stream URL"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_stephenkingaudiobooks_vcr.py` around lines 27 - 31, The
test_search_by_title test only asserts book.title but should also assert that
streams are present like the other test; update the test_search_by_title
function to retrieve the first result from
StephenKingAudioBooks().search_by_title("Carrie") (as it already does), then add
an assertion that the returned object is an AudioBook and that book.streams is
truthy (or has expected type/length) to mirror
test_iterate_all_yields_typed_audiobooks and ensure streams are validated for
title searches.
test/test_hpaudiotales_vcr.py (2)

26-30: 💤 Low value

Consider adding assertion message for consistency.

Line 30 asserts book.streams without a message, while line 23 includes the helpful message "expected at least one stream URL". Adding the same message here would improve consistency and test output clarity.

Suggested consistency fix
     assert book.title
-    assert book.streams
+    assert book.streams, "expected at least one stream URL"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_hpaudiotales_vcr.py` around lines 26 - 30, Add a consistent
assertion message to the existing check of book.streams in the
test_iterate_popular_yields_typed_audiobooks function: replace the bare assert
book.streams with an assertion that includes the same helpful message used
earlier (e.g., "expected at least one stream URL") so failures report a clear,
consistent message referencing book.streams.

13-16: ⚡ Quick win

Consider moving _first helper to conftest.py.

This helper function is duplicated across multiple VCR test modules (test_darkerprojects_vcr.py, test_stephenkingaudiobooks_vcr.py). Moving it to test/conftest.py as a shared fixture or utility function would eliminate duplication and improve maintainability.

♻️ Proposed consolidation in conftest.py

Add to test/conftest.py:

def first_or_none(iterable):
    """Return the first item from an iterable, or None if empty."""
    for item in iterable:
        return item
    return None

Then import in test files:

+from conftest import first_or_none

-def _first(it):
-    for x in it:
-        return x
-    return None

 def test_iterate_all_yields_typed_audiobooks():
-    book = _first(HPTalesAudioBooks().iterate_all())
+    book = first_or_none(HPTalesAudioBooks().iterate_all())
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_hpaudiotales_vcr.py` around lines 13 - 16, The small helper _first
is duplicated across VCR tests; extract it into a shared test utility (e.g.,
conftest) and rename to a clearer name like first_or_none with the same behavior
(return first item from an iterable or None). Update callers in
test_hpaudiotales_vcr.py and the other VCR test modules to import and use
first_or_none instead of defining their own _first functions to remove
duplication and centralize the utility.
README.md (1)

141-162: 💤 Low value

Consider adding a language identifier to the CLI reference code block.

The fenced code block starting at line 141 lacks a language identifier, which would improve syntax highlighting in rendered markdown.

📝 Suggested fix
-```
+```bash
 audiobooker search <query>

Alternatively, use ```text if you prefer no syntax highlighting.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 141 - 162, The README code block that documents the
CLI (starting with the line containing "audiobooker search <query>") is missing
a fenced-code language tag; update the opening fence from ``` to ```bash (or
```text if you prefer no highlighting) so the CLI examples (e.g., "audiobooker
search <query>", "audiobooker index build", "audiobooker cache download
<query>") are rendered with appropriate syntax highlighting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@audiobooker/converters.py`:
- Line 66: The key construction key = (narrator.first_name.lower(),
narrator.last_name.lower()) can raise AttributeError if narrator.first_name or
narrator.last_name is None; update the code that builds this key (the key
variable in converters.py) to safely coerce None to an empty string (or fallback
string) before calling .lower() — e.g., use ( (narrator.first_name or
"").lower(), (narrator.last_name or "").lower() ) or use getattr(narrator,
"first_name", "")/getattr(narrator, "last_name", "") — ensuring the rest of the
code that uses key continues to expect lowercase strings.
- Around line 28-35: The helper functions _author_ref and _narrator_ref build
names with f-strings that can yield "None" when first_name or last_name is None;
update both functions to construct the display name by joining only non-empty,
non-None parts (e.g., collect [first_name, last_name] and join parts that are
truthy, then .strip()), and keep creating EntityRef(name=<joined_name>,
kind=EntityKind.PERSON); this ensures names omit "None" and handle missing
fields gracefully.

In `@audiobooker/scrappers/librivox.py`:
- Around line 53-66: _section_streams currently lets feedparser.parse fetch the
RSS URL itself, bypassing the injected requests session (sess) and its
proxies/certs/transport; change it to fetch the RSS using sess (e.g., sess.get
with appropriate headers, timeout and error handling), check response.ok (raise
or return empty on bad status), then pass the raw response content/text into
feedparser.parse (instead of the URL) so parsing uses the session's transport
and headers; keep using sess.headers.get("User-Agent") when setting request
headers and preserve the existing request_headers like "Connection": "close".

In `@docs/converters.md`:
- Line 3: Remove hard-coded line numbers in docs/converters.md and replace them
with symbol-based references or links to anchored locations; e.g., instead of
"audiobook_to_release() — audiobooker/converters.py:46" change to "see
audiobook_to_release() in audiobooker/converters.py" or link to an anchor for
the function. Update all occurrences that reference specific lines (the entries
for audiobook_to_release, and the ones at lines 44, 53, 59) to use function
names or stable anchors so the references remain correct as the code changes.

In `@examples/05_play_with_cache.py`:
- Around line 12-20: The example should guard before calling play(book,
stream=0): check the retrieved book's streams (e.g., book.streams or whichever
attribute holds playable URLs) and if it's empty or has no playable URL, print a
friendly message like "No streams available for this book." and skip the
play(...) call; otherwise proceed to print Cached and launch the player with
play(book, stream=0). Ensure you reference the same variables used in the
snippet (Librivox().search_by_title, book, is_cached, play) so the conditional
integrates in place of the unconditional play(...) call.

In `@examples/08_youtube_audiobooks.py`:
- Line 12: Update the typo in the user-facing error string printed in
examples/08_youtube_audiobooks.py: change the print(...) message that currently
says "tutubo not installed. Run: pip install audiobooker[youtube]" to correctly
spell "youtube" so it reads "youtube not installed. Run: pip install
audiobooker[youtube]"; locate and edit the print statement to fix the typo.

In `@pyproject.toml`:
- Line 20: The pyproject.toml dependency constraint for mediavocab is invalid
(currently "mediavocab>=1.0.0"); update the requirement to a satisfiable version
such as "mediavocab>=0.1.1" (or pin to an exact compatible version) by replacing
the existing mediavocab entry so the project can install the package from PyPI.

In `@test/test_cli.py`:
- Around line 116-119: The test mutates MagicMock class metadata by assigning to
src.__class__.__name__, which can leak across tests; instead create a dedicated
dummy class (e.g., class FakeSrc: pass) and set the mock instance's __class__ to
that class (src.__class__ = FakeSrc) or construct the double as a Mock of that
type (use Mock(spec=FakeSrc) or create_autospec) so you don't change MagicMock
global state; update both places where src.__class__.__name__ is set and keep
the patch to audiobooker.index._default_sources returning [src].
- Around line 240-242: TestCacheCmds.setUp creates a temporary directory
(self.tmp) but never removes it; add a corresponding tearDown method on the
TestCacheCmds test class that safely removes the temp directory (use
shutil.rmtree and guard with os.path.exists(self.tmp)) to prevent leaking temp
dirs between test runs; ensure tearDown handles exceptions (ignore if already
removed) and clears self.tmp so repeated runs/parallel tests don't collide.

In `@test/test_index_extra.py`:
- Around line 125-132: The test currently exits the patch.dict context before
calling idx._followed_as_sources(), so the "YouTube module unavailable" path
isn't exercised; move the call to idx._followed_as_sources() (the assertion
self.assertEqual(idx._followed_as_sources(), [])) inside the with
patch.dict(sys.modules, {"audiobooker.scrappers.youtube": None}): block (remove
the stray pass and any unnecessary importlib/mock lines) so the function is
invoked while the youtube module is patched to None, ensuring the
unavailable-module branch is tested.

In `@test/test_scrappers_edges.py`:
- Around line 224-235: The test currently lacks an assertion—update
test_search_by_tag_with_match so it asserts that the RSS parsing path was
invoked for the matched link: patch audiobooker.scrappers.loyalbooks.from_rss as
a mock (instead of return_value=iter([])) and after calling
list(LoyalBooks().search_by_tag("Horror")) assert that from_rss was called once
with the expected absolute book URL (e.g.
"http://www.loyalbooks.com/book/dracula") or otherwise assert the returned
iterable contains the expected book entry; refer to LoyalBooks.search_by_tag,
get_soup and from_rss to locate where to add the mock assertion.

In `@test/test_search.py`:
- Around line 74-79: The test_search_filters_min_score currently uses a fixture
that exactly matches the query so it never exercises the score-filtering path;
update the FakeSource fixture in test_search_filters_min_score to return a title
that will score below the min threshold (e.g., a clearly unrelated title instead
of "Dracula") and call search with an explicit min_score (or rely on the
function's default) so the returned low-score item is filtered out, then assert
that no result has that title; locate the test function name
test_search_filters_min_score, the FakeSource/_b fixture, and the
search/search_by_title call to make the change.

---

Nitpick comments:
In `@audiobooker/scrappers/hpaudiotales.py`:
- Line 11: The module-level variable session is currently typed as the generic
object; change it to a more specific type to document the expected interface —
for example use Optional[requests.Session] by importing Optional from typing and
requests (replace "session: object = None" with "session:
Optional[requests.Session] = None"), or if you must support both
requests.Session and curl_cffi.requests.Session define a small Protocol (e.g.,
SessionProtocol with request/get/post attributes) and type session as
Optional[SessionProtocol]; update imports accordingly.

In `@audiobooker/transport.py`:
- Around line 35-38: Replace the broad silent except with targeted handling:
catch only expected errors (e.g., AttributeError, KeyError) around
session.headers.update({"User-Agent": random_user_agent()}) and emit a minimal
debug log on failure (use logging.getLogger(__name__).debug with context and the
exception) so legitimate issues are visible while preserving the fallback
behavior; keep the call to random_user_agent() and the update inside the try
block and do not change the function's return behavior.

In `@docs/README.md`:
- Line 9: Update the branding to use the official "YouTube" capitalization:
change the link text and any occurrences of the word "Youtube" in the README to
"YouTube" (e.g., the markdown link text [youtube.md] should become [YouTube.md]
or simply display "YouTube" while keeping the file path if required) and update
symbol names shown in the list such as `YoutubeChannelSource` and
`YoutubePlaylistSource` to `YouTubeChannelSource` and `YouTubePlaylistSource`;
also make the same capitalization correction for the related mention on the
other occurrence referenced (the one around the same block where `TheCybrarian`
and `HorrorBabble` are listed).

In `@examples/07_custom_session.py`:
- Around line 25-35: Remove the unreachable try/except around default_session();
default_session() always returns a session object, so delete the ImportError
handler and rely on inspecting type(session_b).__module__ (the existing backend
detection) to determine whether curl_cffi or requests is in use; leave the
creation of session_b via default_session(), the backend variable computation,
and the Librivox(session=session_b) usage intact so the example shows the actual
fallback based on the resolved session backend.

In `@examples/09_advanced_index.py`:
- Line 13: The hardcoded DB_PATH constant uses "/tmp/audiobooker_demo.db" which
is not cross-platform; update the DB_PATH initialization (symbol: DB_PATH) to
build the path using Python's tempfile utilities (e.g., tempfile.gettempdir() or
tempfile.mkdtemp()) combined with os.path.join so the example works on Windows
and Unix-like systems and remains a portable demo.

In `@README.md`:
- Around line 141-162: The README code block that documents the CLI (starting
with the line containing "audiobooker search <query>") is missing a fenced-code
language tag; update the opening fence from ``` to ```bash (or ```text if you
prefer no highlighting) so the CLI examples (e.g., "audiobooker search <query>",
"audiobooker index build", "audiobooker cache download <query>") are rendered
with appropriate syntax highlighting.

In `@test/test_hpaudiotales_vcr.py`:
- Around line 26-30: Add a consistent assertion message to the existing check of
book.streams in the test_iterate_popular_yields_typed_audiobooks function:
replace the bare assert book.streams with an assertion that includes the same
helpful message used earlier (e.g., "expected at least one stream URL") so
failures report a clear, consistent message referencing book.streams.
- Around line 13-16: The small helper _first is duplicated across VCR tests;
extract it into a shared test utility (e.g., conftest) and rename to a clearer
name like first_or_none with the same behavior (return first item from an
iterable or None). Update callers in test_hpaudiotales_vcr.py and the other VCR
test modules to import and use first_or_none instead of defining their own
_first functions to remove duplication and centralize the utility.

In `@test/test_stephenkingaudiobooks_vcr.py`:
- Around line 27-31: The test_search_by_title test only asserts book.title but
should also assert that streams are present like the other test; update the
test_search_by_title function to retrieve the first result from
StephenKingAudioBooks().search_by_title("Carrie") (as it already does), then add
an assertion that the returned object is an AudioBook and that book.streams is
truthy (or has expected type/length) to mirror
test_iterate_all_yields_typed_audiobooks and ensure streams are validated for
title searches.

In `@test/test_utils_extra.py`:
- Around line 25-31: The test test_falls_back_to_verify_false currently raises a
generic Exception("ssl") which doesn't ensure get_html only retries on SSL
errors; update the mocked side_effect to raise the specific
requests.exceptions.SSLError (import it in the test) for the first call on
AudioBookSource.session.get and keep the second response as ok, so the assertion
self.assertEqual(get_html("http://x"), "<html2>") verifies retry behavior only
on SSLError and not on all exceptions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b743c13d-46ef-4c07-982d-b9c60b479068

📥 Commits

Reviewing files that changed from the base of the PR and between 87df839 and e7ce5b1.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (79)
  • .github/workflows/coverage.yml
  • .github/workflows/nightly-live.yml
  • CHANGELOG.md
  • README.md
  • audiobooker/__init__.py
  • audiobooker/base.py
  • audiobooker/cache.py
  • audiobooker/cli.py
  • audiobooker/converters.py
  • audiobooker/scrappers/__init__.py
  • audiobooker/scrappers/darkerprojects.py
  • audiobooker/scrappers/goldenaudiobooks.py
  • audiobooker/scrappers/hpaudiotales.py
  • audiobooker/scrappers/librivox.py
  • audiobooker/scrappers/loyalbooks.py
  • audiobooker/scrappers/youtube.py
  • audiobooker/search.py
  • audiobooker/transport.py
  • audiobooker/utils.py
  • docs/README.md
  • docs/api.md
  • docs/converters.md
  • docs/getting-started.md
  • docs/sources.md
  • docs/transport.md
  • examples/01_quickstart.py
  • examples/02_search_all_sources.py
  • examples/03_filter_by_author.py
  • examples/04_download_and_cache.py
  • examples/05_play_with_cache.py
  • examples/06_convert_to_mediavocab.py
  • examples/07_custom_session.py
  • examples/08_youtube_audiobooks.py
  • examples/09_advanced_index.py
  • examples/10_cli_pipeline.py
  • examples/mediavocab_release.py
  • pyproject.toml
  • requirements.txt
  • test/cassettes/test_audioanarchy_vcr/test_iterate_all_yields_typed_audiobooks.yaml
  • test/cassettes/test_audioanarchy_vcr/test_iterate_popular_yields_typed_audiobooks.yaml
  • test/cassettes/test_darkerprojects_vcr/test_iterate_all_yields_typed_audiobooks.yaml
  • test/cassettes/test_darkerprojects_vcr/test_iterate_popular_yields_typed_audiobooks.yaml
  • test/cassettes/test_goldenaudiobooks_vcr/test_iterate_all_yields_typed_audiobooks.yaml
  • test/cassettes/test_goldenaudiobooks_vcr/test_iterate_popular_yields_typed_audiobooks.yaml
  • test/cassettes/test_hpaudiotales_vcr/test_iterate_all_yields_typed_audiobooks.yaml
  • test/cassettes/test_hpaudiotales_vcr/test_iterate_popular_yields_typed_audiobooks.yaml
  • test/cassettes/test_librivox_vcr/test_iterate_all_yields_typed_audiobooks.yaml
  • test/cassettes/test_librivox_vcr/test_search_by_author.yaml
  • test/cassettes/test_librivox_vcr/test_search_by_narrator.yaml
  • test/cassettes/test_librivox_vcr/test_search_by_tag.yaml
  • test/cassettes/test_librivox_vcr/test_search_by_title.yaml
  • test/cassettes/test_loyalbooks_vcr/test_iterate_all_yields_typed_audiobooks.yaml
  • test/cassettes/test_loyalbooks_vcr/test_iterate_popular_yields_typed_audiobooks.yaml
  • test/cassettes/test_loyalbooks_vcr/test_search_by_author.yaml
  • test/cassettes/test_loyalbooks_vcr/test_search_by_tag.yaml
  • test/cassettes/test_loyalbooks_vcr/test_search_by_title.yaml
  • test/cassettes/test_stephenkingaudiobooks_vcr/test_iterate_all_yields_typed_audiobooks.yaml
  • test/cassettes/test_stephenkingaudiobooks_vcr/test_search.yaml
  • test/cassettes/test_stephenkingaudiobooks_vcr/test_search_by_author.yaml
  • test/cassettes/test_stephenkingaudiobooks_vcr/test_search_by_title.yaml
  • test/conftest.py
  • test/test_audioanarchy_vcr.py
  • test/test_base_extra.py
  • test/test_cache_extra.py
  • test/test_cli.py
  • test/test_converters.py
  • test/test_darkerprojects_vcr.py
  • test/test_goldenaudiobooks_vcr.py
  • test/test_hpaudiotales_vcr.py
  • test/test_index_extra.py
  • test/test_librivox_vcr.py
  • test/test_loyalbooks_vcr.py
  • test/test_scrappers_base.py
  • test/test_scrappers_edges.py
  • test/test_search.py
  • test/test_stephenkingaudiobooks_vcr.py
  • test/test_transport.py
  • test/test_utils_extra.py
  • test/test_youtube.py
💤 Files with no reviewable changes (2)
  • requirements.txt
  • audiobooker/scrappers/youtube.py

Comment thread audiobooker/converters.py
Comment on lines +28 to +35
def _author_ref(author: BookAuthor) -> EntityRef:
name = f"{author.first_name} {author.last_name}".strip()
return EntityRef(name=name, kind=EntityKind.PERSON)


def _narrator_ref(narrator: AudiobookNarrator) -> EntityRef:
name = f"{narrator.first_name} {narrator.last_name}".strip()
return EntityRef(name=name, kind=EntityKind.PERSON)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Handle None values in name fields.

If first_name or last_name is None, the f-string will produce "None Smith" or "John None". Consider filtering out None values or using empty strings as fallback.

🛡️ Proposed fix
 def _author_ref(author: BookAuthor) -> EntityRef:
-    name = f"{author.first_name} {author.last_name}".strip()
+    name = f"{author.first_name or ''} {author.last_name or ''}".strip()
     return EntityRef(name=name, kind=EntityKind.PERSON)


 def _narrator_ref(narrator: AudiobookNarrator) -> EntityRef:
-    name = f"{narrator.first_name} {narrator.last_name}".strip()
+    name = f"{narrator.first_name or ''} {narrator.last_name or ''}".strip()
     return EntityRef(name=name, kind=EntityKind.PERSON)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _author_ref(author: BookAuthor) -> EntityRef:
name = f"{author.first_name} {author.last_name}".strip()
return EntityRef(name=name, kind=EntityKind.PERSON)
def _narrator_ref(narrator: AudiobookNarrator) -> EntityRef:
name = f"{narrator.first_name} {narrator.last_name}".strip()
return EntityRef(name=name, kind=EntityKind.PERSON)
def _author_ref(author: BookAuthor) -> EntityRef:
name = f"{author.first_name or ''} {author.last_name or ''}".strip()
return EntityRef(name=name, kind=EntityKind.PERSON)
def _narrator_ref(narrator: AudiobookNarrator) -> EntityRef:
name = f"{narrator.first_name or ''} {narrator.last_name or ''}".strip()
return EntityRef(name=name, kind=EntityKind.PERSON)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@audiobooker/converters.py` around lines 28 - 35, The helper functions
_author_ref and _narrator_ref build names with f-strings that can yield "None"
when first_name or last_name is None; update both functions to construct the
display name by joining only non-empty, non-None parts (e.g., collect
[first_name, last_name] and join parts that are truthy, then .strip()), and keep
creating EntityRef(name=<joined_name>, kind=EntityKind.PERSON); this ensures
names omit "None" and handle missing fields gracefully.

Comment thread audiobooker/converters.py
for narrator in narrators:
if not narrator:
continue
key = (narrator.first_name.lower(), narrator.last_name.lower())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Potential AttributeError when narrator names are None.

If narrator.first_name or narrator.last_name is None, calling .lower() will raise an AttributeError.

🐛 Proposed fix
-        key = (narrator.first_name.lower(), narrator.last_name.lower())
+        key = (
+            (narrator.first_name or "").lower(),
+            (narrator.last_name or "").lower(),
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
key = (narrator.first_name.lower(), narrator.last_name.lower())
key = (
(narrator.first_name or "").lower(),
(narrator.last_name or "").lower(),
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@audiobooker/converters.py` at line 66, The key construction key =
(narrator.first_name.lower(), narrator.last_name.lower()) can raise
AttributeError if narrator.first_name or narrator.last_name is None; update the
code that builds this key (the key variable in converters.py) to safely coerce
None to an empty string (or fallback string) before calling .lower() — e.g., use
( (narrator.first_name or "").lower(), (narrator.last_name or "").lower() ) or
use getattr(narrator, "first_name", "")/getattr(narrator, "last_name", "") —
ensuring the rest of the code that uses key continues to expect lowercase
strings.

Comment on lines +53 to +66
def _section_streams(rss_url: str, session=None) -> list:
# NOTE: ``feedparser.parse`` uses ``urllib`` internally and does NOT
# accept a ``requests.Session``. Injected sessions therefore do not
# apply to RSS fetches — only the User-Agent header is forwarded.
sess = session if session is not None else AudioBookSource.session
try:
ua = sess.headers.get("User-Agent")
except Exception:
ua = None
rss = feedparser.parse(
rss_url,
agent=ua,
request_headers={"Connection": "close"},
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

❓ Verification inconclusive

Does Python feedparser.parse accept raw bytes or a response body so an RSS feed can be fetched with requests/curl_cffi first and then parsed locally?


Yes—feedparser.parse accepts the fetched response body content locally, and it can be bytes. - The feedparser.parse first argument (url_file_stream_or_string) can be a URL, a local filename, or a raw string containing feed data, and the implementation docs explicitly note that both byte and text strings are accepted [1]. - The public docs also state feedparser can parse “a raw string containing feed data in any format” [2]. Practical implication for your workflow: 1) Use requests/curl_cffi to fetch the RSS/Atom body. 2) Pass the response content into feedparser.parse. Example: import feedparser import requests resp = requests.get(url, timeout=10) # resp.content is bytes; resp.text is str feed = feedparser.parse(resp.content) # or: feed = feedparser.parse(resp.text) If you already have HTTP response headers you may also pass them via response_headers (so feedparser can interpret encoding / content-location for relative URLs) [1]. One caution: if you use a file-like wrapper, prefer BytesIO for bytes (the docs mention wrapping untrusted strings in StringIO/BytesIO) [1], and there are reports that passing a text StringIO where bytes are expected can break parsing [3].

Citations:


Script executed:

cat -n audiobooker/scrappers/librivox.py | sed -n '45,75p'

Repository: TigreGotico/audiobooker


Repository: TigreGotico/audiobooker
Exit code: 0

stdout:

    45	        )
    46	        if resp.status_code >= 400:
    47	            return {}
    48	        return resp.json()
    49	    except Exception:
    50	        return {}
    51	
    52	
    53	def _section_streams(rss_url: str, session=None) -> list:
    54	    # NOTE: ``feedparser.parse`` uses ``urllib`` internally and does NOT
    55	    # accept a ``requests.Session``. Injected sessions therefore do not
    56	    # apply to RSS fetches — only the User-Agent header is forwarded.
    57	    sess = session if session is not None else AudioBookSource.session
    58	    try:
    59	        ua = sess.headers.get("User-Agent")
    60	    except Exception:
    61	        ua = None
    62	    rss = feedparser.parse(
    63	        rss_url,
    64	        agent=ua,
    65	        request_headers={"Connection": "close"},
    66	    )
    67	    return [
    68	        e["media_content"][0]["url"]
    69	        for e in rss.get("entries", [])
    70	        if e.get("media_content")
    71	    ]
    72	
    73	
    74	def _build_book(k: dict, session=None) -> AudioBook:
    75	    """Convert one LibriVox API record into a single typed ``AudioBook``."""


Script executed:

rg -n "_section_streams" audiobooker/scrappers/librivox.py -A 2 -B 2

Repository: TigreGotico/audiobooker


Repository: TigreGotico/audiobooker
Exit code: 0

stdout:

51-
52-
53:def _section_streams(rss_url: str, session=None) -> list:
54-    # NOTE: ``feedparser.parse`` uses ``urllib`` internally and does NOT
55-    # accept a ``requests.Session``. Injected sessions therefore do not
--
74-def _build_book(k: dict, session=None) -> AudioBook:
75-    """Convert one LibriVox API record into a single typed ``AudioBook``."""
76:    rss_streams = _section_streams(k["url_rss"], session=session)
77-
78-    chapters: list = []


Use the injected session for RSS fetches.

API calls honor session, but _section_streams() still lets feedparser make its own network request. This bypasses proxies, cookies, custom TLS, and the new curl_cffi transport. The web_search confirmed feedparser.parse accepts raw bytes/text content, so you can fetch the RSS with the injected session first and then parse locally.

Suggested fix
 def _section_streams(rss_url: str, session=None) -> list:
-    # NOTE: ``feedparser.parse`` uses ``urllib`` internally and does NOT
-    # accept a ``requests.Session``. Injected sessions therefore do not
-    # apply to RSS fetches — only the User-Agent header is forwarded.
     sess = session if session is not None else AudioBookSource.session
     try:
-        ua = sess.headers.get("User-Agent")
+        resp = sess.get(rss_url, timeout=30)
+        resp.raise_for_status()
     except Exception:
-        ua = None
-    rss = feedparser.parse(
-        rss_url,
-        agent=ua,
-        request_headers={"Connection": "close"},
-    )
+        return []
+    rss = feedparser.parse(resp.content)
     return [
         e["media_content"][0]["url"]
         for e in rss.get("entries", [])
         if e.get("media_content")
     ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _section_streams(rss_url: str, session=None) -> list:
# NOTE: ``feedparser.parse`` uses ``urllib`` internally and does NOT
# accept a ``requests.Session``. Injected sessions therefore do not
# apply to RSS fetches — only the User-Agent header is forwarded.
sess = session if session is not None else AudioBookSource.session
try:
ua = sess.headers.get("User-Agent")
except Exception:
ua = None
rss = feedparser.parse(
rss_url,
agent=ua,
request_headers={"Connection": "close"},
)
def _section_streams(rss_url: str, session=None) -> list:
sess = session if session is not None else AudioBookSource.session
try:
resp = sess.get(rss_url, timeout=30)
resp.raise_for_status()
except Exception:
return []
rss = feedparser.parse(resp.content)
return [
e["media_content"][0]["url"]
for e in rss.get("entries", [])
if e.get("media_content")
]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@audiobooker/scrappers/librivox.py` around lines 53 - 66, _section_streams
currently lets feedparser.parse fetch the RSS URL itself, bypassing the injected
requests session (sess) and its proxies/certs/transport; change it to fetch the
RSS using sess (e.g., sess.get with appropriate headers, timeout and error
handling), check response.ok (raise or return empty on bad status), then pass
the raw response content/text into feedparser.parse (instead of the URL) so
parsing uses the session's transport and headers; keep using
sess.headers.get("User-Agent") when setting request headers and preserve the
existing request_headers like "Connection": "close".

Comment thread docs/converters.md
@@ -0,0 +1,76 @@
# mediavocab Converters

`audiobook_to_release()` — `audiobooker/converters.py:46`

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hard-coded line numbers will become stale.

Lines 3, 44, 53, and 59 reference specific line numbers in audiobooker/converters.py. As the code evolves, these line numbers will drift and become inaccurate, potentially confusing readers.

Consider using function/symbol references instead (e.g., "see audiobook_to_release() in audiobooker/converters.py") or linking to anchored documentation that auto-updates.

Suggested approach
-`audiobook_to_release()` — `audiobooker/converters.py:46`
+`audiobook_to_release()` — see `audiobooker/converters.py`

-## Credits — `audiobooker/converters.py:55`
+## Credits

-## License detection — `audiobooker/converters.py:96`
+## License detection

-## Chapter conversion — `audiobooker/converters.py:31`
+## Chapter conversion
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/converters.md` at line 3, Remove hard-coded line numbers in
docs/converters.md and replace them with symbol-based references or links to
anchored locations; e.g., instead of "audiobook_to_release() —
audiobooker/converters.py:46" change to "see audiobook_to_release() in
audiobooker/converters.py" or link to an anchor for the function. Update all
occurrences that reference specific lines (the entries for audiobook_to_release,
and the ones at lines 44, 53, 59) to use function names or stable anchors so the
references remain correct as the code changes.

Comment on lines +12 to +20
book = next(Librivox().search_by_title("The Yellow Wallpaper"), None)
if book is None:
print("Book not found.")
else:
print(f"Title: {book.title!r}")
print(f"Cached: {is_cached(book)}")
print("Playing stream 0 (downloads if needed)...")
play(book, stream=0)
print("Player launched.")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard the example against empty streams.

This calls play(book, stream=0) unconditionally. If the lookup returns metadata without a playable URL, the example raises instead of printing a friendly message.

Suggested fix
 book = next(Librivox().search_by_title("The Yellow Wallpaper"), None)
 if book is None:
     print("Book not found.")
+elif not book.streams:
+    print(f"Title:   {book.title!r}")
+    print("No playable streams were found.")
 else:
     print(f"Title:   {book.title!r}")
     print(f"Cached:  {is_cached(book)}")
     print("Playing stream 0 (downloads if needed)...")
     play(book, stream=0)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
book = next(Librivox().search_by_title("The Yellow Wallpaper"), None)
if book is None:
print("Book not found.")
else:
print(f"Title: {book.title!r}")
print(f"Cached: {is_cached(book)}")
print("Playing stream 0 (downloads if needed)...")
play(book, stream=0)
print("Player launched.")
book = next(Librivox().search_by_title("The Yellow Wallpaper"), None)
if book is None:
print("Book not found.")
elif not book.streams:
print(f"Title: {book.title!r}")
print("No playable streams were found.")
else:
print(f"Title: {book.title!r}")
print(f"Cached: {is_cached(book)}")
print("Playing stream 0 (downloads if needed)...")
play(book, stream=0)
print("Player launched.")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/05_play_with_cache.py` around lines 12 - 20, The example should
guard before calling play(book, stream=0): check the retrieved book's streams
(e.g., book.streams or whichever attribute holds playable URLs) and if it's
empty or has no playable URL, print a friendly message like "No streams
available for this book." and skip the play(...) call; otherwise proceed to
print Cached and launch the player with play(book, stream=0). Ensure you
reference the same variables used in the snippet (Librivox().search_by_title,
book, is_cached, play) so the conditional integrates in place of the
unconditional play(...) call.

Comment thread test/test_cli.py
Comment on lines +116 to +119
src = MagicMock()
src.iterate_all.return_value = iter([_book("Built")])
src.__class__.__name__ = "FakeSrc"
with patch("audiobooker.index._default_sources", return_value=[src]):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid mutating MagicMock class metadata globally

At Line 118 and Line 127, src.__class__.__name__ = "FakeSrc" mutates class-level state on MagicMock, which can leak into unrelated tests and make failures order-dependent.

Suggested safe test-double pattern
-        src = MagicMock()
-        src.iterate_all.return_value = iter([_book("Built")])
-        src.__class__.__name__ = "FakeSrc"
+        class FakeSrc:
+            pass
+        src = FakeSrc()
+        src.iterate_all = MagicMock(return_value=iter([_book("Built")]))
-        src = MagicMock()
-        src.iterate_all.return_value = iter([_book("Up")])
-        src.__class__.__name__ = "FakeSrc"
+        class FakeSrc:
+            pass
+        src = FakeSrc()
+        src.iterate_all = MagicMock(return_value=iter([_book("Up")]))

Also applies to: 125-129

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_cli.py` around lines 116 - 119, The test mutates MagicMock class
metadata by assigning to src.__class__.__name__, which can leak across tests;
instead create a dedicated dummy class (e.g., class FakeSrc: pass) and set the
mock instance's __class__ to that class (src.__class__ = FakeSrc) or construct
the double as a Mock of that type (use Mock(spec=FakeSrc) or create_autospec) so
you don't change MagicMock global state; update both places where
src.__class__.__name__ is set and keep the patch to
audiobooker.index._default_sources returning [src].

Comment thread test/test_cli.py
Comment on lines +240 to +242
def setUp(self):
self.tmp = tempfile.mkdtemp()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clean up temporary cache dirs created in tests

TestCacheCmds.setUp creates a temp directory at Line 241, but the class has no teardown cleanup. This can leak files across runs and slow CI workers over time.

Minimal cleanup addition
 class TestCacheCmds(unittest.TestCase):
     def setUp(self):
         self.tmp = tempfile.mkdtemp()
+
+    def tearDown(self):
+        import shutil
+        shutil.rmtree(self.tmp, ignore_errors=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def setUp(self):
self.tmp = tempfile.mkdtemp()
def setUp(self):
self.tmp = tempfile.mkdtemp()
def tearDown(self):
import shutil
shutil.rmtree(self.tmp, ignore_errors=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_cli.py` around lines 240 - 242, TestCacheCmds.setUp creates a
temporary directory (self.tmp) but never removes it; add a corresponding
tearDown method on the TestCacheCmds test class that safely removes the temp
directory (use shutil.rmtree and guard with os.path.exists(self.tmp)) to prevent
leaking temp dirs between test runs; ensure tearDown handles exceptions (ignore
if already removed) and clears self.tmp so repeated runs/parallel tests don't
collide.

Comment thread test/test_index_extra.py
Comment on lines +125 to +132
with patch.dict(sys.modules,
{"audiobooker.scrappers.youtube": None}):
# Force the import to fail
import importlib
# Easier: directly mock the function
pass
# Empty list either way
self.assertEqual(idx._followed_as_sources(), [])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Move the assertion inside the patched import scope.

idx._followed_as_sources() runs after patch.dict(...) exits, so this test does not actually cover the “YouTube module unavailable” path.

Suggested fix
-            with patch.dict(sys.modules,
-                            {"audiobooker.scrappers.youtube": None}):
-                # Force the import to fail
-                import importlib
-                # Easier: directly mock the function
-                pass
-            # Empty list either way
-            self.assertEqual(idx._followed_as_sources(), [])
+            with patch.dict(sys.modules,
+                            {"audiobooker.scrappers.youtube": None}):
+                self.assertEqual(idx._followed_as_sources(), [])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
with patch.dict(sys.modules,
{"audiobooker.scrappers.youtube": None}):
# Force the import to fail
import importlib
# Easier: directly mock the function
pass
# Empty list either way
self.assertEqual(idx._followed_as_sources(), [])
with patch.dict(sys.modules,
{"audiobooker.scrappers.youtube": None}):
self.assertEqual(idx._followed_as_sources(), [])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_index_extra.py` around lines 125 - 132, The test currently exits
the patch.dict context before calling idx._followed_as_sources(), so the
"YouTube module unavailable" path isn't exercised; move the call to
idx._followed_as_sources() (the assertion
self.assertEqual(idx._followed_as_sources(), [])) inside the with
patch.dict(sys.modules, {"audiobooker.scrappers.youtube": None}): block (remove
the stray pass and any unnecessary importlib/mock lines) so the function is
invoked while the youtube module is patched to None, ensuring the
unavailable-module branch is tested.

Comment on lines +224 to +235
def test_search_by_tag_with_match(self):
# 'Horror' matches 'Horror_and_Supernatural_Fiction'
with patch("audiobooker.scrappers.loyalbooks.get_soup",
return_value=_soup(
'<a href="/book/dracula"></a>'
'<a href="http://www.loyalbooks.com/book/dracula"></a>'
'<a href="/other"></a>'
)), \
patch("audiobooker.scrappers.loyalbooks.from_rss",
return_value=iter([])):
list(LoyalBooks().search_by_tag("Horror"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

test_search_by_tag_with_match currently has no verifiable assertion.

Right now it passes as long as no exception is raised, so it won’t catch regressions in the “matched tag → RSS parsing invoked” path.

Suggested test tightening
 def test_search_by_tag_with_match(self):
     # 'Horror' matches 'Horror_and_Supernatural_Fiction'
-    with patch("audiobooker.scrappers.loyalbooks.get_soup",
-               return_value=_soup(
-                   '<a href="/book/dracula"></a>'
-                   '<a href="http://www.loyalbooks.com/book/dracula"></a>'
-                   '<a href="/other"></a>'
-               )), \
-         patch("audiobooker.scrappers.loyalbooks.from_rss",
-               return_value=iter([])):
-        list(LoyalBooks().search_by_tag("Horror"))
+    with patch("audiobooker.scrappers.loyalbooks.get_soup",
+               return_value=_soup(
+                   '<a href="/book/dracula"></a>'
+                   '<a href="http://www.loyalbooks.com/book/dracula"></a>'
+                   '<a href="/other"></a>'
+               )), \
+         patch("audiobooker.scrappers.loyalbooks.from_rss",
+               return_value=iter([])) as from_rss_mock:
+        list(LoyalBooks().search_by_tag("Horror"))
+    self.assertGreaterEqual(from_rss_mock.call_count, 1)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_search_by_tag_with_match(self):
# 'Horror' matches 'Horror_and_Supernatural_Fiction'
with patch("audiobooker.scrappers.loyalbooks.get_soup",
return_value=_soup(
'<a href="/book/dracula"></a>'
'<a href="http://www.loyalbooks.com/book/dracula"></a>'
'<a href="/other"></a>'
)), \
patch("audiobooker.scrappers.loyalbooks.from_rss",
return_value=iter([])):
list(LoyalBooks().search_by_tag("Horror"))
def test_search_by_tag_with_match(self):
# 'Horror' matches 'Horror_and_Supernatural_Fiction'
with patch("audiobooker.scrappers.loyalbooks.get_soup",
return_value=_soup(
'<a href="/book/dracula"></a>'
'<a href="http://www.loyalbooks.com/book/dracula"></a>'
'<a href="/other"></a>'
)), \
patch("audiobooker.scrappers.loyalbooks.from_rss",
return_value=iter([])) as from_rss_mock:
list(LoyalBooks().search_by_tag("Horror"))
self.assertGreaterEqual(from_rss_mock.call_count, 1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_scrappers_edges.py` around lines 224 - 235, The test currently
lacks an assertion—update test_search_by_tag_with_match so it asserts that the
RSS parsing path was invoked for the matched link: patch
audiobooker.scrappers.loyalbooks.from_rss as a mock (instead of
return_value=iter([])) and after calling
list(LoyalBooks().search_by_tag("Horror")) assert that from_rss was called once
with the expected absolute book URL (e.g.
"http://www.loyalbooks.com/book/dracula") or otherwise assert the returned
iterable contains the expected book entry; refer to LoyalBooks.search_by_tag,
get_soup and from_rss to locate where to add the mock assertion.

Comment thread test/test_search.py
Comment on lines +74 to +79
def test_search_filters_min_score(self):
# Yield title that's nothing like the query
src = FakeSource([_b("Dracula")])
# Request through `search` will go to `search_by_title` (FakeSource.search)
results = list(search("Dracula", sources=[src]))
self.assertTrue(any(r.title == "Dracula" for r in results))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

test_search_filters_min_score never hits the filter path.

The fixture on Line 76 is an exact title match for the query on Line 78, so this still passes if score filtering is removed. Please make the returned title intentionally fall below the threshold and assert it is excluded.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_search.py` around lines 74 - 79, The test_search_filters_min_score
currently uses a fixture that exactly matches the query so it never exercises
the score-filtering path; update the FakeSource fixture in
test_search_filters_min_score to return a title that will score below the min
threshold (e.g., a clearly unrelated title instead of "Dracula") and call search
with an explicit min_score (or rely on the function's default) so the returned
low-score item is filtered out, then assert that no result has that title;
locate the test function name test_search_filters_min_score, the FakeSource/_b
fixture, and the search/search_by_title call to make the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant