Release 4.2.0a1 #36

Open
github-actions[bot] wants to merge 9 commits into master from release-4.2.0a1

github-actions[bot] commented May 7, 2026

Human review requested!

JarbasAl and others added 9 commits April 29, 2026 12:58
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: lift parser, locale, content-type into mediavocab

Move title parsing, content-type classification, and the multilingual
.voc keyword tree into mediavocab. tutubo now imports them; CutKind is
gone (replaced by mediavocab.VariantKind, returned directly by
parse_title).

- tutubo.title_parser, tutubo.content_type, tutubo._locale removed
  (lifted to mediavocab.text.title_parse / classify / mediavocab.locale).
- tutubo.__init__ re-exports parse_title, classify_video, ContentType,
  TitleParseResult from mediavocab so existing surface stays.
- VideoPreview.to_work() / to_release() (new): emit mediavocab Work +
  Release directly, bridging through mediavocab_bridge.
- Locale system is stateless — no set_lang/get_lang globals; pass
  lang= per call. Default from MEDIAVOCAB_LANG env at import.
- Locale .voc tree lives at mediavocab/locale/<lang>/.
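
A minimal sketch of the stateless `lang=` contract described above, for readers who have not seen the pattern. It does not reproduce mediavocab's actual API: `match_keyword`, `_KEYWORDS`, and the `"en"` fallback are hypothetical names and values chosen for illustration. Only the `MEDIAVOCAB_LANG` variable and the "read once at import, override per call" behaviour come from the commit message.

```python
import os

# Hypothetical stand-in for a per-language keyword table. The real .voc
# files under mediavocab/locale/<lang>/ are not reproduced here.
_KEYWORDS = {
    "en": {"trailer": "TRAILER"},
    "pt": {"trailer": "TRAILER", "bastidores": "BEHIND_THE_SCENES"},
}

# Read once at import time; the "en" fallback is an assumption.
DEFAULT_LANG = os.environ.get("MEDIAVOCAB_LANG", "en")


def match_keyword(title, lang=None):
    """Return the label of the first keyword found in `title`, or None.

    No module-level language state is mutated: callers pass lang= on
    every call, and the import-time default is used only when they omit it.
    """
    table = _KEYWORDS.get(lang or DEFAULT_LANG, {})
    lowered = title.lower()
    for word, label in table.items():
        if word in lowered:
            return label
    return None
```

Because nothing is stored between calls, two callers using different languages in the same process cannot interfere with each other, which is the usual motivation for dropping `set_lang`/`get_lang` style globals.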

Docs updated to reflect:
- locale.md rewritten for the stateless `lang=` parameter contract.
- content_type.md + README.md drop the set_lang/TUTUBO_LANG examples.
- Examples that imported CutKind use VariantKind from mediavocab.

562 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: declare mediavocab as hard runtime dep and resync docs

- Add mediavocab>=0.1.0 to pyproject + requirements.txt (CI was failing
  because mediavocab was not installed in the build matrix).
- Drop the optional-import guard in tutubo.mediavocab_bridge — mediavocab
  is required at import time, no shim, no _require_mediavocab() gate.
- Rewrite stale `from tutubo.content_type import ...` examples in README
  and docs/ to the canonical mediavocab imports
  (`mediavocab.taxonomy.ContentType`, `mediavocab.text.{parse_title,
  classify_video, extract_tags}`). The names remain re-exported from the
  tutubo top-level for convenience.
- README: call out mediavocab as a hard runtime dep and document the
  stateless `mediavocab.locale` API (no set_lang / TUTUBO_LANG).
- docs/content_type.md: point at mediavocab as the canonical home and
  note that TRAILER / BEHIND_THE_SCENES / REACTION route to
  MediaType.GENERIC (not MOVIE), matching ContentType.to_routing().
- docs/locale.md: drop "lifted from tutubo._locale" framing; locale is
  mediavocab's now.
- examples/signals_routing.py: new compact end-to-end example showing
  parse_title -> classify_video -> ContentType.to_routing() ->
  Signals(modality=...) for resolver routing.

562 tests pass locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(deps): install mediavocab from git (not yet on PyPI)

mediavocab does not have a PyPI release yet, so a `>=0.1.0` requirement
fails CI with "No matching distribution". Pin to the git source on the
TigreGotico fork's `dev` branch until a tagged release is published.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: enrich mediavocab Release with badge-derived resolution + delegate routing to ContentType.to_routing

* Release.resolution now lifted from YouTube quality badges
  (4K→2160p, 8K→4320p, HD→1080p) on VideoPreview.to_release()
* Drop the duplicated ContentType→(MediaType, content_genres) table:
  delegate to mediavocab.ContentType.to_routing(); StreamMode stays
  tutubo-local since broadcast continuity is not part of routing
* Document the single deliberate divergence (LIVE_NEWS → GENERIC+news,
  not TV) — mediavocab's TV implies an EPG schema YouTube uploads lack
* Pin mediavocab>=0.1.0 (drop git+ pin ahead of PyPI publish)
* Add examples/rich_release.py demonstrating typed Release output
* Add 7 regression tests covering resolution, captions, and routing parity
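
The badge mapping listed above can be sketched as a small lookup. This is an illustration only: the function name `resolution_from_badges` and the "highest badge wins" precedence are assumptions, not taken from the tutubo source. The three badge-to-resolution pairs are the ones stated in the commit message.

```python
# Mapping as stated in the commit message.
BADGE_TO_RESOLUTION = {"4K": "2160p", "8K": "4320p", "HD": "1080p"}

# Assumed precedence: when several badges are present, prefer the highest.
_PRECEDENCE = ("8K", "4K", "HD")


def resolution_from_badges(badges):
    """Return a resolution string for the given badge labels, or None."""
    for badge in _PRECEDENCE:
        if badge in badges:
            return BADGE_TO_RESOLUTION[badge]
    return None
```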

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: CHANGELOG for the mediavocab integration

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: ruff sweep — drop unused imports, fix B904 raise-from, add __all__

Removes stale imports left over from the mediavocab integration:
- VariantKind, TitleParseResult, Video, VideoPreview, PlaylistPreview,
  inline ContentType-as-CT in mediavocab_bridge.py (none referenced)
- urlencode in channel.py, Optional/Tuple in _utils.py
- Optional in models.py, Union/YoutubePreview in search.py
Adds 'raise ... from exc' in _utils._parse_object so HTML-parse errors
keep their causal chain. Promotes the package re-exports in
tutubo/__init__.py to an explicit __all__ tuple so ruff stops flagging
intentional surface.
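
For context on the B904 fix, here is a generic sketch of `raise ... from exc`. It is not the body of `_utils._parse_object`; `ParseError` and `parse_object` are hypothetical names, and JSON decoding stands in for the HTML parsing done in tutubo.

```python
import json


class ParseError(ValueError):
    """Hypothetical domain error raised when an embedded object is unreadable."""


def parse_object(raw):
    """Decode `raw` as JSON, wrapping decoder failures in ParseError."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # `from exc` stores the original error on __cause__, so the
        # traceback shows both the wrapper and the underlying failure.
        raise ParseError("could not parse embedded object") from exc
```

Without the `from exc` clause the original exception is still attached as implicit context, but linters such as ruff flag it because the intent (wrapping versus an accidental second failure) is ambiguous.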

* fix(examples): repair stale attribute and class references

Bugs caught while running every example end-to-end:

- ch_playlists, iptv: imported Channel from tutubo.models (which only
  reaches it via a transitive star-import); switch to tutubo.channel.
  ch_playlists also called the non-existent c.video_urls (Channel exposes
  videos_url + per-Playlist video_urls).
- comedy_clips: hits.sort(reverse=True) crashed because tuples fell back
  to comparing Video objects on view-count ties; sort by views explicitly.
- documentary_deep_dive, kids_safe_feed, live_news_dashboard, livestreams:
  Video (channel-page) has no .length — only VideoPreview (search) does.
  Print published_time / channel-card metadata instead.
- documentary_deep_dive: PodcastPreview has no content_type; show
  episode_count.
- music_albums: iterate_albums lives on YoutubeMusicSearch, not on
  YoutubeSearch. Use the music search facade.
- mus: MusicArtist isn't a MusicPlaylist subclass, so the playlist
  branch never matched and the video branch crashed on v.length /
  v.artist; widen the isinstance and use v.name.
- search: dropped 'from tutubo.models import *' (which silently relied
  on transitive Channel/Playlist re-exports) and the reference to
  RelatedVideo (no such class — only RelatedVideoPreview).
- signals_routing: drop unused PlaybackModality import.
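
The comedy_clips failure above is a general Python pitfall, sketched here with a stand-in class. The `Video` dataclass below is hypothetical and much smaller than tutubo's; the point is only that sorting `(views, obj)` tuples falls through to comparing the objects whenever two view counts are equal.

```python
from dataclasses import dataclass


@dataclass
class Video:
    """Hypothetical stand-in; like most plain objects it defines no ordering."""
    title: str


hits = [(100, Video("a")), (100, Video("b")), (250, Video("c"))]

# hits.sort(reverse=True) would raise TypeError here: on the 100/100 tie,
# tuple comparison moves on to Video < Video, which is not defined.

# Sorting on the view count alone never compares the Video objects.
hits.sort(key=lambda pair: pair[0], reverse=True)
```

Python's sort is stable even with `reverse=True`, so the two tied entries keep their original relative order.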

* chore: gitignore example M3U8 playlist artifacts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pin mediavocab>=0.1.1 (first published PyPI release) and add LICENSE

The previous constraint mediavocab>=0.1.0 failed CI because 0.1.0 was
never published to PyPI; 0.1.1 is the first public release. Also adds the
missing Apache-2.0 LICENSE file flagged by repo-health check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: drop legacy requirements.txt — pyproject.toml is authoritative

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: named regression guards for 2026 parser/API-shape bugs

Centralises tripwires for the historical bugs uncovered this year so a
future refactor (or upstream YT drift) re-trips them with explicit
names: Channel star-import, Channel.video_urls removal, channel-tab
Video.length omission, channel-page Video.published_time depopulation,
MusicArtist YTMusicResult subclass invariant, and the
YoutubeSearch / YoutubeMusicSearch surface separation
(iterate_albums / iterate_artists / iterate_tracks live only on the
latter).

* ci: nightly-live workflow re-records fixtures against live YouTube

Mirrors the audiobooker drift-detection pattern: every night the
workflow re-runs record_fixtures.py against live youtube.com /
ytmusic, then replays the offline parser suite against the freshly
captured JSON. A failure means upstream drifted and the committed
fixtures (and possibly the parser) need attention — the canonical YT
silent-failure mode where consumers see empty results in production.
Captured fixtures are NOT committed back; the job is detect-only.

Also adds vcrpy + pytest-vcr to the [test] extra so future
HTTP-cassette tests (e.g. for channel HTML scraping where vcrpy
intercepts requests cleanly) can be added incrementally without
reshuffling deps.

* feat: pluggable HTTP transport with optional curl_cffi stealth extra

Channel and Playlist now route all requests through an injectable
session. tutubo.transport.default_session() returns a curl_cffi session
when TUTUBO_TRANSPORT=curl_cffi (and curl_cffi is installed via the new
[stealth] extra), else a plain requests.Session. _innertube._post still
uses urllib.request directly; this is documented as out of scope.
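
A rough sketch of the selection and injection pattern this commit describes. It is not the contents of `tutubo.transport`: `pick_backend`, `PageFetcher`, and `RecordingSession` are hypothetical names, and the sketch only decides which backend would be used rather than constructing a real session, so it runs without `requests` or `curl_cffi` installed.

```python
import importlib.util
import os


def pick_backend(env=None):
    """Return "curl_cffi" only if requested via env AND importable."""
    env = os.environ if env is None else env
    wants_curl = env.get("TUTUBO_TRANSPORT") == "curl_cffi"
    has_curl = importlib.util.find_spec("curl_cffi") is not None
    return "curl_cffi" if wants_curl and has_curl else "requests"


class PageFetcher:
    """Hypothetical consumer that accepts any object with a .get(url) method."""

    def __init__(self, session):
        self.session = session

    def fetch(self, url):
        return self.session.get(url)


class RecordingSession:
    """Test double: records requested URLs instead of touching the network."""

    def __init__(self):
        self.urls = []

    def get(self, url):
        self.urls.append(url)
        return "<html></html>"
```

Passing the session in, rather than creating it inside the consumer, is what lets tests substitute a double like `RecordingSession` and lets users swap in a different HTTP client without patching module internals.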

* test: comprehensive coverage suite — 68% → 97%

Adds 235 targeted unit tests filling gaps in:

- mediavocab_bridge (49% → 100%): every Video/Music/Channel/Podcast
  → Work/Release/Entity converter, including upcoming/live/radio
  branches, label/no-label, dedup, and override paths
- ytmus (45% → 98%): MusicTrack/Album/Playlist/Artist parsing edge
  cases (empty fields, fallback chains), get_album dual-path,
  search_yt_music type dispatch + error swallow, _get_ytmus retry
- download (16% → 100%): every subprocess wrapper branch with
  mocked yt-dlp (audio-only, quality, filename, merging-format,
  fallback-newest-file, error, playlist)
- _innertube (53% → 100%): HTTP/URL error paths and fixture
  recording for both query and continuation
- _utils (73% → 94%): URL helpers, HTML extraction failures,
  DeferredGeneratorList edge cases
- channel (67% → 94%): metadata properties, live-stream detection,
  visitor_data extraction, lockup/videoRenderer item parsing,
  Playlist HTML/data/title/video_urls + continuation_post
- search (82% → 98%): synthetic fixture covering every renderer
  type (shelf/radio/playlist/channel/refinement/ad/messageRenderer),
  YoutubeMusicSearch error swallow paths, all iterate_* and for_*
  shortcuts

Total: 1727 stmts, 56 missed → 97% line coverage.

All tests offline (no network, no yt-dlp binary).

* fix: resolve all ruff lint errors (unused imports, E402)

Remove unused imports across test files and examples, fix E402
module-level import ordering in examples/search.py, examples/fanedits.py,
test/record_fixtures.py, and test/test_search.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: fix stale citations, wrong property names, add transport + mediavocab docs

AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: keep docs accurate after mediavocab migration and channel API changes
- Impact:
  - docs/channel.md: fixed Channel.live vs Channel.streams property names;
    removed references to non-existent channel.current_live; corrected
    url shortcuts (live_url -> streams_url)
  - docs/search.md: removed non-existent YoutubeSearch.iterate_youtube_music()
    and related music methods; moved YoutubeMusicSearch to its own section
    with correct class/method names; pruned stale SearchType values
  - docs/content_type.md: replaced tutubo/content_type.py citations with
    mediavocab.taxonomy / mediavocab.text (module no longer in tutubo)
  - docs/models.md: same citation fix for ContentType/classify_video
  - docs/index.md: corrected class table citations, added YoutubeMusicSearch row,
    expanded Contents with mediavocab.md and transport.md links
  - docs/mediavocab.md: new — Work/Release/Entity bridge, field mapping tables,
    StreamMode overrides, ContentType->MediaType routing
  - docs/transport.md: new — pluggable session, curl_cffi extra, env var,
    scope (search path excluded)
- Verified via: manual review against source code (tutubo/channel.py,
  tutubo/transport.py, tutubo/search.py, tutubo/mediavocab_bridge.py)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* examples: numbered zero-to-hero progression (01–11)

AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: clean numbered example progression replacing unstructured set
- Impact:
  - 01_quickstart.py: search, result types, content_type, dict interface
  - 02_search_factories.py: 24 factory methods + typed iterators
  - 03_channel.py: metadata, videos, Channel.live vs Channel.streams (correct names)
  - 04_playlist.py: channel playlists and direct Playlist construction
  - 05_podcasts.py: podcast shows, episode listing, is_podcast=True classification
  - 06_music_search.py: YoutubeMusicSearch tracks/artists/playlists
  - 07_music_album.py: MusicAlbum with full track listing
  - 08_fanedits.py: fan-edit detection via parse_title + VariantKind.FANEDIT
  - 09_to_mediavocab.py: all mediavocab fields from to_work()/to_release()
  - 10_custom_session.py: transport modes, curl_cffi injection, Accept-Language
  - 11_pipeline.py: full parse_title → classify → to_routing() → Signals pipeline
- Verified via: ast.parse on all 11 files

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>