Release 4.2.0a1 #36
Open
github-actions[bot] wants to merge 9 commits into
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…BrowseId) in search results (#31)
* feat: lift parser, locale, content-type into mediavocab
Move title parsing, content-type classification, and the multilingual
.voc keyword tree into mediavocab. tutubo now imports them; CutKind is
gone (replaced by mediavocab.VariantKind, returned directly by
parse_title).
- tutubo.title_parser, tutubo.content_type, tutubo._locale removed
(lifted to mediavocab.text.title_parse / classify / mediavocab.locale).
- tutubo.__init__ re-exports parse_title, classify_video, ContentType,
TitleParseResult from mediavocab so existing surface stays.
- VideoPreview.to_work() / to_release() (new): emit mediavocab Work +
Release directly, bridging through mediavocab_bridge.
- Locale system is stateless — no set_lang/get_lang globals; pass
lang= per call. Default from MEDIAVOCAB_LANG env at import.
- Locale .voc tree lives at mediavocab/locale/<lang>/.
Docs updated to reflect:
- locale.md rewritten for the stateless `lang=` parameter contract.
- content_type.md + README.md drop the set_lang/TUTUBO_LANG examples.
- Examples that imported CutKind use VariantKind from mediavocab.
562 tests pass.
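
A minimal sketch of what a stateless `lang=` contract can look like. The `_KEYWORDS` table and the `keywords()` helper are hypothetical stand-ins for illustration, not mediavocab's actual API; only the `MEDIAVOCAB_LANG` variable and the per-call `lang=` argument come from the notes above.

```python
import os

# Hypothetical stand-in for the .voc keyword tree; the real data lives under
# mediavocab/locale/<lang>/ and its file format is not reproduced here.
_KEYWORDS = {
    "en": {"trailer": ["trailer", "teaser"]},
    "pt": {"trailer": ["trailer", "antevisão"]},
}

# Read once at import time, mirroring "Default from MEDIAVOCAB_LANG env at import".
DEFAULT_LANG = os.environ.get("MEDIAVOCAB_LANG", "en")


def keywords(name, lang=None):
    """Return the keyword list for `name`.

    The language is an argument on every call; there is no set_lang/get_lang
    global that one caller could change underneath another.
    """
    code = (lang or DEFAULT_LANG).split("-")[0].lower()
    table = _KEYWORDS.get(code, _KEYWORDS["en"])  # assumed fallback to English
    return table.get(name, [])
```

Because nothing is mutated, two callers can ask for different languages in the same process without interfering with each other.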
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: declare mediavocab as hard runtime dep and resync docs
- Add mediavocab>=0.1.0 to pyproject + requirements.txt (CI was failing
because mediavocab was not installed in the build matrix).
- Drop the optional-import guard in tutubo.mediavocab_bridge — mediavocab
is required at import time, no shim, no _require_mediavocab() gate.
- Rewrite stale `from tutubo.content_type import ...` examples in README
and docs/ to the canonical mediavocab imports
(`mediavocab.taxonomy.ContentType`, `mediavocab.text.{parse_title,
classify_video, extract_tags}`). The names remain re-exported from the
tutubo top-level for convenience.
- README: call out mediavocab as a hard runtime dep and document the
stateless `mediavocab.locale` API (no set_lang / TUTUBO_LANG).
- docs/content_type.md: point at mediavocab as the canonical home and
note that TRAILER / BEHIND_THE_SCENES / REACTION route to
MediaType.GENERIC (not MOVIE), matching ContentType.to_routing().
- docs/locale.md: drop "lifted from tutubo._locale" framing; locale is
mediavocab's now.
- examples/signals_routing.py: new compact end-to-end example showing
parse_title -> classify_video -> ContentType.to_routing() ->
Signals(modality=...) for resolver routing.
562 tests pass locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(deps): install mediavocab from git (not yet on PyPI)
mediavocab does not have a PyPI release yet, so a `>=0.1.0` requirement
fails CI with "No matching distribution". Pin to the git source on the
TigreGotico fork's `dev` branch until a tagged release is published.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: enrich mediavocab Release with badge-derived resolution + delegate routing to ContentType.to_routing
* Release.resolution now lifted from YouTube quality badges
(4K→2160p, 8K→4320p, HD→1080p) on VideoPreview.to_release()
* Drop the duplicated ContentType→(MediaType, content_genres) table:
delegate to mediavocab.ContentType.to_routing(); StreamMode stays
tutubo-local since broadcast continuity is not part of routing
* Document the single deliberate divergence (LIVE_NEWS → GENERIC+news,
not TV) — mediavocab's TV implies an EPG schema YouTube uploads lack
* Pin mediavocab>=0.1.0 (drop git+ pin ahead of PyPI publish)
* Add examples/rich_release.py demonstrating typed Release output
* Add 7 regression tests covering resolution, captions, and routing parity
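
The badge mapping listed above can be sketched as a small lookup. The function name and the "highest badge wins" rule are assumptions for illustration; only the three badge-to-resolution pairs come from this commit message.

```python
# Mapping taken from the commit message: 4K -> 2160p, 8K -> 4320p, HD -> 1080p.
BADGE_TO_RESOLUTION = {"8K": "4320p", "4K": "2160p", "HD": "1080p"}


def resolution_from_badges(badges):
    """Return the resolution implied by a list of badge labels, or None.

    Assumption: when several quality badges are present, the highest one wins.
    Badges that say nothing about quality (for example "CC") are ignored.
    """
    for badge in ("8K", "4K", "HD"):  # checked from highest to lowest
        if badge in badges:
            return BADGE_TO_RESOLUTION[badge]
    return None
```

Returning `None` when no quality badge is present leaves the field unset instead of guessing a resolution.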
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: CHANGELOG for the mediavocab integration
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: ruff sweep — drop unused imports, fix B904 raise-from, add __all__
Removes stale imports left over from the mediavocab integration:
- VariantKind, TitleParseResult, Video, VideoPreview, PlaylistPreview,
inline ContentType-as-CT in mediavocab_bridge.py (none referenced)
- urlencode in channel.py, Optional/Tuple in _utils.py
- Optional in models.py, Union/YoutubePreview in search.py
Adds 'raise ... from exc' in _utils._parse_object so HTML-parse errors
keep their causal chain. Promotes the package re-exports in
tutubo/__init__.py to an explicit __all__ tuple so ruff stops flagging
intentional surface.
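
For readers unfamiliar with ruff's B904, a minimal sketch of the `raise ... from exc` pattern follows. `ParseError` and `parse_object` are hypothetical names, not the actual `_utils._parse_object` implementation.

```python
import json


class ParseError(Exception):
    """Hypothetical error type raised when an embedded object cannot be parsed."""


def parse_object(raw):
    """Parse a JSON fragment, keeping the original failure as __cause__."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # "from exc" records the decoder error as the cause, so the traceback
        # shows both the high-level failure and the low-level reason.
        raise ParseError("could not parse embedded object") from exc
```

Without `from exc`, the traceback reports the original error as having occurred "during handling" of another exception, which reads like a bug in the handler itself.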
* fix(examples): repair stale attribute and class references
Bugs caught while running every example end-to-end:
- ch_playlists, iptv: imported Channel from tutubo.models (which only
reaches it via a transitive star-import); switch to tutubo.channel.
ch_playlists also called the non-existent c.video_urls (Channel exposes
videos_url + per-Playlist video_urls).
- comedy_clips: hits.sort(reverse=True) crashed because tuples fell back
to comparing Video objects on view-count ties; sort by views explicitly.
- documentary_deep_dive, kids_safe_feed, live_news_dashboard, livestreams:
Video (channel-page) has no .length — only VideoPreview (search) does.
Print published_time / channel-card metadata instead.
- documentary_deep_dive: PodcastPreview has no content_type; show
episode_count.
- music_albums: iterate_albums lives on YoutubeMusicSearch, not on
YoutubeSearch. Use the music search facade.
- mus: MusicArtist isn't a MusicPlaylist subclass, so the playlist
branch never matched and the video branch crashed on v.length /
v.artist; widen the isinstance and use v.name.
- search: dropped 'from tutubo.models import *' (which silently relied
on transitive Channel/Playlist re-exports) and the reference to
RelatedVideo (no such class — only RelatedVideoPreview).
- signals_routing: drop unused PlaybackModality import.
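
The comedy_clips fix above comes down to sorting on an explicit key. A minimal sketch, using a hypothetical `Video` stand-in and made-up view counts:

```python
from dataclasses import dataclass


@dataclass
class Video:
    """Hypothetical stand-in; it defines no ordering, like most model objects."""
    title: str


hits = [(100, Video("a")), (250, Video("b")), (100, Video("c"))]

# A bare hits.sort(reverse=True) can raise TypeError here: when two tuples tie
# on the view count, Python falls through to comparing the Video objects.
# Sorting on the view count alone never looks at the second element.
hits.sort(key=lambda hit: hit[0], reverse=True)

titles = [video.title for _, video in hits]
```

Python's sort is stable, so entries that tie on views keep their original relative order.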
* chore: gitignore example M3U8 playlist artifacts
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: pin mediavocab>=0.1.1 (first published PyPI release) and add LICENSE
The previous constraint mediavocab>=0.1.0 failed CI because 0.1.0 was
never published to PyPI; 0.1.1 is the first public release. Also adds the
missing Apache-2.0 LICENSE file flagged by repo-health check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: drop legacy requirements.txt — pyproject.toml is authoritative
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: named regression guards for 2026 parser/API-shape bugs
Centralises tripwires for the historical bugs uncovered this year so a
future refactor (or upstream YT drift) re-trips them with explicit
names: Channel star-import, Channel.video_urls removal, channel-tab
Video.length omission, channel-page Video.published_time depopulation,
MusicArtist YTMusicResult subclass invariant, and the
YoutubeSearch / YoutubeMusicSearch surface separation
(iterate_albums / iterate_artists / iterate_tracks live only on the
latter).
* ci: nightly-live workflow re-records fixtures against live YouTube
Mirrors the audiobooker drift-detection pattern: every night the
workflow re-runs record_fixtures.py against live youtube.com /
ytmusic, then replays the offline parser suite against the freshly
captured JSON. A failure means upstream drifted and the committed
fixtures (and possibly the parser) need attention — the canonical YT
silent-failure mode where consumers see empty results in production.
Captured fixtures are NOT committed back; the job is detect-only.
Also adds vcrpy + pytest-vcr to the [test] extra so future
HTTP-cassette tests (e.g. for channel HTML scraping where vcrpy
intercepts requests cleanly) can be added incrementally without
reshuffling deps.
* feat: pluggable HTTP transport with optional curl_cffi stealth extra
Channel and Playlist now route all requests through an injectable
session. tutubo.transport.default_session() returns a curl_cffi session
when TUTUBO_TRANSPORT=curl_cffi (and curl_cffi is installed via the new
[stealth] extra), else a plain requests.Session. _innertube._post still
uses urllib.request directly; documented as out-of-scope.
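
A minimal sketch of the selection logic described above, assuming the env var and the extra behave as stated. `pick_transport` is a hypothetical helper added for testability, and this is not the actual `tutubo.transport` code.

```python
import importlib.util
import os


def pick_transport(env=None, curl_cffi_available=None):
    """Return "curl_cffi" only when it is both requested and importable."""
    env = os.environ if env is None else env
    if curl_cffi_available is None:
        # find_spec returns None when the top-level package is not installed.
        curl_cffi_available = importlib.util.find_spec("curl_cffi") is not None
    if env.get("TUTUBO_TRANSPORT") == "curl_cffi" and curl_cffi_available:
        return "curl_cffi"
    return "requests"


def default_session(env=None):
    """Build a session object for the selected backend (imports are lazy)."""
    if pick_transport(env) == "curl_cffi":
        from curl_cffi import requests as curl_requests
        return curl_requests.Session()
    import requests
    return requests.Session()
```

Keeping the decision separate from session construction lets the fallback be tested without either HTTP library installed.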
* test: comprehensive coverage suite — 68% → 97%
Adds 235 targeted unit tests filling gaps in:
- mediavocab_bridge (49% → 100%): every Video/Music/Channel/Podcast
→ Work/Release/Entity converter, including upcoming/live/radio
branches, label/no-label, dedup, and override paths
- ytmus (45% → 98%): MusicTrack/Album/Playlist/Artist parsing edge
cases (empty fields, fallback chains), get_album dual-path,
search_yt_music type dispatch + error swallow, _get_ytmus retry
- download (16% → 100%): every subprocess wrapper branch with
mocked yt-dlp (audio-only, quality, filename, merging-format,
fallback-newest-file, error, playlist)
- _innertube (53% → 100%): HTTP/URL error paths and fixture
recording for both query and continuation
- _utils (73% → 94%): URL helpers, HTML extraction failures,
DeferredGeneratorList edge cases
- channel (67% → 94%): metadata properties, live-stream detection,
visitor_data extraction, lockup/videoRenderer item parsing,
Playlist HTML/data/title/video_urls + continuation_post
- search (82% → 98%): synthetic fixture covering every renderer
type (shelf/radio/playlist/channel/refinement/ad/messageRenderer),
YoutubeMusicSearch error swallow paths, all iterate_* and for_*
shortcuts
Total: 1727 stmts, 56 missed → 97% line coverage.
All tests offline (no network, no yt-dlp binary).
* fix: resolve all ruff lint errors (unused imports, E402)
Remove unused imports across test files and examples, fix E402
module-level import ordering in examples/search.py, examples/fanedits.py,
test/record_fixtures.py, and test/test_search.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: fix stale citations, wrong property names, add transport + mediavocab docs
AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: keep docs accurate after mediavocab migration and channel API changes
- Impact:
- docs/channel.md: fixed Channel.live vs Channel.streams property names;
removed references to non-existent channel.current_live; corrected
url shortcuts (live_url -> streams_url)
- docs/search.md: removed non-existent YoutubeSearch.iterate_youtube_music()
and related music methods; moved YoutubeMusicSearch to its own section
with correct class/method names; pruned stale SearchType values
- docs/content_type.md: replaced tutubo/content_type.py citations with
mediavocab.taxonomy / mediavocab.text (module no longer in tutubo)
- docs/models.md: same citation fix for ContentType/classify_video
- docs/index.md: corrected class table citations, added YoutubeMusicSearch row,
expanded Contents with mediavocab.md and transport.md links
- docs/mediavocab.md: new — Work/Release/Entity bridge, field mapping tables,
StreamMode overrides, ContentType->MediaType routing
- docs/transport.md: new — pluggable session, curl_cffi extra, env var,
scope (search path excluded)
- Verified via: manual review against source code (tutubo/channel.py,
tutubo/transport.py, tutubo/search.py, tutubo/mediavocab_bridge.py)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* examples: numbered zero-to-hero progression (01–11)
AI-Generated Change:
- Model: claude-sonnet-4-6
- Intent: clean numbered example progression replacing unstructured set
- Impact:
- 01_quickstart.py: search, result types, content_type, dict interface
- 02_search_factories.py: 24 factory methods + typed iterators
- 03_channel.py: metadata, videos, Channel.live vs Channel.streams (correct names)
- 04_playlist.py: channel playlists and direct Playlist construction
- 05_podcasts.py: podcast shows, episode listing, is_podcast=True classification
- 06_music_search.py: YoutubeMusicSearch tracks/artists/playlists
- 07_music_album.py: MusicAlbum with full track listing
- 08_fanedits.py: fan-edit detection via parse_title + VariantKind.FANEDIT
- 09_to_mediavocab.py: all mediavocab fields from to_work()/to_release()
- 10_custom_session.py: transport modes, curl_cffi injection, Accept-Language
- 11_pipeline.py: full parse_title → classify → to_routing() → Signals pipeline
- Verified via: ast.parse on all 11 files
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Human review requested!