Support multi-language YouTube transcript fallback (en, es, pt) #469

@23r2efewvcs

Problem

youtube_yt.py:_fetch_transcript_ytdlp() (line 508) hardcodes --sub-lang en — only English auto-captions are attempted. For non-English content, this returns None and the video appears in results without a transcript.

The direct HTTP fallback (_fetch_transcript_direct(), line 470) already supports multi-language: it falls back to the first available caption track when no English track is found. But the primary yt-dlp path does not.

Current behavior:

  • yt-dlp path: --sub-lang en → English only → returns None for non-English videos
  • Direct HTTP fallback: tries en → en-* → first available track → works for non-English
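
The fallback order described for the direct HTTP path could be sketched roughly as follows. This is only an illustration: `pick_caption_track` is a hypothetical helper, and the `languageCode` key is an assumption about how caption tracks are represented, not the actual code of `_fetch_transcript_direct()`.

```python
from typing import Optional


def pick_caption_track(tracks: list[dict]) -> Optional[dict]:
    """Hypothetical helper: prefer 'en', then any 'en-*' variant, then the first track."""
    if not tracks:
        return None
    # 1. Exact English match.
    for track in tracks:
        if track.get("languageCode") == "en":
            return track
    # 2. Regional English variants such as en-GB or en-US.
    for track in tracks:
        if track.get("languageCode", "").startswith("en-"):
            return track
    # 3. No English at all: take whatever comes first.
    return tracks[0]
```

With this ordering, a Portuguese-only video still yields a track, which is the behavior the yt-dlp path currently lacks.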

Proposed Solution

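Change --sub-lang en to --sub-lang en,es,pt in _fetch_transcript_ytdlp(). yt-dlp accepts a comma-separated language list for this option. Note that it downloads every listed language that is available rather than stopping at the first match, so the function should then read the subtitle file for the first language in preference order (en, then es, then pt).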

# youtube_yt.py line 508, before:
"--sub-lang", "en",

# after:
"--sub-lang", "en,es,pt",
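
Because requesting several languages can leave more than one subtitle file on disk, the caller probably needs to choose one in preference order. The sketch below shows one way to do that. `build_subtitle_args` and `pick_subtitle_file` are hypothetical names, and the `%(id)s` output template, the vtt format, and the resulting `<id>.<lang>.vtt` file names are assumptions about how `_fetch_transcript_ytdlp()` invokes yt-dlp, not a description of the existing code.

```python
from pathlib import Path
from typing import Optional, Sequence

# Preference order from this proposal.
PREFERRED_LANGS = ("en", "es", "pt")


def build_subtitle_args(url: str, out_dir: str,
                        langs: Sequence[str] = PREFERRED_LANGS) -> list[str]:
    """Build a yt-dlp command that requests auto-captions in several languages."""
    return [
        "yt-dlp",
        "--skip-download",           # subtitles only, no media
        "--write-auto-subs",         # auto-generated captions
        "--sub-lang", ",".join(langs),
        "--sub-format", "vtt",
        "-o", str(Path(out_dir) / "%(id)s"),  # assumed output template
        url,
    ]


def pick_subtitle_file(out_dir: str, video_id: str,
                       langs: Sequence[str] = PREFERRED_LANGS) -> Optional[Path]:
    """Return the subtitle file for the first preferred language that exists."""
    for lang in langs:
        candidate = Path(out_dir) / f"{video_id}.{lang}.vtt"
        if candidate.is_file():
            return candidate
    return None
```

Returning None when no file matches keeps the existing contract, so the direct HTTP fallback would still run for videos with captions in other languages.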

Optionally make this configurable via env var LAST30DAYS_YT_SUB_LANGS with default en,es,pt, read in env.py.
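
A minimal sketch of the env var handling, assuming a helper in env.py along these lines. `get_sub_langs` and `DEFAULT_SUB_LANGS` are hypothetical names, and falling back to the default on an empty value is a design assumption rather than existing behavior.

```python
import os
from typing import Mapping, Optional

DEFAULT_SUB_LANGS = "en,es,pt"


def get_sub_langs(environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Read LAST30DAYS_YT_SUB_LANGS and return an ordered list of language codes."""
    environ = os.environ if environ is None else environ
    raw = environ.get("LAST30DAYS_YT_SUB_LANGS", DEFAULT_SUB_LANGS)
    # Tolerate stray spaces and trailing commas, e.g. "en, es ,".
    langs = [part.strip() for part in raw.split(",") if part.strip()]
    # An empty or whitespace-only value falls back to the default list.
    return langs or DEFAULT_SUB_LANGS.split(",")
```

The result can be joined with commas and passed to --sub-lang, so users who set nothing still get en,es,pt.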

LLMs understand all three languages natively. A transcript in any of these is better than no transcript.

Estimated impact: +30-50% more transcripts captured, especially for non-English content.

Alternatives Considered

  • Use only en,en-orig — catches original language when auto-caption exists, but misses non-English creators entirely
  • Download ALL available languages — wastes bandwidth and storage for marginal gain
  • Make it configurable without default — users won't configure it, same problem persists
  • Rely on direct HTTP fallback only — fallback is less reliable (no yt-dlp retry logic, timeout handling)

Labels: enhancement (New feature or request)
