sjbrouillard/MediaProcessing

Media Processing Scripts

Automated workflow for analyzing ripped DVD/Blu-ray content, looking up its metadata, and renaming the files.

Quick Start

# Automated workflow (recommended)
.\Start-MediaProcessing.ps1 -SourcePath "C:\DVDFab\DVDFab13\Output\Video"

The orchestration script runs all stages automatically with interactive approval.

Workflow Overview

Stage 1: Analyze     → Stage 2: Lookup    → Stage 3: Review    → Stage 4: Rename
(Metadata-First)       (Conditional APIs)    (User Confirm)       (TV/Movie)

Execution Options:

  • Start-MediaProcessing.ps1 (recommended): Automated orchestration through all stages
  • Manual stages (Scripts/1-*.ps1, Scripts/2-*.ps1, etc.): Fine-grained control for troubleshooting

Metadata Extraction Strategy

The workflow uses a metadata-first approach to minimize external API calls and improve performance:

Stage 1: Embedded Metadata Priority

Process:

  1. Scan embedded metadata using ffprobe (FFmpeg tool)

    • Extracts TITLE tag from MKV/MP4 files
    • DVDFab and similar rippers often embed proper episode info during ripping
    • Example embedded TITLE: "Babylon 5 - s02e12 - Acts of Sacrifice"
  2. Parse episode information from TITLE tags

    • Matches patterns: "Show - S##E## - Title", "Show s##e## Title", "Show - ##x## - Title"
    • Extracts: Series name, season, episode number, episode title
    • Validates format using regex patterns
  3. Decision logic:

    • All files complete: Create mapping directly from embedded metadata, skip Claude AI entirely
    • Mixed metadata: Use hybrid approach (see below)
    • No usable metadata: Fall back to full Claude AI analysis
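The parsing and decision steps above can be sketched roughly as follows. This is a minimal illustration, not the code in Scripts/1-Analyze-RippedMedia.ps1: the function names, the regular expressions, and the 'claude' mode label are assumptions chosen to cover the three documented TITLE patterns.

```powershell
# Hypothetical helper: parse an embedded TITLE tag into episode fields.
# Returns $null when the tag matches none of the documented patterns.
function ConvertFrom-EpisodeTitle {
    param([string]$Title)

    # Assumed regexes (-match is case-insensitive, so S02E12 and s02e12 both work):
    #   "Show - S##E## - Title" and "Show s##e## Title"
    #   "Show - ##x## - Title"
    $patterns = @(
        '^(?<series>.+?)\s*-?\s*s(?<season>\d{1,2})e(?<episode>\d{1,3})\s*-?\s*(?<title>.+)$',
        '^(?<series>.+?)\s*-\s*(?<season>\d{1,2})x(?<episode>\d{1,3})\s*-\s*(?<title>.+)$'
    )

    foreach ($pattern in $patterns) {
        if ($Title -match $pattern) {
            return [pscustomobject]@{
                Series  = $Matches['series'].Trim()
                Season  = [int]$Matches['season']
                Episode = [int]$Matches['episode']
                Title   = $Matches['title'].Trim()
            }
        }
    }
    return $null
}

# Hypothetical helper: choose the analysis mode for one directory of TITLE tags.
# 'embedded' and 'hybrid' mirror the metadata_source values in this README;
# 'claude' is an assumed label for the full-fallback case.
function Get-AnalysisMode {
    param([string[]]$Titles)

    $parsed = @($Titles | Where-Object { $null -ne (ConvertFrom-EpisodeTitle $_) })

    if ($Titles.Count -gt 0 -and $parsed.Count -eq $Titles.Count) { 'embedded' }
    elseif ($parsed.Count -gt 0) { 'hybrid' }
    else { 'claude' }
}
```

For example, `ConvertFrom-EpisodeTitle 'Babylon 5 - s02e12 - Acts of Sacrifice'` yields series "Babylon 5", season 2, episode 12, and title "Acts of Sacrifice", while a tag such as "Title 01" yields `$null` and pushes its directory into hybrid or fallback mode.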

Hybrid Approach (Mixed Metadata)

When some files have embedded metadata and others don't:

  1. Extract metadata from files with valid TITLE tags
  2. Build context: series name, season, episode list from complete files
  3. Send only incomplete files to Claude with context (reduces prompt size by 50-90%)
  4. Merge results: embedded metadata + Claude inference
  5. Mark directories as metadata_source: "hybrid"
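The five steps above could look something like the sketch below. Everything here is illustrative: `Merge-HybridMetadata`, the `Parsed` property, and the `$InferWithClaude` scriptblock (a stand-in for the real Claude call) are hypothetical names, and the actual mapping schema is defined elsewhere in the repository.

```powershell
# Hypothetical sketch of the hybrid flow for one directory.
# $Files: objects with Name and Parsed properties; Parsed is $null when the
#         TITLE tag was missing or malformed.
# $InferWithClaude: stand-in scriptblock taking (incomplete files, context) and
#         returning one result object per incomplete file.
function Merge-HybridMetadata {
    param(
        [object[]]$Files,
        [scriptblock]$InferWithClaude
    )

    # Step 1: separate files with valid embedded metadata from the rest.
    $complete   = @($Files | Where-Object { $null -ne $_.Parsed })
    $incomplete = @($Files | Where-Object { $null -eq $_.Parsed })

    # Step 2: build context only from the files that parsed cleanly.
    $context = [pscustomobject]@{
        Series   = ($complete | Select-Object -First 1).Parsed.Series
        Episodes = @($complete | ForEach-Object {
            'S{0:d2}E{1:d2}' -f [int]$_.Parsed.Season, [int]$_.Parsed.Episode
        })
    }

    $embedded = @(foreach ($file in $complete) {
        [pscustomobject]@{ Name = $file.Name; Metadata = $file.Parsed; Source = 'embedded' }
    })

    # Step 3: a single call covering only the incomplete files.
    $inferred = @()
    if ($incomplete.Count -gt 0) {
        $inferred = @(& $InferWithClaude $incomplete $context)
    }

    # Steps 4-5: merge both result sets and mark the directory as hybrid.
    [pscustomobject]@{
        metadata_source = 'hybrid'
        files           = $embedded + $inferred
    }
}
```

Passing the inference step in as a scriptblock keeps the sketch testable without network access; a real implementation would presumably build the prompt from `$context` at that point.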

Example (Babylon 5 corpus):

  • 42 files: Complete embedded metadata → processed directly
  • 2 files: Malformed TITLE tags → sent to Claude with context from the 42 complete files
  • Result: 57% smaller Claude prompt, 95% fewer API calls

Stage 2: Conditional API Lookups

  • Skip API calls for files with complete embedded metadata
  • Only query TVMaze/TMDb for files marked as [Needs Lookup]
  • Typical savings: 80-100% reduction in API calls for DVDFab rips

Performance Benefits

For typical DVDFab rips with embedded metadata:

  • Claude API calls: 0-1 (vs. always 1 previously)
  • Claude prompt size: Reduced by 50-90% when using hybrid mode
  • TVMaze/TMDb calls: 0 (vs. 1 per episode previously)
  • Stage 1 execution: ~2-5 seconds (vs. 10-30 seconds)
  • Overall workflow: ~15-30 seconds (vs. 1-2 minutes)

Requirements

ffprobe must be in PATH for embedded metadata extraction:

  • Install FFmpeg: https://ffmpeg.org/download.html
    • Windows: Use winget, chocolatey, or manual download
    • winget install Gyan.FFmpeg (recommended)
  • Verify installation: ffprobe -version
  • If not available: Workflow automatically falls back to Claude AI analysis
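A script can detect ffprobe and read a TITLE tag along these lines. This is a sketch under assumptions: the function names are hypothetical, and the workflow's own scripts may query ffprobe with different options. The ffprobe flags used (`-v error`, `-show_entries format_tags=title`, `-of default=noprint_wrappers=1:nokey=1`) are standard ffprobe options for printing a single container-level tag.

```powershell
# Returns $true when an executable with the given name resolves via PATH.
function Test-ToolAvailable {
    param([string]$Name = 'ffprobe')
    [bool](Get-Command -Name $Name -CommandType Application -ErrorAction SilentlyContinue)
}

# Hypothetical helper: return the container-level TITLE tag of a media file,
# or $null when ffprobe is missing, the file is unreadable, or no tag exists.
function Get-EmbeddedTitle {
    param([string]$Path)

    if (-not (Test-ToolAvailable 'ffprobe')) { return $null }

    # Print only the value of the "title" format tag, nothing else.
    $title = & ffprobe -v error -show_entries format_tags=title `
        -of 'default=noprint_wrappers=1:nokey=1' $Path 2>$null |
        Select-Object -First 1

    if ([string]::IsNullOrWhiteSpace($title)) { return $null }
    ([string]$title).Trim()
}
```

Returning `$null` in every failure case lets the caller treat "ffprobe not installed" and "no TITLE tag" the same way, which matches the documented fallback to Claude AI analysis.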

Troubleshooting:

Issue                         Solution
ffprobe not found in PATH     Add FFmpeg bin directory to system PATH, restart terminal
No embedded metadata found    DVDFab didn't embed TITLE tags - workflow uses Claude AI fallback
Malformed TITLE tags          Hybrid mode sends these to Claude with context from complete files

Directory Structure

MediaProcessing/
├── Start-MediaProcessing.ps1           # Main orchestration script (runs all stages)
├── Test-FullWorkflow.ps1               # Testing script (dry-run mode)
├── README.md                           # This file
│
├── Config/                             # Configuration files
│   ├── claude-pricing.json            # Claude API pricing configuration
│   └── config.xml                     # TMDb API key (gitignored, created by Setup-ApiKey.ps1)
│
├── Modules/                            # PowerShell modules
│   ├── MediaProcessing-Common.psm1    # Shared functions (API calls, file ops, etc.)
│   └── MediaProcessing-Logging.psm1   # Logging infrastructure
│
├── Scripts/                            # Workflow stage scripts
│   ├── 1-Analyze-RippedMedia.ps1      # Stage 1: AI-powered content analysis
│   ├── 2-Lookup-Metadata.ps1          # Stage 2: Online metadata lookup
│   ├── 3-Review-Metadata.ps1          # Stage 3: User review interface
│   ├── 4-Rename-TVSeries.ps1          # Stage 4a: TV series renaming
│   └── 4-Rename-Movies.ps1            # Stage 4b: Movie renaming
│
├── Tools/                              # Setup & maintenance utilities
│   ├── Setup-ApiKey.ps1               # Configure TMDb API key
│   ├── Setup-Pricing.ps1              # Update Claude pricing configuration
│   └── Cleanup-OldMappings.ps1        # Remove old mapping files
│
├── Docs/                               # Supporting documentation
│   ├── FUTURE-ENHANCEMENTS.md         # Enhancement tracking
│   └── SESSION-SUMMARY-*.md           # Session notes
│
├── Prompts/                            # Claude Code prompt templates
│   └── analyze-prompt.txt             # Stage 1 AI prompt
│
├── Output/                             # Timestamped JSON mappings (gitignored)
├── Cache/                              # API response cache (gitignored)
└── Logs/                               # Daily log files (gitignored)

Configuration

Claude API Pricing

The workflow uses claude-pricing.json to calculate accurate API costs. The pricing file is automatically checked for staleness (>30 days old).

Pricing File Location: MediaProcessing/Config/claude-pricing.json

Update Pricing:

.\Tools\Setup-Pricing.ps1

The script will:

  • Show current pricing and last update date
  • Prompt for new pricing values (or press Enter to keep defaults)
  • Update the last_updated date to today
  • Save the updated configuration

When to Update:

  • The pricing file is more than 30 days old (Stage 1 will display warning)
  • Anthropic announces Claude pricing changes (check https://www.anthropic.com/pricing)
  • The pricing file is missing or corrupted
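The staleness conditions above can be checked with a few lines of PowerShell. This sketch assumes the pricing file is JSON with a `last_updated` field holding a parseable date such as `2026-01-10`; the field name comes from this README, but the date format and the function name `Test-PricingStale` are assumptions.

```powershell
# Hypothetical check: $true when the pricing file should be refreshed.
function Test-PricingStale {
    param(
        [string]$Path,
        [int]$MaxAgeDays = 30,
        [datetime]$Now = (Get-Date)
    )

    # A missing file counts as stale.
    if (-not (Test-Path -LiteralPath $Path)) { return $true }

    try {
        $pricing = Get-Content -LiteralPath $Path -Raw | ConvertFrom-Json
        $raw = $pricing.last_updated
        # Some PowerShell versions convert ISO timestamps to DateTime on their own.
        if ($raw -is [datetime]) {
            $updated = $raw
        } else {
            $updated = [datetime]::Parse([string]$raw, [cultureinfo]::InvariantCulture)
        }
    } catch {
        # Unreadable JSON or an unparseable date counts as corrupted, hence stale.
        return $true
    }

    ($Now - $updated).TotalDays -gt $MaxAgeDays
}
```

A caller might run `Test-PricingStale -Path .\Config\claude-pricing.json` and, when it returns `$true`, suggest running Tools\Setup-Pricing.ps1.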

Current Pricing (as of Jan 2026):

  • Haiku: $0.80/M input, $4.00/M output (fast, simple tasks)
  • Sonnet: $3.00/M input, $15.00/M output (balanced, default)
  • Opus: $15.00/M input, $75.00/M output (complex reasoning)
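Because prices are quoted per million tokens, the cost of one call is a simple weighted sum. The sketch below is illustrative only: `Get-ClaudeCallCost` and its parameters are hypothetical names, and the token counts in the usage note are made-up inputs, not measurements from this workflow.

```powershell
# Hypothetical cost calculation from per-million-token prices.
# Decimal arithmetic avoids floating-point rounding noise in currency values.
function Get-ClaudeCallCost {
    param(
        [long]$InputTokens,
        [long]$OutputTokens,
        [decimal]$InputPerMillion,
        [decimal]$OutputPerMillion
    )

    [decimal]$million = 1000000
    $inputCost  = $InputTokens  * $InputPerMillion  / $million
    $outputCost = $OutputTokens * $OutputPerMillion / $million
    $inputCost + $outputCost
}
```

With the Sonnet prices listed above, a call using 12,000 input tokens and 1,500 output tokens would cost 12,000 × $3.00 / 1,000,000 + 1,500 × $15.00 / 1,000,000 = $0.036 + $0.0225 = $0.0585.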

Stage 1: Analyze Ripped Media

Purpose: Extract embedded metadata and analyze directory structure to classify content type (TV series, movies, etc.).

Process:

  1. Scan for embedded metadata using ffprobe (if available)
  2. Parse TITLE tags to extract episode information
  3. For files with complete metadata: Create mapping directly (skip Claude AI)
  4. For files needing inference: Call Claude AI with context from complete files (hybrid mode)
  5. Generate timestamped JSON mapping file

Input: Root directory containing ripped media files

Output: JSON mapping file with content analysis and metadata sources

Dependencies:

  • Optional: FFmpeg (ffprobe) for embedded metadata extraction (recommended)
  • Claude Code (only called when metadata is incomplete)

.\Scripts\1-Analyze-RippedMedia.ps1 -SourcePath "C:\DVDFab\DVDFab13\Output\Video"

What to expect:

  • Best case: "All files have complete embedded metadata - skipping Claude AI" (0 API calls)
  • Hybrid case: "Calling Claude AI for X directories with incomplete files" (reduced prompt)
  • Fallback: Full Claude AI analysis (when ffprobe unavailable or no embedded metadata)

Stage 2: Lookup Metadata

Purpose: Conditionally enrich analysis with online metadata for files that need it

Process:

  1. Skip files with metadata_source: "embedded" (already complete)
  2. Check local cache for previously fetched API responses
  3. Query TVMaze/TMDb APIs only for uncached files marked [Needs Lookup]
  4. Cache API responses for future use (30-day TTL)
  5. Display efficiency statistics (API calls saved + cache performance)

Input: JSON mapping from Stage 1

Output: Enhanced JSON mapping with metadata

Dependencies: Internet connection; TMDb API key (free, needed only when TVMaze coverage is insufficient)

What to expect:

  • Files with embedded metadata are skipped (0 API calls for those files)
  • First run: Cache misses, fetches from APIs, caches responses
  • Subsequent runs: Cache hits, 90-100% faster (no external API calls)
  • Efficiency gains displayed: "API calls saved: X (50%+ reduction)"
  • Cache statistics: "Cache hits: X, Hit rate: Y%"

API Response Caching

Stage 2 automatically caches API responses to dramatically speed up repeat workflow runs:

How It Works:

  • TVMaze and TMDb API responses are cached locally in Cache/ directory
  • Each show/movie has a unique cache key (e.g., tvmaze-Babylon-5, tmdb-tv-Babylon-5-s1)
  • Cache entries expire after 30 days (configurable TTL)
  • Cache is checked before making any external API calls
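A file-based cache with these properties might be sketched as follows. The function names and the use of the file's last-write time for expiry are assumptions; the real module may store its timestamp differently. The key format follows the examples given above.

```powershell
# Hypothetical helper: build a cache key such as "tvmaze-Babylon-5".
function Get-CacheKey {
    param([string]$Provider, [string]$Name)
    # Collapse every run of non-alphanumeric characters into one hyphen.
    $slug = ($Name -replace '[^A-Za-z0-9]+', '-').Trim('-')
    "$Provider-$slug"
}

# Hypothetical helper: return the cached response, or $null on a miss.
# Assumption: entry age is judged by the cache file's LastWriteTime.
function Get-CachedResponse {
    param([string]$CacheDir, [string]$Key, [int]$TtlDays = 30)

    $file = Join-Path $CacheDir "$Key.json"
    if (-not (Test-Path -LiteralPath $file)) { return $null }

    $age = (Get-Date) - (Get-Item -LiteralPath $file).LastWriteTime
    if ($age.TotalDays -gt $TtlDays) { return $null }   # expired entry is a miss

    Get-Content -LiteralPath $file -Raw | ConvertFrom-Json
}

# Hypothetical helper: store a response under the given key.
function Set-CachedResponse {
    param([string]$CacheDir, [string]$Key, $Response)

    if (-not (Test-Path -LiteralPath $CacheDir)) {
        New-Item -ItemType Directory -Path $CacheDir | Out-Null
    }
    $Response | ConvertTo-Json -Depth 10 |
        Set-Content -LiteralPath (Join-Path $CacheDir "$Key.json")
}
```

A lookup would first call `Get-CachedResponse`, and only on `$null` query the API and follow up with `Set-CachedResponse`, so repeat runs for the same show never leave the machine.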

Performance Benefits:

  • First run: Normal API calls, responses cached for future use
  • Repeat runs: 90-100% faster, zero external API calls for cached shows
  • Hit rate tracking: See cache effectiveness in real-time

Cache Management:

# View cache directory
Get-ChildItem .\Cache

# Clear all cache (force fresh lookups)
Remove-Item .\Cache\*.json

# Clear specific show cache
Remove-Item .\Cache\tvmaze-Babylon-5.json

Cache Statistics Example:

Cache Performance:
  Cache hits: 12
  Cache misses: 0
  Hit rate: 100%
  External API calls saved by cache: 12
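The hit rate in the report above is simply hits divided by total lookups. A minimal sketch, with `Get-CacheHitRate` as a hypothetical name:

```powershell
# Hypothetical helper: cache hit rate as a whole-number percentage.
function Get-CacheHitRate {
    param([int]$Hits, [int]$Misses)

    $total = $Hits + $Misses
    if ($total -eq 0) { return 0 }   # no lookups yet, so avoid dividing by zero

    [int][math]::Round([double](100 * $Hits / $total))
}
```

With the figures from the example, `Get-CacheHitRate -Hits 12 -Misses 0` returns 100.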

Setup (One-time)

  1. Get free TMDb API key: https://www.themoviedb.org/settings/api
  2. Configure it securely (never committed to git):

.\Tools\Setup-ApiKey.ps1 -TMDbApiKey "your-api-key-here"

Usage

.\Scripts\2-Lookup-Metadata.ps1 -MappingFile ".\Output\mapping-20260110-153045.json"

How it works:

  • Tries TVMaze API first (no key needed, but limited coverage)
  • Falls back to TMDb API for comprehensive TV and movie metadata
  • API key loaded automatically from config.xml (gitignored)
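The TVMaze-first, TMDb-second order can be expressed as a small fallback wrapper. In this sketch the two lookups are passed in as scriptblocks so that no particular endpoint or response shape is assumed; `Find-ShowMetadata` and both parameter names are hypothetical.

```powershell
# Hypothetical fallback wrapper: try TVMaze first, then TMDb.
# Each lookup scriptblock takes a show name and returns a result or $null.
function Find-ShowMetadata {
    param(
        [string]$Name,
        [scriptblock]$TvMazeLookup,
        [scriptblock]$TmdbLookup
    )

    $result = $null
    try {
        $result = & $TvMazeLookup $Name
    } catch {
        $result = $null   # treat a failed TVMaze request like "not found"
    }
    if ($null -ne $result) {
        return [pscustomobject]@{ Source = 'tvmaze'; Data = $result }
    }

    # TVMaze had no answer, so fall back to TMDb (which needs the API key).
    $result = & $TmdbLookup $Name
    if ($null -ne $result) {
        return [pscustomobject]@{ Source = 'tmdb'; Data = $result }
    }
    $null
}
```

Recording which provider answered makes it easy to report how often the TMDb key was actually needed.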

API Key Priority:

  1. -TMDbApiKey parameter (if provided)
  2. config.xml file (recommended, auto-loaded)
  3. $env:TMDB_API_KEY environment variable
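The priority order above amounts to taking the first non-empty candidate. A minimal sketch, assuming the value from config.xml has already been read into a string (its on-disk format is not described here); `Resolve-TMDbApiKey` and its parameter names are hypothetical.

```powershell
# Hypothetical resolver: the first non-empty candidate wins, in the documented
# order of parameter, config file value, then environment variable.
function Resolve-TMDbApiKey {
    param(
        [string]$ParameterKey,
        [string]$ConfigKey,
        [string]$EnvironmentKey = $env:TMDB_API_KEY
    )

    foreach ($candidate in @($ParameterKey, $ConfigKey, $EnvironmentKey)) {
        if (-not [string]::IsNullOrWhiteSpace($candidate)) { return $candidate }
    }
    $null   # no key available anywhere
}
```

A `$null` result would then mean that only TVMaze lookups are possible for that run.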

Stage 3: Review Metadata

Purpose: Review proposed changes and confirm before renaming

Input: Enhanced JSON mapping from Stage 2

Output: User-approved JSON mapping

.\Scripts\3-Review-Metadata.ps1 -MappingFile ".\Output\mapping-20260110-153045.json"

Stage 4: Rename Files

Purpose: Execute file renaming based on approved mapping

Input: Approved JSON mapping from Stage 3

Output: Renamed files

# TV Series
.\Scripts\4-Rename-TVSeries.ps1 -MappingFile ".\Output\mapping-20260110-153045.json"

# Movies
.\Scripts\4-Rename-Movies.ps1 -MappingFile ".\Output\mapping-20260110-153045.json"

Cleanup Utility

Remove mapping files older than a specified number of days (default: 30):

.\Tools\Cleanup-OldMappings.ps1 -DaysToKeep 30
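The selection logic behind such a cleanup can be sketched as below. This is not the code of Cleanup-OldMappings.ps1: the function name is hypothetical, and it assumes that age is judged by each file's last-write time and that mapping files follow the `mapping-*.json` naming seen in the examples.

```powershell
# Hypothetical helper: list mapping files older than the retention window.
# Assumption: age is based on LastWriteTime, not the timestamp in the file name.
function Get-OldMappingFile {
    param(
        [string]$OutputDir,
        [int]$DaysToKeep = 30
    )

    $cutoff = (Get-Date).AddDays(-$DaysToKeep)
    Get-ChildItem -LiteralPath $OutputDir -Filter 'mapping-*.json' -File |
        Where-Object { $_.LastWriteTime -lt $cutoff }
}
```

Piping the result to `Remove-Item -WhatIf` shows what would be deleted without touching anything, for example `Get-OldMappingFile .\Output -DaysToKeep 30 | Remove-Item -WhatIf`.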

JSON Mapping Format

See Prompts/analyze-prompt.txt for the expected JSON schema.

Common Workflows

Automated Orchestration (Recommended)

Use the orchestration script to run all stages automatically:

# Complete workflow with interactive approval
.\Start-MediaProcessing.ps1 -SourcePath "C:\DVDFab\DVDFab13\Output\Video"

# Preview mode (see what would be renamed without making changes)
.\Start-MediaProcessing.ps1 -SourcePath "C:\DVDFab\DVDFab13\Output\Video" -PreviewOnly

# Skip metadata lookup if files have complete embedded metadata
.\Start-MediaProcessing.ps1 -SourcePath "C:\DVDFab\DVDFab13\Output\Video" -SkipLookup

# Pause between stages for review
.\Start-MediaProcessing.ps1 -SourcePath "C:\DVDFab\DVDFab13\Output\Video" -PauseBetweenStages

# Provide TMDb API key directly
.\Start-MediaProcessing.ps1 -SourcePath "C:\DVDFab\DVDFab13\Output\Video" -TMDbApiKey "your-key"

Features:

  • Runs all 4 stages automatically
  • Interactive review and approval in Stage 3
  • Auto-detects TV series vs. movies and calls appropriate rename scripts
  • Comprehensive workflow summary with performance metrics
  • Preserves mapping files for reference
  • Supports -PreviewOnly for safe preview runs (dry run mode)
  • Error handling with graceful workflow abort

When to Use:

  • Orchestration script: When you want a streamlined, automated workflow
  • Manual stages: When you need fine-grained control or want to troubleshoot individual stages

Manual Stage-by-Stage Workflow

For fine-grained control, run each stage individually:

# Step 1: Analyze
$mapping = .\Scripts\1-Analyze-RippedMedia.ps1 -SourcePath "C:\Rips\Babylon5"

# Step 2: Lookup metadata
.\Scripts\2-Lookup-Metadata.ps1 -MappingFile $mapping

# Step 3: Review and confirm
.\Scripts\3-Review-Metadata.ps1 -MappingFile $mapping

# Step 4: Rename
.\Scripts\4-Rename-TVSeries.ps1 -MappingFile $mapping  # For TV series
.\Scripts\4-Rename-Movies.ps1 -MappingFile $mapping    # For movies

Notes

  • Mapping files are automatically timestamped and stored in Output/
  • Use -WhatIf on Stage 4 scripts to preview changes without renaming
  • Mapping files older than 30 days can be cleaned up with Tools\Cleanup-OldMappings.ps1

About

Media ripping and processing scripts for Jellyfin
