PyBay Video Publishing Helpers

Utilities to automate processing of PyBay conference videos for publication on YouTube and PyVideo.

Overview

This toolkit helps volunteers prepare PyBay conference videos for publication by:

Downloading videos from Google Drive
Fetching talk metadata from the PyBay website
Renaming videos to a consistent, publication-ready format

Key Design Principles:

Use public information - Relies on publicly accessible pybay.org pages to avoid requiring volunteers to access complex/paid systems (Sessionize, paid Google Drive accounts, etc.)
Handle variability - Works with inconsistent input from multiple sources that change year-to-year
Minimize friction - Designed for volunteers who perform this task once per year

The Challenge

Publishing PyBay videos involves reconciling data from multiple sources with varying quality:

Speaker-provided data (via Sessionize):
- Talk titles, descriptions, speaker names
- We don't control this input - speakers can format names inconsistently
- Changes format/structure year-to-year
AV team video filenames:
- Very loose file-naming conventions that change slightly every year
- Examples from 2025: Robertson - 1000 - Brousseau - Welcome Remarks.mp4
- May use different time formats (12hr vs 24hr), varying separators, etc.
- A different person may handle this each year, so conventions differ
Google Drive organization:
- Videos uploaded by AV team
- Requires authentication to access
- Original filenames preserved in metadata

Our solution: Use the official schedule published on the public PyBay website as the authoritative source of truth, then match videos using intelligent token-based matching (room + time + speaker name).

Installation

# Clone the repo
git clone https://github.com/pybay/pybay-video-publishing-helpers.git
cd pybay-video-publishing-helpers

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Quick Start

Simple One-Command Workflow (Recommended for Volunteers)

Download and rename all videos in one command:

python src/google_drive_video_downloader.py \
  --gdrive-url "https://drive.google.com/drive/folders/YOUR_FOLDER_ID" \
  --output-path "pybay_videos_destination" \
  --year 2025

This single command automatically:

✅ Downloads all videos in parallel (4-8x faster)
✅ Saves metadata → _pybay_2025_gdrive_metadata.json
✅ Fetches talk data → _pybay_2025_talk_data.json
✅ Renames to publication format → Title — Speaker (PyBay 2025).mp4
✅ Flags unmatched files for review → ![REVIEW_NEEDED]_filename.mp4
✅ Verifies downloads with MD5 checksums
✅ Skips already-downloaded files (resumable)

Using service account authentication:

export GOOGLE_DRIVE_API_KEY_PYBAY='{"type":"service_account",...}'
python src/google_drive_video_downloader.py \
  --gdrive-url "YOUR_FOLDER_ID" \
  --output-path "pybay_videos_destination" \
  --year 2025 \
  --service-account

Advanced: Two-Step Workflow

For volunteers who want more control, or want to rename videos with a different pattern after downloading:

# Step 1: Download only (skip renaming)
python src/google_drive_video_downloader.py \
  --gdrive-url "YOUR_FOLDER_ID" \
  --output-path "pybay_videos_destination" \
  --year 2025 \
  --download-only

# Step 2: Rename separately (with dry-run preview first)
python src/file_renamer.py \
  --video-dir "pybay_videos_destination" \
  --year 2025 \
  --dry-run

# Then actually rename
python src/file_renamer.py \
  --video-dir "pybay_videos_destination" \
  --year 2025

File naming:

Downloaded from GDrive: Robertson - 1000 - Brousseau - Welcome Remarks.mp4
Renamed to:            Welcome & Opening Remarks — Chris Brousseau (PyBay 2025).mp4

Downloaded from GDrive: Robertson - 1000 - Pliger - PyScript Talk.mp4
Renamed to:            Next Level Python Applications with PyScript — Fabio Pliger & Chris Laffra (PyBay 2024).mp4

Features

Multi-Speaker Support ✨

Handles talks with multiple speakers (panels, co-presentations):

JSON Format:

{
  "talk_title": "Next Level Python Applications with PyScript",
  "speakers": [
    {"firstname": "Fabio", "lastname": "Pliger"},
    {"firstname": "Chris", "lastname": "Laffra"}
  ]
}

Filename Output:

Next Level Python Applications with PyScript — Fabio Pliger & Chris Laffra (PyBay 2024).mp4
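
As an illustration of how the JSON above maps to the output filename, here is a minimal sketch. It is not the code in file_renamer.py; the function names and the character-sanitizing rule are assumptions made for this example. Only the talk_title, speakers, firstname, and lastname keys come from the format shown above.

```python
import re

# Separator used in the published filenames (an em dash with spaces).
SEP = " \u2014 "


def speaker_display_name(speaker: dict) -> str:
    """Join whatever name parts are present; tolerates a missing first or last name."""
    parts = [speaker.get("firstname", ""), speaker.get("lastname", "")]
    return " ".join(p.strip() for p in parts if p and p.strip())


def build_filename(talk: dict, year: int, ext: str = ".mp4") -> str:
    """Build 'Title SEP Speaker & Speaker (PyBay YEAR).mp4' from one talk entry."""
    names = [speaker_display_name(s) for s in talk.get("speakers", [])]
    speakers = " & ".join(n for n in names if n)
    title = talk["talk_title"].strip()
    stem = f"{title}{SEP}{speakers}" if speakers else title
    # Assumed policy: replace characters that are invalid on common filesystems.
    stem = re.sub(r'[\\/:*?"<>|]', "-", stem)
    return f"{stem} (PyBay {year}){ext}"


talk = {
    "talk_title": "Next Level Python Applications with PyScript",
    "speakers": [
        {"firstname": "Fabio", "lastname": "Pliger"},
        {"firstname": "Chris", "lastname": "Laffra"},
    ],
}
print(build_filename(talk, 2024))
```

A speaker entry with only one name part (for example, just a firstname) still produces a usable filename, because empty parts are skipped rather than rendered as blanks.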

Intelligent Matching

Matches videos to talk metadata with three data elements:

Room - Case-insensitive (e.g., Robertson, Fisher)
Time - Normalized to 24-hour format (handles "10:00 am", "1000", "2:30 pm", "1430")
Name - Partial matching (handles "van Rossum", "Hatfield-Dodds", single names)

For multi-speaker talks, matches if ANY speaker name appears in the filename.
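
The matching rule described above can be sketched roughly as follows. This is a simplified illustration, not the actual renamer: the room and time keys on each talk entry are hypothetical (the real JSON layout may differ), and both the filename and the talk are assumed to already carry the time as a four-digit 24-hour string.

```python
import re


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens; 'Hatfield-Dodds' becomes {'hatfield', 'dodds'}."""
    return set(re.findall(r"[^\W_]+", text.lower()))


def speaker_tokens(speaker: dict) -> set:
    """Prefer the last name; fall back to the first name for single-name speakers."""
    name = speaker.get("lastname") or speaker.get("firstname") or ""
    return tokens(name)


def talk_matches(filename: str, talk: dict) -> bool:
    """True when room, time, and at least one speaker name all appear in the filename."""
    file_tokens = tokens(filename.rsplit(".", 1)[0])
    room_tokens = tokens(talk["room"])  # hypothetical key
    room_ok = bool(room_tokens) and room_tokens <= file_tokens
    time_ok = talk["time"] in file_tokens  # hypothetical key, assumed 'HHMM'
    name_ok = any(speaker_tokens(s) & file_tokens for s in talk.get("speakers", []))
    return room_ok and time_ok and name_ok


def find_talk(filename: str, talks: list):
    """Return the single matching talk, or None when there are zero or several matches."""
    hits = [t for t in talks if talk_matches(filename, t)]
    return hits[0] if len(hits) == 1 else None
```

Returning None for ambiguous matches as well as for misses is a deliberate choice in this sketch: a wrong rename is harder to spot later than a file that is left for manual review.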

Special Cases Handled

✅ Multiple speakers joined with " & " (typically one or two such talks per year; the most recent was in 2024)
✅ Hyphenated last names (e.g., Hatfield-Dodds)
✅ Single names (e.g., no last name, which comes from incomplete Sessionize profiles)
✅ Multi-part surnames (e.g., van Rossum)
✅ Missing name data (uses whatever is available)
✅ Files without metadata flagged for manual review by adding prefix to final filename

Parallel Downloads w/auto retry
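
The general pattern behind parallel downloads with automatic retry can be sketched with the standard library alone. This is an illustrative outline, not the code in file_ops_parallel.py: download_one is a hypothetical callable supplied by the caller, and the worker count, attempt count, and backoff delays are assumed values.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def with_retries(func, *args, attempts=3, base_delay=1.0):
    """Call func(*args), retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(base_delay * 2 ** (attempt - 1))


def download_all(items, download_one, max_workers=4, attempts=3, base_delay=1.0):
    """Run download_one(item) for every item in a thread pool.

    Returns (results, failures): two dicts keyed by item, so one bad file
    does not abort the rest of the batch. Items must be hashable.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(with_retries, download_one, item,
                        attempts=attempts, base_delay=base_delay): item
            for item in items
        }
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                failures[item] = exc
    return results, failures
```

Threads suit this workload because downloads spend most of their time waiting on the network rather than using the CPU.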

Related Docs

README_VIDEO_PUBLISHING_WORKFLOW.md - Complete workflows with diagrams
README_GOOGLE_DRIVE_SETUP.md - Google Drive auth setup

Testing

Some tests exist, but coverage is incomplete (see Test Coverage Gaps under Contributing).

Test Coverage:

Multi-speaker handling (22 tests)
Web scraping and parsing (13 tests)
Time normalization (15 tests)

Project Structure

pybay-video-publishing-helpers/
├── src/
│   ├── google_drive_video_downloader.py  # Main download script (parallel)
│   ├── file_renamer.py                   # Token-based renamer
│   ├── scraper_pybayorg_talk_metadata.py # Scrapes pybay.org for talk data
│   ├── google_drive_fetch_metadata.py    # Standalone metadata fetcher
│   ├── google_drive_ops.py               # Google Drive API operations
│   ├── file_ops.py                       # File verification utilities
│   └── file_ops_parallel.py              # Fast parallel download functions
├── tests/
│   ├── test_multi_speaker.py             # Multi-speaker functionality tests
│   ├── test_scraper.py                   # Scraper function tests
│   └── test_time_normalization.py        # Time parsing tests
├── README_VIDEO_PUBLISHING_WORKFLOW.md   # Complete workflow documentation
├── README_GOOGLE_DRIVE_SETUP.md          # Authentication setup guide
└── requirements.txt                      # Python dependencies

Data Sources

1. PyBay Website (pybay.org)

Source: https://pybay.org/speaking/talk-list-YYYY/
Format: Sessionize API HTML
Contains: Talk titles, speaker names, rooms, times, descriptions
Saved to: _pybay_YYYY_talk_data.json
Why: Publicly accessible, authoritative source of truth

2. Google Drive Metadata

Source: Google Drive API
Contains: Original filenames from AV provider, file sizes, MD5 checksums
Saved to: _pybay_YYYY_gdrive_metadata.json
Why: Preserves audit trail of original AV team filenames
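
Because the metadata includes an MD5 checksum for each file, a download can be verified locally. The sketch below shows one way to do that with hashlib; the function names are made up for this example, and how the expected checksum is looked up in the metadata JSON is left out because that file's layout is not documented here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024) -> str:
    """Hash the file in chunks so multi-gigabyte videos never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_download_intact(path, expected_md5: str) -> bool:
    """True when the file exists and its MD5 matches the checksum from the metadata."""
    candidate = Path(path)
    return candidate.is_file() and md5_of_file(candidate) == expected_md5.lower()
```

A check like this is also what makes resuming practical: a file that already exists and passes verification can be skipped, while a partial or corrupted file fails the comparison and is fetched again.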

3. Downloaded Video Files

Current format: {Room} - {Time} - {LastName} - {Title}.mp4
Final format: {Title} — {FirstName} {LastName} (PyBay {Year}).mp4
Note: The AV team's naming conventions vary year-to-year

Common Issues

Videos don't match metadata

Cause: Last-minute speaker changes, alternate speakers not added to the official schedule in Sessionize, AV team filename variations

Solution:

Renamer flags unmatched files for manual review
Manually rename these files, or
Add missing entries to _pybay_YYYY_talk_data.json
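
For reference, flagging a file for review amounts to a prefix rename. The sketch below uses the ![REVIEW_NEEDED]_ prefix shown in the Quick Start; the function name and the dry_run parameter are assumptions for this example rather than the renamer's actual interface.

```python
from pathlib import Path

REVIEW_PREFIX = "![REVIEW_NEEDED]_"


def flag_for_review(path, dry_run: bool = True) -> Path:
    """Return the flagged path for an unmatched video; rename on disk unless dry_run.

    Files that already carry the prefix are returned unchanged, so running
    the step twice does not stack prefixes.
    """
    source = Path(path)
    if source.name.startswith(REVIEW_PREFIX):
        return source
    target = source.with_name(REVIEW_PREFIX + source.name)
    if not dry_run:
        source.rename(target)
    return target
```

Once a flagged file has been renamed by hand, or its talk has been added to the talk data JSON and the renamer re-run, the prefix no longer applies.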

Time formats don't match

Cause: Inconsistent time formats between AV team and website

Solution:

Renamer normalizes all times to 24-hour format automatically
Handles: 10am, 10:00 am, 1000, 1430, 2:30 pm, etc.
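
To make the normalization concrete, here is a small sketch that converts the formats listed above to a four-digit 24-hour string. It is an approximation written for this README, not the renamer's own function; the name normalize_time and the choice to return None for unparseable input are assumptions.

```python
import re

_TIME_RE = re.compile(r"(\d{1,2})(?::?(\d{2}))?\s*(am|pm)?")


def normalize_time(raw: str):
    """Return the time as 'HHMM' (24-hour), or None if it cannot be parsed."""
    text = raw.strip().lower().replace(".", "")  # 'a.m.' becomes 'am'
    match = _TIME_RE.fullmatch(text)
    if not match:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    meridiem = match.group(3)
    if meridiem:
        if not 1 <= hour <= 12:
            return None  # e.g. '14 pm' is not a valid 12-hour time
        if meridiem == "pm" and hour != 12:
            hour += 12
        elif meridiem == "am" and hour == 12:
            hour = 0
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}{minute:02d}"


for sample in ["10am", "10:00 am", "1000", "1430", "2:30 pm"]:
    print(sample, "->", normalize_time(sample))
```

Comparing times only after both sides have passed through the same function keeps the AV team's format and the website's format from having to agree.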

Missing metadata files

Cause: Fresh download didn't create metadata, or files were deleted

Solution:

# Re-fetch Google Drive metadata (doesn't re-download videos)
python src/google_drive_fetch_metadata.py \
  --folder "YOUR_DRIVE_URL" \
  --year 2025

# Fetch PyBay website metadata
python src/scraper_pybayorg_talk_metadata.py \
  --url "https://pybay.org/speaking/talk-list-2025/" \
  --output "pybay_videos_destination/_pybay_2025_talk_data.json"

Contributing

This is a volunteer-driven project. Contributions welcome!

Good Future Improvements

New Features:

Upload to SF Python YouTube channel and playlist (needed!)
Automate creation of metadata for PyVideo
Improve fuzzy matching for edge cases
Integrate tqdm progress tracker for better download visibility

Test Coverage Gaps:

Areas without tests:

Google Drive operations (google_drive_ops.py, google_drive_video_downloader.py)
File operations (file_ops.py, file_ops_parallel.py)
Credential checking (google_drive_check_credentials.py)
Metadata fetching (google_drive_fetch_metadata.py)

Note for Future Volunteers: This repo was designed to be reasonably resilient to the changes we have seen over the past few years, but if something breaks, check:

Has the AV team changed their filename format?
Has pybay.org changed its URL structure?
Has Sessionize changed its HTML structure?
