This repository contains vCon (Virtual Conversation Container) files for IETF working group sessions from meetings 110-125 (March 2021 - March 2026).
vCon is a format for capturing conversation data, specified in an IETF working group draft (draft-ietf-vcon-vcon-container). Each vCon file contains:
- Meeting metadata - Date, location, working group information
- Video recording - YouTube URL for the session recording
- Transcript - Full transcript in WTF (World Transcription Format) with word-level timestamps
- Materials - Links to slides, agenda, minutes, and other session documents
- Participants - Working group chairs and attendee information
- Lawful basis - IETF Note Well documentation per draft-howe-vcon-lawful-basis
ietf-meeting-vcons/
├── ietf110/ # IETF 110 (March 2021, Online)
│ ├── ietf110_6man_28833.vcon.json
│ ├── ietf110_httpbis_28597.vcon.json
│ └── ...
├── ietf111/ # IETF 111 (July 2021, Online)
├── ietf112/ # IETF 112 (November 2021, Online)
├── ietf113/ # IETF 113 (March 2022, Vienna)
├── ietf114/ # IETF 114 (July 2022, Philadelphia)
├── ietf115/ # IETF 115 (November 2022, London)
├── ietf116/ # IETF 116 (March 2023, Yokohama)
├── ietf117/ # IETF 117 (July 2023, San Francisco)
├── ietf118/ # IETF 118 (November 2023, Prague)
├── ietf119/ # IETF 119 (March 2024, Brisbane)
├── ietf120/ # IETF 120 (July 2024, Vancouver)
├── ietf121/ # IETF 121 (November 2024, Dublin)
├── ietf122/ # IETF 122 (March 2025, Bangkok)
├── ietf123/ # IETF 123 (July 2025, Madrid)
├── ietf124/ # IETF 124 (November 2025, Yokohama)
└── ietf125/ # IETF 125 (March 2026, Shenzhen)
Files follow the pattern: ietf{meeting}_{group}_{session_id}.vcon.json
- meeting - IETF meeting number (110-125)
- group - Working group acronym (e.g., httpbis, quic, tls)
- session_id - Unique session identifier from the IETF Datatracker
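The naming pattern can be split back into its parts with a regular expression. This is a minimal sketch, not part of the repository's tooling: `parse_vcon_name` and `VconName` are hypothetical names, and the pattern assumes the session identifier is purely numeric, as in the examples above.

```python
import re
from typing import NamedTuple

# Assumed from the documented pattern: ietf{meeting}_{group}_{session_id}.vcon.json
# The session_id is assumed to be numeric, as in the example file names.
VCON_NAME = re.compile(
    r"^ietf(?P<meeting>\d+)_(?P<group>.+)_(?P<session_id>\d+)\.vcon\.json$"
)


class VconName(NamedTuple):
    meeting: int
    group: str
    session_id: str


def parse_vcon_name(filename: str) -> VconName:
    """Split a vCon file name into meeting number, group acronym, and session ID."""
    match = VCON_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognized vCon file name: {filename!r}")
    return VconName(int(match["meeting"]), match["group"], match["session_id"])


if __name__ == "__main__":
    print(parse_vcon_name("ietf110_6man_28833.vcon.json"))
```

The group part is matched greedily, so an acronym that itself contains an underscore would still parse as long as the file name ends in a numeric session ID.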
Each vCon file follows the draft-ietf-vcon-vcon-container specification:
{
"vcon": "0.0.1",
"uuid": "unique-identifier",
"created_at": "2024-11-07T15:30:00Z",
"subject": "IETF 121 - QUIC Working Group Session",
"parties": [
{"name": "Chair Name", "mailto": "chair@example.com", "role": "chair"}
],
"dialog": [
{"type": "video", "url": "https://www.youtube.com/watch?v=..."}
],
"attachments": [
{"type": "agenda", "url": "https://datatracker.ietf.org/..."},
{"type": "slides", "url": "https://datatracker.ietf.org/..."},
{"type": "lawful_basis", "body": {"lawful_basis": "legitimate_interests", ...}}
],
"analysis": [
{"type": "wtf_transcription", "spec": "draft-howe-wtf-transcription-00", "body": {...}}
]
}
All data is sourced from public IETF resources:
- Session metadata: IETF Datatracker API
- Video recordings: IETF YouTube Channel
- Transcripts: YouTube auto-generated captions
- Materials: IETF Meeting Materials Archive
All IETF meeting sessions are conducted under the IETF Note Well, which permits recording, transcription, and publication. This is documented in each vCon's lawful_basis attachment.
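To check that a given vCon carries this documentation, the attachment can be looked up by its type. The sketch below assumes the attachment layout shown in the example structure above (a `type` of `lawful_basis` with the details under `body`); `find_lawful_basis` is a hypothetical helper, not part of the repository.

```python
from typing import Any, Optional


def find_lawful_basis(vcon: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the body of the first lawful_basis attachment, or None if absent."""
    for attachment in vcon.get("attachments", []):
        if attachment.get("type") == "lawful_basis":
            return attachment.get("body")
    return None


if __name__ == "__main__":
    # Illustrative in-memory vCon; real files would be loaded with json.load().
    sample = {
        "attachments": [
            {"type": "agenda", "url": "https://datatracker.ietf.org/..."},
            {"type": "lawful_basis", "body": {"lawful_basis": "legitimate_interests"}},
        ]
    }
    print(find_lawful_basis(sample))
```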
| Metric | Value |
|---|---|
| Meetings | 16 (IETF 110-125) |
| Total vCons | 2,408 |
| Date Range | March 2021 - March 2026 |
| Working Groups | ~50 per meeting |
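The figures in the table can be compared against a local checkout by counting files per meeting directory. This is a rough sketch that assumes the directory layout shown earlier; `count_vcons` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path


def count_vcons(root: str = ".") -> Counter:
    """Count *.vcon.json files in each ietfNNN/ directory under root."""
    counts: Counter = Counter()
    for path in Path(root).glob("ietf*/*.vcon.json"):
        counts[path.parent.name] += 1
    return counts


if __name__ == "__main__":
    counts = count_vcons(".")
    for meeting in sorted(counts):
        print(f"{meeting}: {counts[meeting]}")
    print(f"total: {sum(counts.values())}")
```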
import json

# Load a vCon
with open("ietf121/ietf121_quic_33502.vcon.json") as f:
    vcon = json.load(f)

# Get session info
print(f"Subject: {vcon['subject']}")
print(f"Video: {vcon['dialog'][0]['url']}")

# Access transcript (print the first five segments)
for analysis in vcon.get("analysis", []):
    if analysis["type"] == "wtf_transcription":
        transcript = analysis["body"]
        for segment in transcript["segments"][:5]:
            print(f"[{segment['start']:.1f}s] {segment['text']}")

# Get all video URLs from a meeting
jq -r '.dialog[0].url' ietf121/*.vcon.json
# Extract transcript text
jq -r '.analysis[] | select(.type=="wtf_transcription") | .body.segments[].text' file.vcon.json
# List all working groups in a meeting
ls ietf121/*.vcon.json | sed 's/.*ietf121_\(.*\)_.*/\1/' | sort -u
These vCons were generated using ietf2vcon, an open-source tool for converting IETF meeting sessions to vCon format.
To generate additional vCons:
pip install ietf2vcon
ietf2vcon convert --meeting 125 --group quic
This repository includes tools to re-transcribe IETF meeting audio using Speechmatics for higher-quality transcriptions with speaker diarization.
- Speechmatics API Key: Sign up at speechmatics.com and obtain an API key.
- FFmpeg: Required for audio processing.
  - macOS: brew install ffmpeg
  - Ubuntu/Debian: apt install ffmpeg
  - Windows: Download from ffmpeg.org
- Python dependencies: pip install -r scripts/requirements.txt
Set your API key as an environment variable:
export SPEECHMATICS_API_KEY="your-api-key-here"
Transcribe a single vCon file:
python scripts/transcribe.py ietf121/ietf121_quic_33502.vcon.json
Transcribe all sessions from a specific meeting:
python scripts/transcribe.py --meeting 121
Transcribe a specific working group:
python scripts/transcribe.py --meeting 121 --group quic
Transcribe all vCons missing Speechmatics transcription:
python scripts/transcribe.py --all-pending
Preview which files would be transcribed:
python scripts/transcribe.py --all-pending --dry-run
The script:
- Downloads audio from the YouTube recording linked in each vCon
- Submits the audio to Speechmatics for transcription with speaker diarization
- Converts the result to WTF (World Transcription Format)
- Updates the vCon file with the new transcription in the analysis array
The Speechmatics transcription is stored alongside any existing YouTube transcription, with "vendor": "speechmatics" to distinguish it.
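Because a vCon may then hold more than one transcription, a reader has to choose between them. The sketch below picks an entry by vendor preference and falls back to the first transcription found. It assumes `vendor` is a top-level key on each analysis entry, next to `type`; `pick_transcript` is a hypothetical helper, not something the repository provides.

```python
from typing import Any, Optional, Sequence


def pick_transcript(
    vcon: dict[str, Any],
    preferred_vendors: Sequence[str] = ("speechmatics",),
) -> Optional[dict[str, Any]]:
    """Return the wtf_transcription entry from the most preferred vendor.

    Falls back to the first wtf_transcription entry when no preferred vendor
    is present, and returns None when the vCon has no transcription at all.
    """
    transcripts = [
        entry
        for entry in vcon.get("analysis", [])
        if entry.get("type") == "wtf_transcription"
    ]
    for vendor in preferred_vendors:
        for entry in transcripts:
            if entry.get("vendor") == vendor:
                return entry
    return transcripts[0] if transcripts else None
```

Passing a longer tuple, for example `("speechmatics", "whisper")`, expresses an ordered preference across several vendors.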
The Speechmatics transcription includes:
- Word-level timestamps: Precise timing for each word
- Speaker diarization: Identification of different speakers
- Confidence scores: Per-word and per-segment confidence metrics
- Segments: Logical groupings of speech (sentences/phrases)
- Quality metrics: Overall transcription quality assessment
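Per-segment confidence can be used to flag passages worth a manual review. The following is a sketch under an assumption: each segment is taken to carry a numeric `confidence` key between 0 and 1, which this README does not spell out. `low_confidence_segments` is a hypothetical helper, and the 0.8 threshold is arbitrary.

```python
from typing import Any


def low_confidence_segments(
    transcript: dict[str, Any], threshold: float = 0.8
) -> list[dict[str, Any]]:
    """Return segments whose confidence is below the threshold.

    Segments without a confidence value are not flagged.
    The "confidence" key name is an assumption about the WTF body.
    """
    return [
        segment
        for segment in transcript.get("segments", [])
        if segment.get("confidence", 1.0) < threshold
    ]
```

Here `transcript` is the `body` of a `wtf_transcription` analysis entry, as in the Python usage example earlier in this README.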
This repository also includes a tool for transcribing IETF meetings using OpenAI Whisper locally via faster-whisper. No API key or cloud service is required.
- FFmpeg: Required for audio processing.
  - macOS: brew install ffmpeg
  - Ubuntu/Debian: apt install ffmpeg
- Python dependencies: pip install -r scripts/requirements.txt
Transcribe a single vCon file:
python scripts/whisper_transcribe.py ietf125/ietf125_quic_XXXXX.vcon.json
Transcribe all sessions from a specific meeting:
python scripts/whisper_transcribe.py --meeting 125
Transcribe a specific working group:
python scripts/whisper_transcribe.py --meeting 125 --group quic
Use a faster (smaller) model:
python scripts/whisper_transcribe.py --meeting 125 --model medium
Transcribe all vCons missing Whisper transcription:
python scripts/whisper_transcribe.py --all-pending
Preview which files would be transcribed:
python scripts/whisper_transcribe.py --meeting 125 --dry-run
| Model | Size | Speed | Quality |
|---|---|---|---|
| tiny | 39M | Fastest | Low |
| base | 74M | Fast | Fair |
| small | 244M | Moderate | Good |
| medium | 769M | Moderate | Better |
| large-v3 | 1.5G | Slow | Best (default) |
The script:
- Downloads audio from the YouTube recording linked in each vCon
- Transcribes locally using faster-whisper with word-level timestamps
- Converts the result to WTF (World Transcription Format)
- Updates the vCon file with the new transcription in the analysis array
The Whisper transcription is stored with "vendor": "whisper" to distinguish it from YouTube auto-captions and Speechmatics transcriptions. It includes real word-level timestamps and per-segment confidence scores.
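The vendor tag also makes it possible to tell which files still lack a transcription from a given engine. The sketch below illustrates that idea only; it is not how the scripts' `--all-pending` option is actually implemented. It assumes `vendor` is a top-level key on each analysis entry, and `has_transcript_from` and `pending_vcons` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def has_transcript_from(vcon: dict[str, Any], vendor: str) -> bool:
    """True if the vCon already holds a wtf_transcription from this vendor."""
    return any(
        entry.get("type") == "wtf_transcription" and entry.get("vendor") == vendor
        for entry in vcon.get("analysis", [])
    )


def pending_vcons(root: str = ".", vendor: str = "whisper") -> Iterator[Path]:
    """Yield vCon files under root that have no transcription from vendor."""
    for path in sorted(Path(root).glob("ietf*/*.vcon.json")):
        with path.open(encoding="utf-8") as f:
            vcon = json.load(f)
        if not has_transcript_from(vcon, vendor):
            yield path


if __name__ == "__main__":
    for path in pending_vcons(".", vendor="whisper"):
        print(path)
```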
- draft-ietf-vcon-vcon-container - vCon container format
- draft-howe-wtf-transcription - World Transcription Format
- draft-howe-vcon-lawful-basis - Lawful basis extension
Contributions are welcome! Please open an issue or pull request if you:
- Find errors in the vCon data
- Want to add vCons for additional meetings
- Have suggestions for improvements
This data is made available under the BSD-3-Clause License. See LICENSE for details.
The underlying IETF meeting content is subject to the IETF Trust Legal Provisions.
- IETF for making meeting recordings and materials publicly available
- The vCon working group for developing the conversation container standard
- YouTube for hosting IETF meeting recordings with auto-generated captions