Minh/speech transcription tutorial by yishangupenn · Pull Request #1193 · perf-ai-example-org/openai-cookbook

yishangupenn · 2026-03-10T07:41:35Z

Copied from upstream: openai/openai-cookbook#1807
Original author: @minh-hoque
Originally opened: 2025-05-02

Summary

This pull request adds a new end-to-end tutorial, examples/Speech_transcription_methods.ipynb, that compares four ways to convert speech to text with OpenAI tools:

Audio /transcriptions endpoint (single request)
Audio /transcriptions with streaming
Speech Realtime API (WebSocket)
Agents SDK with the new speech_to_text tool
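
For orientation, the first two methods in the list above might look roughly like the sketch below. It is not taken from the notebook: the helper names `transcribe_once` and `transcribe_streaming`, the default model name, and the duck-typed `client` argument are assumptions made for illustration. `client` is expected to behave like an `openai.OpenAI()` instance; the `stream=True` option and the `transcript.text.delta` event type should be checked against the current API reference before relying on them.

```python
from typing import Any, Iterator

# Assumed model name, chosen only for illustration.
DEFAULT_MODEL = "gpt-4o-mini-transcribe"


def transcribe_once(client: Any, audio_path: str, model: str = DEFAULT_MODEL) -> str:
    """Send the whole file in a single request and return the transcript text."""
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


def transcribe_streaming(
    client: Any, audio_path: str, model: str = DEFAULT_MODEL
) -> Iterator[str]:
    """Yield transcript text deltas as the server streams them back."""
    with open(audio_path, "rb") as audio_file:
        stream = client.audio.transcriptions.create(
            model=model, file=audio_file, stream=True
        )
        for event in stream:
            # Only delta events carry incremental text; other event types are skipped.
            if getattr(event, "type", None) == "transcript.text.delta":
                yield event.delta
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub object, without network access or an API key. With the real SDK installed, `transcribe_once(OpenAI(), "clip.wav")` would run the single-request path, where `clip.wav` is a placeholder file name.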

The notebook walks through the trade-offs, provides helper functions, and benchmarks each approach on several sample files.
To support the tutorial, we also add:

Sample audio clips (examples/data/sample_audio_files/…)
Explanatory diagrams (Mermaid source in examples/mermaid/… and rendered PNGs in examples/imgs/…)
Updates to .gitignore (ignore large/temporary audio)

Together these assets give cookbook readers a practical, runnable reference for choosing the right transcription workflow.
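
The benchmarking mentioned in the summary could be approximated with a small timing harness like the one below. This is a hypothetical sketch, not the notebook's code: the `benchmark` function and its arguments are illustrative names, and it only measures wall-clock latency per file for callables that take a path and return a transcript.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List


def benchmark(
    methods: Dict[str, Callable[[str], str]],
    audio_paths: Iterable[str],
) -> Dict[str, float]:
    """Return mean wall-clock seconds per file for each transcription method."""
    paths = list(audio_paths)
    if not paths:
        raise ValueError("at least one audio path is required")
    results: Dict[str, float] = {}
    for name, transcribe in methods.items():
        timings: List[float] = []
        for path in paths:
            start = time.perf_counter()
            transcribe(path)  # transcript is discarded; only latency is measured
            timings.append(time.perf_counter() - start)
        results[name] = mean(timings)
    return results
```

Each entry in `methods` would wrap one of the four approaches behind the same path-in, text-out signature, so the comparison stays like-for-like.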

Motivation

High-quality speech transcription is a common requirement for chatbots, call analysis, meeting notes, and real-time assistants. OpenAI now offers multiple APIs and SDK features for this, but the differences (latency, code patterns, streaming vs. batch, etc.) are not obvious to newcomers.

This tutorial:

Shows concrete, runnable examples for every current method
Highlights pros / cons and performance considerations
Provides reusable helper code and diagrams to speed up adoption

Adding this content will make the Cookbook a more useful guide for developers integrating speech capabilities.

For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
I have conducted a self-review of my content based on the contribution guidelines:
- Relevance: This content is related to building with OpenAI technologies and is useful to others.
- Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
- Spelling and Grammar: I have checked for spelling or grammatical mistakes.
- Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
- Correctness: The information I include is correct and all of my code executes successfully.
- Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

minh-hoque added 19 commits April 29, 2025 09:28

Added new cookbook on speech transcription methods

3320b42

added audio files

aaa9d29

updated gitignore

14844d4

Fixed resampling to 24 kHz

bdeff55

Fixed markdown

32918fa

Fixed mermaid rendering in notebook

32179a3

updated markdown

f2ff734

Changed audio file

90b3873

Simplified REALTIME API code

883550c

Improved markdown table

cc9feee

Merge commit '22a8c6cf484edc78ec8db5cabd3da9c8b59e1d20' into minh/speech_transcription_tutorial

6211668

Improve markdown and helper functions

02be231

cleaned imports

a2d9500

replaced model names with CONSTANT

acf649d

updated registry

8963925

Merge commit '66a31c140ee480c80749046e817e8c85231a3bb5' into minh/speech_transcription_tutorial

3e6d268

Fixed PR comments

732cf24

Merge commit 'd40a72141ba0e240e4dcbdeb7eeb5ea70da7a384' into minh/speech_transcription_tutorial

78c005e

Updated cell outputs

4d69a7c
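
The "Fixed resampling to 24 kHz" commit in the list above presumably relates to the Realtime API, whose `pcm16` input format expects 24 kHz mono audio sent as base64 in `input_audio_buffer.append` events. The notebook's actual resampling code is not shown on this page, so the following is only a minimal standard-library sketch: `resample_pcm16` and `append_event` are hypothetical helper names, and linear interpolation is a simple stand-in for whatever resampler the notebook uses.

```python
import base64
import json
import sys
from array import array

TARGET_RATE = 24_000  # Hz, the rate expected for pcm16 input


def resample_pcm16(samples: array, source_rate: int, target_rate: int = TARGET_RATE) -> array:
    """Resample mono 16-bit samples by linear interpolation."""
    if source_rate == target_rate or len(samples) == 0:
        return array("h", samples)
    out_len = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate
    last = len(samples) - 1
    out = array("h")
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)  # clamp at the final sample
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out


def append_event(samples: array) -> str:
    """Wrap little-endian PCM16 samples in an input_audio_buffer.append event."""
    payload = array("h", samples)
    if sys.byteorder == "big":
        payload.byteswap()  # the wire format is little-endian
    audio_b64 = base64.b64encode(payload.tobytes()).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": audio_b64})
```

Linear interpolation is adequate for a demonstration but applies no anti-aliasing filter, so a dedicated audio library would be a better choice when downsampling real recordings.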

yishangupenn closed this Mar 10, 2026

yishangupenn wants to merge 19 commits into main from upstream-pr-1807

2 participants
