new-contrib: Audio Whisper API with Local Device Microphones #49

Open
yishangupenn wants to merge 17 commits into main from upstream-pr-1271

@yishangupenn yishangupenn commented Mar 10, 2026

Copied from upstream: openai/openai-cookbook#1271
Original author: @CarlKho-Minerva
Originally opened: 2024-07-06


Summary

This PR adds a new notebook that demonstrates how to use the Whisper API to transcribe text from your device's microphone. The notebook includes steps to record audio, transcribe it using the Whisper API, and copy the transcription to the clipboard. It aims to provide a practical guide for users who want to integrate speech-to-text functionality into their applications.
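
For readers skimming this PR, the record, transcribe, and copy flow described above might look roughly like the sketch below. It is not taken from the notebook: the choice of `sounddevice` and `pyperclip`, the 16 kHz mono format, and every function name here are assumptions made for illustration. Only `client.audio.transcriptions.create` with the `whisper-1` model comes from the OpenAI Python SDK.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16_000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM samples in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def record_pcm(seconds: float, sample_rate: int = 16_000) -> bytes:
    """Record mono 16-bit audio from the default microphone.

    Assumption: uses the third-party `sounddevice` package; the notebook
    may rely on a different audio library.
    """
    import sounddevice as sd  # imported lazily so the helpers above work without it

    frames = sd.rec(int(seconds * sample_rate), samplerate=sample_rate,
                    channels=1, dtype="int16")
    sd.wait()  # block until the recording has finished
    return frames.tobytes()


def transcribe_wav(wav_bytes: bytes) -> str:
    """Send WAV bytes to the Whisper API and return the transcript text."""
    from openai import OpenAI  # the client reads OPENAI_API_KEY from the environment

    client = OpenAI()
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=("recording.wav", wav_bytes),  # (filename, content) tuple
    )
    return result.text


def copy_to_clipboard(text: str) -> None:
    """Copy text to the system clipboard (assumption: `pyperclip` is installed)."""
    import pyperclip

    pyperclip.copy(text)


def dictate(seconds: float, record=record_pcm, transcribe=transcribe_wav,
            copy=copy_to_clipboard) -> str:
    """Record, transcribe, and copy; each step can be swapped out for testing."""
    text = transcribe(pcm_to_wav_bytes(record(seconds)))
    copy(text)
    return text
```

Calling `dictate(5)` would record five seconds, transcribe them, and leave the text on the clipboard. Passing the three steps in as parameters keeps the flow testable without a microphone or an API key.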

*This pull request description was written by ChatGPT and reviewed by a human. The article itself, however, was written by a human.

Motivation

This tutorial was created because transcribing speech to text from a microphone is not well documented. I found the mic speech-to-text option in the ChatGPT apps (not the website) extremely helpful for day-to-day work and wanted to save others from having to learn several different audio processing modules.

For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines (my previous PR message went into detail on every one of these 😅):
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct, and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

- Refactor: Separate transcribe and translate functions
- Refactor: Clarify prompt usage in demos (example-based)
- Refactor: Add 5-second limit to Spanish translation demo
- Docs: Improve formatting and clarity of audio recording details
- Docs: Add note about prompt usage with links to API docs
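
As a rough illustration of the first three changes listed above, the split between transcription and translation could be sketched as follows. The helper names, the `MAX_DEMO_SECONDS` constant, and the reading of the 5-second limit as a cap on recording time are assumptions, not the notebook's code. The two endpoint calls (`audio.transcriptions.create` and `audio.translations.create`) and their optional `prompt` parameter are part of the OpenAI Python SDK.

```python
from typing import Any, Optional

MAX_DEMO_SECONDS = 5.0  # hypothetical cap, assumed to apply to recording time


def clamp_duration(requested_seconds: float, limit: float = MAX_DEMO_SECONDS) -> float:
    """Keep a requested recording length between 0 and the demo limit."""
    return max(0.0, min(requested_seconds, limit))


def _request_args(audio_file: Any, prompt: Optional[str]) -> dict:
    """Build the keyword arguments shared by both endpoints."""
    args = {"model": "whisper-1", "file": audio_file}
    if prompt:  # only send a prompt when the caller supplied one
        args["prompt"] = prompt
    return args


def transcribe(client: Any, audio_file: Any, prompt: Optional[str] = None) -> str:
    """Return the transcript in the language that was spoken."""
    return client.audio.transcriptions.create(**_request_args(audio_file, prompt)).text


def translate(client: Any, audio_file: Any, prompt: Optional[str] = None) -> str:
    """Return an English translation of the spoken audio."""
    return client.audio.translations.create(**_request_args(audio_file, prompt)).text
```

Here `client` would be an `openai.OpenAI()` instance and `audio_file` an open binary file or a `(filename, bytes)` tuple. Keeping the two functions separate makes it explicit which endpoint a demo calls, while the shared helper avoids duplicating the request arguments.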
