Parakeet text cleanups #1193
Changes from 2 commits
f82d881
eabfaf1
7b4724f
cce3bed
8f13d09
e361fe3
484bd0c
882a0b7
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,30 +7,33 @@ | |
# This example uses the `nvidia/parakeet-tdt-0.6b-v2` model, which, as of May 13, 2025, sits at the | ||
# top of Hugging Face's [ASR leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard). | ||
|
||
# To run this example either: | ||
# To run this example, either: | ||
|
||
# - run the browser/microphone frontend, or | ||
# - Run the browser/microphone frontend. Modal handles the deployment of both the frontend and backend in a single app! You should see a browser window pop up - make sure you allow access to your microphone. The full frontend code can be found [here](https://github.com/modal-labs/modal-examples/tree/main/06_gpu_and_ml/audio-to-text/frontend). | ||
# ```bash | ||
# modal serve 06_gpu_and_ml/audio-to-text/parakeet.py | ||
# ``` | ||
# - stream a .wav file from a URL (optional, default is "Dream Within a Dream" by Edgar Allan Poe). | ||
# - Or, stream a `.wav` file directly from a URL to simulate real-time transcription in your terminal: | ||
charlesfrye marked this conversation as resolved.
|
||
# ```bash | ||
# modal run 06_gpu_and_ml/audio-to-text/parakeet.py --audio-url="https://github.com/voxserv/audio_quality_testing_samples/raw/refs/heads/master/mono_44100/156550__acclivity__a-dream-within-a-dream.wav" | ||
# ``` | ||
|
||
# See [Troubleshooting](https://modal.com/docs/examples/parakeet#client) at the bottom if you run into issues. | ||
|
||
# Here's what your final output might look like: | ||
# You should see output like the following in your terminal: | ||
|
||
# ```bash | ||
# 🌐 Downloading audio file... | ||
# 🎧 Downloaded 6331478 bytes | ||
# ☀️ Waking up model, this may take a few seconds on cold start... | ||
# 📝 Transcription: A Dream Within A Dream Edgar Allan Poe | ||
# 📝 Transcription: | ||
# 📝 Transcription: take this kiss upon the brow, And in parting from you now, Thus much let me avow You are not wrong who deem That my days have been a dream. | ||
# 📝 Transcription: Take this kiss upon the brow, | ||
Review comment: It doesn't actually break up the lines like this in the output... do we want it to be the actual output or this better looking one?
Review comment: I kinda like the better looking one but defer to you guys if that's dishonest 😅
Review comment: Movie magic yk
Review comment: @charlesfrye what's the Modal-Frye Style Guide say here?
Review comment: I think splitting on punctuation with newlines in the code is a good idea! I'd like for the output to be real.
Review comment: @charlesfrye i feel like coding that is a can of worms. like, just breaking on
||
# 📝 Transcription: And in parting from you now, | ||
# 📝 Transcription: Thus much let me avow, | ||
# 📝 Transcription: You are not wrong who deem | ||
# 📝 Transcription: That my days have been a dream. | ||
# ... | ||
# ``` | ||
# See [Troubleshooting](https://modal.com/docs/examples/parakeet#client) at the bottom if you run into issues. | ||
Review comment: I don't think we need the Troubleshooting section anymore.
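The review thread above suggests splitting each transcription on punctuation so that the printed output really does break into lines. A minimal sketch of that idea follows. The helper name `split_on_punctuation` and the choice of punctuation marks are assumptions for illustration, not code from this PR.

```python
import re


def split_on_punctuation(text: str) -> list[str]:
    """Split text into lines after each punctuation mark followed by whitespace.

    Hypothetical helper: the set of marks (, . ; : ! ?) is an assumption.
    """
    # The lookbehind keeps each punctuation mark attached to the line it ends.
    parts = re.split(r"(?<=[,.;:!?])\s+", text.strip())
    return [part for part in parts if part]


if __name__ == "__main__":
    text = "Take this kiss upon the brow, And in parting from you now, Thus much let me avow"
    for line in split_on_punctuation(text):
        print(f"📝 Transcription: {line}")
```

This only breaks where the model emits punctuation. A span such as "You are not wrong who deem That my days have been a dream." has no mark between its two verse lines, so it would stay on one line, which may be part of the "can of worms" raised in the thread.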
||
|
||
|
||
# ## Setup | ||
import asyncio | ||
|
@@ -40,9 +43,8 @@ | |
import modal | ||
|
||
os.environ["MODAL_LOGLEVEL"] = "INFO" | ||
app_name = "parakeet-websocket" | ||
|
||
app = modal.App(app_name) | ||
app = modal.App("parakeet-websocket") | ||
charlesfrye marked this conversation as resolved.
|
||
SILENCE_THRESHOLD = -45 | ||
SILENCE_MIN_LENGTH_MSEC = 1000 | ||
END_OF_STREAM = b"END_OF_STREAM" | ||
|
@@ -101,6 +103,7 @@ | |
# - The `transcribe` method takes bytes of audio data, and returns the transcribed text. | ||
# - The `web` method creates a FastAPI app using [`modal.asgi_app`](https://modal.com/docs/reference/modal.asgi_app#modalasgi_app) that serves a | ||
# [WebSocket](https://modal.com/docs/guide/webhooks#websockets) endpoint for real-time audio transcription and a browser frontend for transcribing audio from your microphone. | ||
# - The `run_with_queue` method takes a [`modal.Queue`](https://modal.com/docs/reference/modal.Queue) and passes audio data and transcriptions between our local machine and the GPU container. | ||
|
||
# Parakeet tries really hard to transcribe everything to English! | ||
# Hence it tends to output utterances like "Yeah" or "Mm-hmm" when it runs on silent audio. | ||
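The diff does not show how silent audio is filtered out, but the `SILENCE_THRESHOLD = -45` constant suggests a loudness cutoff in dBFS. As a rough illustration only, the sketch below measures a chunk's level with the standard library. The function names and the 16-bit little-endian mono PCM format are assumptions, and the example file may well rely on an audio library instead.

```python
import math
import struct

SILENCE_THRESHOLD = -45  # dBFS, the value shown in the diff


def dbfs(pcm: bytes) -> float:
    """Loudness of 16-bit little-endian mono PCM relative to full scale (assumed format)."""
    n = len(pcm) // 2
    if n == 0:
        return float("-inf")
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # 32768 is the largest magnitude a signed 16-bit sample can take.
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def is_silent(pcm: bytes) -> bool:
    """True when the chunk is quieter than the threshold (hypothetical helper)."""
    return dbfs(pcm) < SILENCE_THRESHOLD
```

Skipping chunks for which `is_silent` returns true before calling the model is one way to avoid the spurious "Yeah" or "Mm-hmm" outputs described above.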
|
@@ -275,9 +278,7 @@ def main(audio_url: str = AUDIO_URL): | |
# Below are the three main functions that coordinate streaming audio and receiving transcriptions. | ||
# | ||
# `send_audio` transmits chunks of audio data and then pauses to approximate streaming | ||
# speech at a natural rate. That said, we set it to faster | ||
Review comment: thought this was a bit too honest 😅
Review comment: I would say the prose is too casual.
Review comment: I'll resist defending the tone of this sentence. But it might be worth mentioning that we set it to faster than realtime just so people understand why we divide the wait time by 8 (i.e. wait for 1/8th the duration of the chunk we just sent).
Review comment: Probably okay to not mention it. it's not crucial or the focus here.
||
# than real-time to compensate for network latency. Plus, we're not | ||
# trying to wait forever for this to finish. | ||
# speech at a natural rate. | ||
|
||
|
||
async def send_audio(q, audio_bytes): | ||
|
@@ -289,8 +290,7 @@ async def send_audio(q, audio_bytes): | |
await q.put.aio(END_OF_STREAM, partition="audio") | ||
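To make the pacing discussed in the review thread concrete, here is a small sketch of the send loop: put a chunk on the queue, then sleep for one eighth of that chunk's playback duration. It uses `asyncio.Queue` as a local stand-in for `modal.Queue`, and the name `send_audio_sketch`, the 16 kHz sample rate, and the 16-bit sample width are all assumptions rather than values taken from the example file.

```python
import asyncio

END_OF_STREAM = b"END_OF_STREAM"
SPEEDUP = 8  # per the review thread: wait 1/8th of each chunk's duration


def chunk_duration_sec(chunk: bytes, sample_rate: int = 16000, sample_width: int = 2) -> float:
    """Playback duration of a mono PCM chunk (sample rate and width are assumed)."""
    return len(chunk) / (sample_rate * sample_width)


async def send_audio_sketch(q: asyncio.Queue, chunks: list[bytes]) -> float:
    """Put chunks on the queue with a pause after each; return total seconds slept."""
    slept = 0.0
    for chunk in chunks:
        await q.put(chunk)
        # Sleeping less than the chunk's real duration streams faster than real time.
        pause = chunk_duration_sec(chunk) / SPEEDUP
        await asyncio.sleep(pause)
        slept += pause
    # Signal the consumer that no more audio is coming.
    await q.put(END_OF_STREAM)
    return slept
```

With these assumed parameters, a 3200-byte chunk holds 0.1 seconds of audio, so the sender pauses for 0.0125 seconds before sending the next one.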
|
||
|
||
# `receive_transcriptions` is straightforward. | ||
# It just waits for a transcription and prints it after a small delay to avoid colliding with the print statements | ||
# `receive_transcriptions` waits for a transcription and prints it after a small delay to avoid colliding with the print statements | ||
# from the GPU container. | ||
|
||
|
||
|
Review comment: Do we need the callout to it being a single app? For me the browser window does not pop up automatically... I have to click the link in the terminal. Does it automatically open for you?