Commits
21 commits
bf811ba
refactor(asr): move mic to constructor and remove AudioStreamRouter
robgee86 Apr 22, 2026
f3175aa
fix(asr): refactor
robgee86 Apr 22, 2026
2329028
feat(asr): export ASR new errors
robgee86 Apr 22, 2026
310ec26
docs(asr): update examples for new constructor-based source API
robgee86 Apr 22, 2026
57e7aee
docs(asr): update README
robgee86 Apr 22, 2026
07ba1b8
fix(asr): log a warning on unknown WebSocket message types
robgee86 Apr 22, 2026
0007562
docs(asr): adapt docstrings to Google style
robgee86 Apr 23, 2026
94527d5
TEMP: set asr-backpressure-and-buffering image tag
robgee86 Apr 17, 2026
2741093
feat(asr): add ASRUnavailableError for container unavailable scenarios
robgee86 Apr 23, 2026
04bdc29
refactor(asr): better docstrings
robgee86 Apr 23, 2026
02b6552
TEMP: fix CI
robgee86 Apr 23, 2026
91e2e78
feat(asr): derive model name from brick_config.yaml or app.yaml
robgee86 Apr 23, 2026
9024b69
refactor(asr): change default model name to whisper-small-quantized
robgee86 Apr 23, 2026
04323c0
fix(asr): missing var
robgee86 Apr 23, 2026
632e58d
fix(ci): fix cleanup failure when no buildcache exists
robgee86 Apr 23, 2026
c6e7c64
refactor(asr): reduce error log
robgee86 Apr 23, 2026
3f18a4c
feat(asr): tune WebSocket for faster container crash detection
robgee86 Apr 23, 2026
79573e9
fix(asr): prevent reader thread leak container crash
robgee86 Apr 23, 2026
bd54f41
fix(asr): avoid deadlock when cancel() is called from the iterator th…
robgee86 Apr 23, 2026
f9496ee
refactor(asr)
robgee86 Apr 23, 2026
0dd8748
refactor(asr): better logging
robgee86 Apr 23, 2026
2 changes: 1 addition & 1 deletion .github/workflows/cleanup-cache.yml
@@ -44,7 +44,7 @@ jobs:
VERSION_ID=$(curl -s -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
-H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/orgs/${{ github.repository_owner }}/packages/container/${PACKAGE_NAME}/versions" \
-           | jq -r ".[] | select(.metadata.container.tags[] == \"${CACHE_TAG}\") | .id")
+           | jq -r --arg tag "${CACHE_TAG}" '.[]? | select((.metadata.container.tags? // []) | index($tag) != null) | .id')

if [ -n "$VERSION_ID" ] && [ "$VERSION_ID" != "null" ]; then
echo "Found cache version ID: $VERSION_ID. Deleting..."
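The hardened jq filter above no longer errors out when a package version has no container tags (the buildcache-missing case this PR fixes). The same logic can be sketched in Python; the payload shape is assumed from the GitHub Packages versions API:

```python
# Sketch: the behavior of the fixed jq filter, in Python.
# Versions lacking container metadata or a tags array must not
# raise -- they are simply skipped, which is what the `.[]?` and
# `// []` fallbacks provide on the jq side.

def find_cache_version_ids(versions, cache_tag):
    """Return the IDs of package versions carrying cache_tag."""
    ids = []
    for v in versions or []:  # tolerate a null/empty payload
        container = v.get("metadata", {}).get("container") or {}
        tags = container.get("tags") or []
        if cache_tag in tags:
            ids.append(v["id"])
    return ids
```

The original filter indexed `.tags[]` unconditionally, so a version with no tags aborted the whole pipeline; the rewrite treats missing fields as "no match" instead.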
1 change: 1 addition & 0 deletions .github/workflows/docker-build.yml
@@ -219,6 +219,7 @@ jobs:
if: matrix.container == 'python-apps-base'
env:
PUBLIC_IMAGE_REGISTRY_BASE: ${{ env.DEV_REGISTRY_PATH }}
+ DEV_TAG_VERSION: ${{ needs.detect-changes.outputs.image_tag }}
run: |
pip install go-task-bin==${{ env.TASKFILE_VERSION }}
task init:ci
6 changes: 3 additions & 3 deletions models/models-list.yaml
@@ -1265,11 +1265,11 @@ models:
source: "qualcomm-ai-hub"
source-model-id: "mediapipe_hand_gesture"
source-model-url: "https://aihub.qualcomm.com/models/mediapipe_hand_gesture"
- - whisper-small:
+ - whisper-small-quantized:
runner: audio-analytics
- name : "Whisper Small"
+ name : "Whisper Small (quantized)"
description: >-
- Whisper-Small ASR (Automatic Speech Recognition) model is a state-of-the-art system designed for transcribing spoken language into written text.
+ Whisper ASR (Automatic Speech Recognition) model is a state-of-the-art system designed for transcribing spoken language into written text.
This model is based on the transformer architecture and has been optimized for edge inference by replacing Multi-Head Attention (MHA)
with Single-Head Attention (SHA) and linear layers with convolutional (conv) layers. It exhibits robust performance in realistic, noisy environments,
making it highly reliable for real-world applications.
23 changes: 18 additions & 5 deletions src/arduino/app_bricks/asr/README.md
@@ -1,10 +1,23 @@
# Automatic Speech Recognition Brick

- The `AutomaticSpeechRecognition` brick provides on-device automatic speech recognition (ASR) capabilities for audio streams and files. It offers a high-level interface for transcribing audio using a local model, with support for both real-time and batch processing.
+ The `AutomaticSpeechRecognition` brick provides on-device automatic speech recognition (ASR) capabilities for audio streams and files. It offers a high-level interface for transcribing audio using a local model, with support for both real-time microphone capture and in-memory audio (WAV bytes or raw PCM arrays).

- ## LocalASR Class Features
+ ## Features

- **Offline Operation:** All transcriptions are performed locally, ensuring data privacy and eliminating network dependencies.
- - **Multi Language Support:** Supports the transcription of spoken multiple languages.
- - **Audio Input Formats**: Designed to work with the Microphone peripheral, WAV and PCM audio.
- - **Concurrency Control**: Limits the number of simultaneous transcription sessions to avoid resource exhaustion.
+ - **Multi-Language Support:** Supports the transcription of multiple spoken languages.
+ - **Flexible Audio Input:** The constructor accepts a `BaseMicrophone` instance, a `bytes` WAV container, a raw `np.ndarray` of PCM samples, or `None` to use a default `Microphone()`.
+ - **Single-Session Semantics:** Each instance handles one transcription session at a time. For concurrent transcriptions on different microphones, create multiple `AutomaticSpeechRecognition` instances.
+
+ ## Errors
+
+ - `ASRBusyError`: raised if you call `transcribe()` / `transcribe_stream()` while the instance already has an active session. Fix by awaiting the current session or using a separate instance.
+ - `ASRServiceBusyError`: raised when the inference server rejects session creation because it is currently serving another client. The caller decides whether to retry.
+ - `ASRUnavailableError`: raised when the inference service is unreachable (container down, network error) or the WebSocket connection drops mid-session. The caller decides whether to retry.
+ - `ASRError`: base class for all of the above.
+
+ ## Source Ownership
+
+ - When `source` is `None`, ASR constructs a default `Microphone()` and manages its lifecycle through `asr.start()` / `asr.stop()`.
+ - When `source` is a `BaseMicrophone` you pass in, **you** own its lifecycle — call `mic.start()` before transcribing and `mic.stop()` when done.
+ - In-memory sources (`bytes`, `np.ndarray`) have no device lifecycle.
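The new README splits errors into retryable (`ASRServiceBusyError`, `ASRUnavailableError`) and caller-bug (`ASRBusyError`) cases, which lends itself to a simple retry policy. A hedged sketch follows; the exception classes are stubbed locally so the snippet is self-contained — in a real app you would import them from `arduino.app_bricks.asr` instead:

```python
import time

# Stand-in hierarchy mirroring the brick's exported errors;
# import the real classes from arduino.app_bricks.asr in practice.
class ASRError(Exception): ...
class ASRBusyError(ASRError): ...
class ASRServiceBusyError(ASRError): ...
class ASRUnavailableError(ASRError): ...

def transcribe_with_retry(transcribe, retries=3, delay=1.0):
    """Retry only the errors the README marks as retryable."""
    for attempt in range(retries):
        try:
            return transcribe()
        except (ASRServiceBusyError, ASRUnavailableError):
            if attempt == retries - 1:
                raise  # out of attempts, surface the error
            time.sleep(delay)
        # ASRBusyError is deliberately NOT caught: it signals a bug
        # in the caller (a second session on the same instance), so
        # it propagates immediately.
```

Usage would be `transcribe_with_retry(lambda: asr.transcribe(duration=5))`; the retry count and delay are illustrative defaults, not values prescribed by the brick.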
20 changes: 18 additions & 2 deletions src/arduino/app_bricks/asr/__init__.py
@@ -2,6 +2,22 @@
#
# SPDX-License-Identifier: MPL-2.0

- from .local_asr import AutomaticSpeechRecognition, ASREvent, TranscriptionStream
+ from .local_asr import (
+     ASREvent,
+     ASRBusyError,
+     ASRError,
+     ASRServiceBusyError,
+     ASRUnavailableError,
+     AutomaticSpeechRecognition,
+     TranscriptionStream,
+ )

- __all__ = ["AutomaticSpeechRecognition", "ASREvent", "TranscriptionStream"]
+ __all__ = [
+     "ASREvent",
+     "ASRError",
+     "ASRBusyError",
+     "ASRServiceBusyError",
+     "ASRUnavailableError",
+     "AutomaticSpeechRecognition",
+     "TranscriptionStream",
+ ]
2 changes: 1 addition & 1 deletion src/arduino/app_bricks/asr/brick_config.yaml
@@ -4,7 +4,7 @@ description: Automatic Speech Recognition brick for offline speech-to-text proce
category: audio
requires_model: true
requires_services: ["arduino:genie_audio"]
- model: whisper-small
+ model: whisper-small-quantized
supported_boards: ["ventunoq"]
model_configuration_variables:
- VAD_LEN_HANGOVER
4 changes: 2 additions & 2 deletions src/arduino/app_bricks/asr/examples/00_transcribe_wav.py
@@ -7,7 +7,7 @@
from arduino.app_bricks.asr import AutomaticSpeechRecognition


- asr = AutomaticSpeechRecognition()
  with open("recording_01.wav", "rb") as wav_file:
-     text = asr.transcribe_wav(wav_file.read())
+     asr = AutomaticSpeechRecognition(wav_file.read())
+     text = asr.transcribe()
print(f"Transcription: {text}")
@@ -7,9 +7,9 @@
from arduino.app_bricks.asr import AutomaticSpeechRecognition


- asr = AutomaticSpeechRecognition()
  with open("recording_01.wav", "rb") as wav_file:
-     with asr.transcribe_wav_stream(wav_file.read()) as stream:
+     asr = AutomaticSpeechRecognition(wav_file.read())
+     with asr.transcribe_stream() as stream:
for chunk in stream:
match chunk.type:
case "partial_text":
4 changes: 2 additions & 2 deletions src/arduino/app_bricks/asr/examples/02_transcribe_mic.py
@@ -11,8 +11,8 @@
mic = Microphone()
mic.start()

- asr = AutomaticSpeechRecognition()
- text = asr.transcribe_mic(mic, duration=5)
+ asr = AutomaticSpeechRecognition(mic)
+ text = asr.transcribe(duration=5)
print(f"Transcription: {text}")

mic.stop()
@@ -11,8 +11,8 @@
mic = Microphone()
mic.start()

- asr = AutomaticSpeechRecognition()
- with asr.transcribe_mic_stream(mic, duration=5) as stream:
+ asr = AutomaticSpeechRecognition(mic)
+ with asr.transcribe_stream(duration=5) as stream:
for chunk in stream:
match chunk.type:
case "partial_text":