AGENTS.md

Summary

This is the repository reference for short-audio Whisper transcription on OAK4 with LED and stream feedback. Use it when you need the repo’s speech-recognition example rather than a vision pipeline.

Use This Example When

  • You need Whisper Tiny EN on OAK4.
  • You want audio-driven text output integrated with a live camera stream.
  • You need the LED/color feedback workflow described in the README.

Do Not Use This Example When

  • You need RVC2 support.
  • You need continuous streaming ASR.
  • You need a fully host-side speech model rather than device-assisted inference.

Quick Facts

  • Category: neural-networks/speech-recognition/whisper-tiny-en
  • Shape: script+standalone
  • Primary task: short audio capture or audio-file transcription with LED/color feedback
  • Entrypoint: main.py
  • Standalone path: oakapp.toml
  • Frontend: none
  • Runs on: RVC4 only
  • Requires: OAK4, Whisper encoder/decoder model assets, and host audio prerequisites for recording mode
  • Input: audio recorded on the host after pressing r, or a pre-recorded file passed with --audio_file, plus a live camera stream for visual feedback
  • Output: the Camera and Decoded Audio Message streams in the Visualizer
  • Models: Whisper encoder/decoder YAMLs in depthai_models/
  • Visualizer / UI: DepthAI Visualizer via dai.RemoteConnection

Read First

Architecture

  • Host audio is converted into a spectrogram by utils/audio_encoder.py.
  • A device NeuralNetwork runs the Whisper encoder stage.
  • Host/device helper nodes prepare the recursive decoder loop and feed token sequences back into the decoder network.
  • utils/annotation_node.py maps decoded tokens to text and color commands, and utils/led_changer_script.py updates the device LED.
  • The live camera stream is tinted/annotated to match the decoded color word.
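The decoder loop and the token-to-color mapping described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repo's implementation: the toy vocabulary, the `SOT`/`EOT` token IDs, the `COLORS` table, and the `fake_decoder_step` stand-in are all hypothetical. In the real example the decoder step runs on the device and token IDs come from Whisper's tokenizer.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical toy vocabulary; the real example uses Whisper's tokenizer.
TOY_VOCAB = {0: "<sot>", 1: "<eot>", 2: "turn", 3: "the", 4: "light", 5: "green"}
SOT, EOT = 0, 1  # hypothetical start/end token IDs for this sketch only

# Hypothetical color table; the repo's actual color words may differ.
COLORS = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}


def greedy_decode(
    step: Callable[[Sequence[int]], Sequence[float]],
    max_tokens: int = 16,
) -> List[int]:
    """Feed the growing token sequence back into `step` until EOT appears."""
    tokens = [SOT]
    for _ in range(max_tokens):
        logits = step(tokens)
        # Greedy choice: take the index of the highest score.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        if next_token == EOT:
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start token


def tokens_to_text(tokens: Sequence[int]) -> str:
    """Map token IDs to words using the toy vocabulary."""
    return " ".join(TOY_VOCAB[t] for t in tokens)


def find_color(text: str) -> Optional[Tuple[int, int, int]]:
    """Return the RGB value of the first known color word, if any."""
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in COLORS:
            return COLORS[word]
    return None


def fake_decoder_step(tokens: Sequence[int]) -> List[float]:
    """Stand-in for the device decoder: emits a fixed script of tokens."""
    script = [2, 3, 4, 5, EOT]
    target = script[len(tokens) - 1]  # tokens always starts with SOT
    return [1.0 if i == target else 0.0 for i in range(len(TOY_VOCAB))]


if __name__ == "__main__":
    decoded = greedy_decode(fake_decoder_step)
    text = tokens_to_text(decoded)
    print(text, find_color(text))
```

The `max_tokens` cap keeps the loop bounded even if the decoder never emits the end token, which suits a short-audio workflow like this one.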

Constraints

  • main.py explicitly raises on non-RVC4 platforms.
  • The README notes that host recording crashes in standalone mode; standalone is effectively for pre-recorded audio only.
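Both constraints can be expressed as preflight checks. The sketch below is hypothetical: `require_rvc4` and `check_audio_source` are illustrative helpers, not functions from the repo, and the exception types are assumptions. The platform name is assumed to be obtained from the connected device before the pipeline is built.

```python
from typing import Optional


def require_rvc4(platform_name: str) -> None:
    """Raise if the connected device is not RVC4 (hypothetical helper)."""
    if platform_name.upper() != "RVC4":
        raise RuntimeError(
            f"This example supports RVC4 only, got platform: {platform_name}"
        )


def check_audio_source(standalone: bool, audio_file: Optional[str]) -> str:
    """Return 'file' or 'record', rejecting recording in standalone mode."""
    if audio_file:
        return "file"
    if standalone:
        # Host recording is reported to crash in standalone mode,
        # so require a pre-recorded file there.
        raise ValueError("Standalone mode needs a pre-recorded audio file")
    return "record"


if __name__ == "__main__":
    require_rvc4("RVC4")
    print(check_audio_source(standalone=False, audio_file=None))
```

Failing early with a clear message is easier to diagnose than a crash partway through recording.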

Related Examples

Validation

  • Run: python3 main.py
  • File mode: python3 main.py --audio_file <FILE>
  • Success looks like: the Visualizer shows Camera and Decoded Audio Message, pressing r records a short clip in peripheral mode, and recognized color words drive the LED/tint output
  • Common failure causes: the device is not RVC4, host audio dependencies are missing, or recording was attempted in standalone mode
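The two run commands differ only in whether `--audio_file` is passed. A minimal sketch of how that flag could select the mode is shown below. Only the `--audio_file` flag name comes from this document; the parser layout, the help text, and the `describe_mode` helper are assumptions, and the real `main.py` may define additional options.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the single flag named in this document."""
    parser = argparse.ArgumentParser(description="Whisper transcription sketch")
    parser.add_argument(
        "--audio_file",
        default=None,
        help="Path to a pre-recorded audio file; omit to record on the host",
    )
    return parser


def describe_mode(argv: Optional[List[str]] = None) -> str:
    """Return 'file' when --audio_file is given, otherwise 'record'."""
    args = build_parser().parse_args(argv)
    return "file" if args.audio_file else "record"


if __name__ == "__main__":
    print(describe_mode())
```

Calling `describe_mode([])` corresponds to `python3 main.py`, and `describe_mode(["--audio_file", "clip.wav"])` corresponds to file mode.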