
Audio Transcription Fix Summary

Date: February 23, 2026

Issues Fixed

1. Import Error in backend/input/__init__.py

Problem: The module imported MathOCR first, and MathOCR depends on google-cloud-vision. If google-cloud-vision was not installed, that import raised an ImportError and the entire input package failed to load, including ASR.

Solution:

  • Made both the OCR and ASR imports optional with try/except ImportError blocks
  • Moved the MathNormalizer import first, since it has no external dependencies
  • Built __all__ dynamically from the modules that imported successfully
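The optional-import pattern above can be sketched generically. This minimal version uses importlib so it runs on its own; in the real package the entries would more likely be relative imports (for example, hypothetically, from .ocr import MathOCR). The helper name load_optional_exports and the module path in the comments are illustrative assumptions, not the project's actual code.

```python
import importlib
from typing import Any, Dict, Iterable, Tuple


def load_optional_exports(exports: Iterable[Tuple[str, str]]) -> Dict[str, Any]:
    """Import each (attribute, module) pair, skipping any whose dependencies are missing."""
    loaded: Dict[str, Any] = {}
    for attr_name, module_path in exports:
        try:
            module = importlib.import_module(module_path)
            loaded[attr_name] = getattr(module, attr_name)
        except (ImportError, AttributeError):
            # e.g. google-cloud-vision is not installed: skip this export instead of
            # letting one missing package break the whole input package.
            continue
    return loaded


# In an __init__.py, __all__ could then be built from whatever loaded, e.g.
# (module path is hypothetical):
#   _exports = load_optional_exports([("MathOCR", "backend.input.ocr")])
#   globals().update(_exports)
#   __all__ = ["MathNormalizer", *_exports]
```

Only ImportError and AttributeError are swallowed, so genuine bugs inside an optional module (a SyntaxError, say) still surface.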

Files Changed:


2. Google Cloud Speech API Configuration

Problem: The ASR module used incorrect API calls and configuration:

  • It passed auto_decoding_config={} instead of instantiating a proper config object
  • The location was set to us-central1 instead of global for the default recognizer
  • The recognizer ID was hardcoded to chirp-2-recognizer instead of using the default recognizer _

Solution:

  • Fixed auto_decoding_config to use cloud_speech.AutoDetectDecodingConfig()
  • Updated default STT_LOCATION to global for compatibility
  • Updated default STT_RECOGNIZER to _ (default recognizer)
  • Added explicit error handling for empty audio input
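The corrected request shape can be sketched against the google-cloud-speech v2 client (google.cloud.speech_v2). This is a minimal sketch, not the contents of backend/input/asr.py: the function names, the default model "long", the en-US language code, and raising ValueError on empty audio are assumptions. Model availability differs by location, so check which models your STT_LOCATION supports before relying on the default.

```python
from typing import List, Optional


def recognizer_path(project_id: str, location: str = "global", recognizer: str = "_") -> str:
    """Build the v2 recognizer resource name; "_" selects the default (implicit) recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer}"


def transcribe(
    audio_bytes: bytes,
    project_id: str,
    location: str = "global",
    recognizer: str = "_",
    model: str = "long",  # assumption: pick a model offered in your location
    language_codes: Optional[List[str]] = None,
) -> str:
    if not audio_bytes:
        # Reject empty recordings before making a billable API call.
        raise ValueError("audio_bytes is empty; nothing to transcribe")

    # Imported lazily so this module still loads when google-cloud-speech is absent.
    from google.api_core.client_options import ClientOptions
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    # Regional locations need a regional endpoint; "global" uses the default endpoint.
    client_options = None
    if location != "global":
        client_options = ClientOptions(api_endpoint=f"{location}-speech.googleapis.com")
    client = SpeechClient(client_options=client_options)

    config = cloud_speech.RecognitionConfig(
        # A real AutoDetectDecodingConfig instance lets the API detect the encoding.
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=language_codes or ["en-US"],
        model=model,
    )
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer_path(project_id, location, recognizer),
        config=config,
        content=audio_bytes,
    )
    response = client.recognize(request=request)
    # Join the top alternative of each result segment.
    return " ".join(
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ).strip()
```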

Files Changed:


3. Documentation Updates

Problem: Setup instructions were incomplete and didn't document audio configuration.

Solution:

  • Updated .env.example with correct audio settings and comments
  • Enhanced README.md with detailed setup instructions for audio
  • Clarified that audio is optional and requires GCP credentials
  • Added instructions for using the uv package manager

Files Changed:


4. Testing Infrastructure

Problem: There was no end-to-end test covering the complete audio pipeline.

Solution:

  • Created a new test suite: tests/test_audio_pipeline.py
  • Tests cover:
    • ASR initialization
    • Math phrase normalization
    • Error handling for invalid inputs
  • All tests passing ✅
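One way to test the "loads without google-cloud packages" behaviour without uninstalling anything is to put None in sys.modules, which makes that import raise ImportError. The sketch below uses the standard-library unittest to stay dependency-free; it is an illustration of the technique, not the contents of tests/test_audio_pipeline.py, and vision_available is a hypothetical helper.

```python
import importlib
import sys
import unittest
from unittest import mock


def vision_available() -> bool:
    """Return True if google-cloud-vision can be imported (mirrors an optional-import guard)."""
    try:
        importlib.import_module("google.cloud.vision")
    except ImportError:
        return False
    return True


class OptionalImportTest(unittest.TestCase):
    def test_missing_vision_is_tolerated(self) -> None:
        # A None entry in sys.modules makes the import raise ImportError,
        # simulating an environment where google-cloud-vision is not installed.
        # patch.dict restores sys.modules when the block exits.
        with mock.patch.dict(sys.modules, {"google.cloud.vision": None}):
            self.assertFalse(vision_available())
```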

Files Created:


Components Verified

ASR Initialization - Google Cloud Speech v2 client initializes correctly
Math Normalizer - Spoken math phrases convert to symbolic notation
Error Handling - Gracefully handles empty/invalid audio
Frontend Integration - Streamlit helper functions work with fixed ASR
No Import Errors - Module loads correctly even without google-cloud packages
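The normalizer's job (spoken phrases to symbolic notation) can be illustrated with a small rule-based sketch. This is not MathNormalizer's actual implementation; the phrase table, the regex approach, and the normalize function are assumptions chosen to show the idea.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical phrase table; multi-word phrases come first so they match
# before any of their component words.
_PHRASES: List[Tuple[str, str]] = [
    (r"\bdivided by\b", "/"),
    (r"\bto the power of\b", "^"),
    (r"\s*\bsquared\b", "^2"),  # also swallow the space: "x squared" -> "x^2"
    (r"\s*\bcubed\b", "^3"),
    (r"\bplus\b", "+"),
    (r"\bminus\b", "-"),
    (r"\btimes\b", "*"),
    (r"\bequals\b", "="),
]

_NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                 "six", "seven", "eight", "nine", "ten"]

_RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(pattern, re.IGNORECASE), replacement) for pattern, replacement in _PHRASES
] + [
    # \b boundaries keep "one" from matching inside words like "someone".
    (re.compile(rf"\b{word}\b", re.IGNORECASE), str(value))
    for value, word in enumerate(_NUMBER_WORDS)
]


def normalize(text: str) -> str:
    """Rewrite spoken math phrases as symbols, then collapse repeated whitespace."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, normalize("x squared plus two times x equals zero") gives "x^2 + 2 * x = 0".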


How to Use Audio Transcription

1. Setup (One-time)

  1. Install google-cloud-speech (already in requirements.txt):

    uv sync
  2. Set up Google Cloud credentials:

    • Create a GCP project
    • Enable the Speech-to-Text API
    • Create a service account and download its JSON credentials file
    • Add to .env:
    GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
    GCP_PROJECT_ID=your_project_id
    STT_LOCATION=global
    STT_RECOGNIZER=_
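Reading these settings with the new defaults might look like the sketch below. It assumes .env has already been loaded into the process environment (for example by python-dotenv's load_dotenv()); the AudioSettings class and its field names are hypothetical and may not match backend/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AudioSettings:
    credentials_path: Optional[str]
    project_id: Optional[str]
    location: str
    recognizer: str

    @property
    def enabled(self) -> bool:
        # Audio is optional: only enable it when credentials and a project are both set.
        return bool(self.credentials_path and self.project_id)


def load_audio_settings(env: Mapping[str, str] = os.environ) -> AudioSettings:
    """Read the audio variables, falling back to the global location and default recognizer."""
    return AudioSettings(
        credentials_path=env.get("GOOGLE_APPLICATION_CREDENTIALS") or None,
        project_id=env.get("GCP_PROJECT_ID") or None,
        location=env.get("STT_LOCATION", "global"),
        recognizer=env.get("STT_RECOGNIZER", "_"),
    )
```

Checking enabled up front is one way for the audio tab to show a message instead of attempting a call that will fail.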

2. Use in App

  1. Run: streamlit run frontend/app.py
  2. Open the "🎤 Audio" tab
  3. Record your math problem
  4. Click "📝 Transcribe Audio"
  5. Review the transcription and edit it if needed
  6. Click "✅ Solve Transcribed Problem"

Technical Details

API Used

  • Google Cloud Speech-to-Text V2 (Chirp 2 model)
  • Version: google-cloud-speech==2.36.1
  • Why: chosen for its accuracy on technical terms

Audio Flow

User Recording (WebM/Opus from Streamlit)
  ↓
MathASR.transcribe(audio_bytes)
  ↓
Google Cloud Speech API (auto-detect encoding)
  ↓
Raw transcription text
  ↓
MathNormalizer.normalize(text)
  ↓
Cleaned math expression
  ↓
Orchestrator (Parser → Solver → etc.)
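The first stages of this flow can be sketched as a function that takes each stage as a callable, so the chain can be exercised without GCP credentials. The name audio_to_problem is hypothetical; MathASR.transcribe and MathNormalizer.normalize are the stages named in the diagram, but how they are constructed and wired in is assumed here.

```python
from typing import Callable


def audio_to_problem(
    audio_bytes: bytes,
    transcribe: Callable[[bytes], str],
    normalize: Callable[[str], str],
) -> str:
    """Run one recording through the ASR and normalizer stages and return the math text."""
    if not audio_bytes:
        raise ValueError("no audio recorded")
    raw_text = transcribe(audio_bytes)  # e.g. a bound MathASR.transcribe
    if not raw_text.strip():
        raise ValueError("speech recognition returned no text")
    # The cleaned expression is what the user reviews before it reaches the orchestrator.
    return normalize(raw_text)  # e.g. a bound MathNormalizer.normalize
```

Passing stages in as callables keeps the UI helpers testable with simple stubs.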

Key Files

  • backend/input/asr.py - Speech-to-Text integration
  • backend/input/normalizer.py - Math phrase normalization
  • frontend/helper_inputs.py - Streamlit UI for audio
  • backend/config.py - Configuration management

Future Improvements

  • Add support for multiple languages
  • Implement streaming transcription for real-time feedback
  • Add audio quality checks before transcription
  • Cache transcriptions to avoid duplicate API calls
  • Add confidence-based auto-retry logic
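For the caching item, one possible approach is to key results on a hash of the raw audio bytes, so re-submitting the same recording skips the API. This is a hypothetical sketch (TranscriptionCache is not part of the project); Streamlit's st.cache_data decorator is another option if the cache should live at the UI layer.

```python
import hashlib
from typing import Callable, Dict


class TranscriptionCache:
    """Memoize transcriptions by a SHA-256 digest of the raw audio bytes (in memory only)."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe
        self._results: Dict[str, str] = {}

    def __call__(self, audio_bytes: bytes) -> str:
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key not in self._results:
            # Only the first request for identical audio reaches the API.
            self._results[key] = self._transcribe(audio_bytes)
        return self._results[key]
```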

Notes

  • Audio input is optional: the app works without GCP credentials
  • If credentials are missing, the audio tab shows an error message instead of crashing
  • All packages are installed in the virtual environment at .venv/
  • Tests run with the Python interpreter from .venv/