Problem: The module was trying to import `MathOCR` first, which depends on google-cloud-vision. If google-cloud-vision was not installed, the entire input module failed to load, including ASR.
Solution:
- Made both OCR and ASR imports optional with try-except blocks
- Changed import order to import `MathNormalizer` first (it has no dependencies)
- Dynamically build `__all__` based on what's available
Files Changed:
Problem: The ASR module was using incorrect API call patterns:
- `auto_decoding_config={}` instead of proper object instantiation
- Location was set to `us-central1` instead of `global` for the default recognizer
- Recognizer ID was hardcoded to `chirp-2-recognizer` instead of using the default `_`
Solution:
- Fixed `auto_decoding_config` to use `cloud_speech.AutoDetectDecodingConfig()`
- Updated default `STT_LOCATION` to `global` for compatibility
- Updated default `STT_RECOGNIZER` to `_` (the default recognizer)
- Added proper error handling for empty audio
Files Changed:
- `backend/input/asr.py` - Line 67
- `backend/config.py` - Lines 40-42
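In code, the corrected call looks roughly like this (a sketch of the fix in `backend/input/asr.py`, not a verbatim copy; the helper names are assumptions):

```python
def recognizer_path(project_id: str, location: str = "global",
                    recognizer: str = "_") -> str:
    """Build the v2 recognizer resource name; "_" selects the default recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer}"

def transcribe(audio_bytes: bytes, project_id: str) -> str:
    if not audio_bytes:  # the added guard against empty audio
        raise ValueError("Empty audio payload")

    # imported lazily so the module still loads without google-cloud-speech
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    client = SpeechClient()
    config = cloud_speech.RecognitionConfig(
        # the fix: a real config object, not auto_decoding_config={}
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_2",
    )
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer_path(project_id),
        config=config,
        content=audio_bytes,
    )
    response = client.recognize(request=request)
    return " ".join(r.alternatives[0].transcript for r in response.results)
```

With `location="global"` and recognizer `_`, no recognizer resource has to be created up front.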
Problem: Setup instructions were incomplete and didn't document audio configuration.
Solution:
- Updated `.env.example` with correct audio settings and comments
- Enhanced `README.md` with detailed setup instructions for audio
- Clarified that audio is optional and requires GCP credentials
- Added information about using the `uv` package manager
Files Changed:
- `.env.example`
- `README.md` - Lines 100-140
Problem: No comprehensive test for the complete audio pipeline.
Solution:
- Created new comprehensive test suite: `tests/test_audio_pipeline.py`
- Tests cover:
  - ASR initialization
  - Math phrase normalization
  - Error handling for invalid inputs
- All tests passing ✅
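The shape of those tests, as a hedged sketch (the real file's test names and the normalizer's replacement rules are assumptions):

```python
# Stand-in normalizer; the real MathNormalizer's rule table may differ.
class MathNormalizer:
    REPLACEMENTS = {"divided by": "/", "plus": "+", "minus": "-",
                    "times": "*", "equals": "="}

    def normalize(self, text: str) -> str:
        out = text.lower()
        for phrase, symbol in self.REPLACEMENTS.items():
            out = out.replace(phrase, symbol)
        return " ".join(out.split())  # collapse extra whitespace

def test_math_normalization():
    assert MathNormalizer().normalize("two plus two equals four") == "two + two = four"

def test_invalid_input():
    # assumed behavior: empty input normalizes to an empty string
    assert MathNormalizer().normalize("") == ""
```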
Files Created:
✅ ASR Initialization - Google Cloud Speech v2 client initializes correctly
✅ Math Normalizer - Spoken math phrases convert to symbolic notation
✅ Error Handling - Gracefully handles empty/invalid audio
✅ Frontend Integration - Streamlit helper functions work with fixed ASR
✅ No Import Errors - Module loads correctly even without google-cloud packages
- Install google-cloud-speech (already in requirements.txt): `uv sync`
- Set up Google Cloud credentials:
  - Create a GCP project
  - Enable the Speech-to-Text API
  - Download service account credentials JSON
  - Add to `.env`:
    ```
    GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
    GCP_PROJECT_ID=your_project_id
    STT_LOCATION=global
    STT_RECOGNIZER=_
    ```
- Run `streamlit run frontend/app.py`
- Click the "🎤 Audio" tab
- Record your math problem
- Click "📝 Transcribe Audio"
- Verify/edit the transcription
- Click "✅ Solve Transcribed Problem"
- Google Cloud Speech-to-Text V2 (Chirp 2 model)
- Version: `google-cloud-speech==2.36.1`
- Why: State-of-the-art accuracy for technical terms
User Recording (WebM/Opus from Streamlit)
↓
MathASR.transcribe(audio_bytes)
↓
Google Cloud Speech API (auto-detect encoding)
↓
Raw transcription text
↓
MathNormalizer.normalize(text)
↓
Cleaned math expression
↓
Orchestrator (Parser → Solver → etc.)
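The flow above as a minimal runnable sketch; these stub classes only stand in for the real `MathASR` and `MathNormalizer`:

```python
class StubASR:
    """Stand-in for MathASR; a real call would hit the Speech API."""
    def transcribe(self, audio_bytes: bytes) -> str:
        return "x squared plus two x equals zero"  # pretend transcript

class StubNormalizer:
    """Stand-in for MathNormalizer with a couple of example rules."""
    def normalize(self, text: str) -> str:
        return (text.replace(" squared", "^2")
                    .replace("plus", "+")
                    .replace("equals", "="))

def run_pipeline(audio: bytes) -> str:
    raw = StubASR().transcribe(audio)       # WebM/Opus bytes -> raw text
    return StubNormalizer().normalize(raw)  # raw text -> math expression
```

The normalized expression is then handed to the orchestrator exactly like typed input.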
- `backend/input/asr.py` - Speech-to-Text integration
- `backend/input/normalizer.py` - Math phrase normalization
- `frontend/helper_inputs.py` - Streamlit UI for audio
- `backend/config.py` - Configuration management
- Add support for multiple languages
- Implement streaming transcription for real-time feedback
- Add audio quality checks before transcription
- Cache transcriptions to avoid duplicate API calls
- Add confidence-based auto-retry logic
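The caching idea could be sketched like this (a hypothetical helper, not in the codebase): key the cache on a hash of the audio bytes so identical recordings skip the API call.

```python
import hashlib

_transcription_cache: dict[str, str] = {}

def cached_transcribe(audio: bytes, transcribe_fn) -> str:
    """Return a cached transcript when the same audio bytes reappear."""
    key = hashlib.sha256(audio).hexdigest()
    if key not in _transcription_cache:
        _transcription_cache[key] = transcribe_fn(audio)  # only on a miss
    return _transcription_cache[key]
```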
- Audio input is optional - app works fine without GCP credentials
- If credentials are missing, audio tab shows graceful error message
- All packages properly installed in the venv at `.venv/`
- Tests use the correct Python executable from the venv