Problem
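Multiple STT providers hardcode audio/mpeg as the multipart Content-Type when uploading the audio file, regardless of the file's actual format:
- src/esperanto/providers/stt/openai.py: both transcribe and atranscribe
- src/esperanto/providers/stt/mistral.py: both paths (added in #143, "feat: add Mistral STT provider (Voxtral) (closes #142)")

Example pattern: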
```python
files = {"file": (filename, audio_file, "audio/mpeg")}  # ← always audio/mpeg
```
If the user passes a .wav, .flac, .m4a, .ogg, or .webm file, the multipart upload will declare it as MP3.
Why it matters
Most servers (OpenAI, Mistral, Groq) inspect the audio content rather than trusting the multipart Content-Type hint, so this hasn't visibly broken anything. But:
- It's a lying header: the declared type doesn't match the bytes, which is bad form.
- Some servers (Azure in some configurations, less-mature OpenAI-compatible servers like local Whisper deployments) DO trust the hint and reject mismatched content.
- New STT providers added by following these as templates inherit the bug.
Proposed fix
A small helper in src/esperanto/providers/stt/base.py (or a shared util):
```python
import mimetypes

_DEFAULT_AUDIO_CONTENT_TYPE = "audio/mpeg"


def _guess_audio_content_type(filename: str) -> str:
    """Guess multipart Content-Type from filename extension."""
    guessed, _ = mimetypes.guess_type(filename)
    if guessed and guessed.startswith("audio/"):
        return guessed
    return _DEFAULT_AUDIO_CONTENT_TYPE
```
Then each STT provider's transcribe/atranscribe path becomes:
```python
files = {"file": (filename, audio_file, _guess_audio_content_type(filename))}
```
Files with unknown extensions or non-audio guesses fall back to audio/mpeg, preserving today's behavior.
Acceptance criteria
- All STT providers in src/esperanto/providers/stt/ use the helper instead of hardcoding audio/mpeg.
- A .wav file passed to transcribe() is uploaded with audio/wav as the Content-Type of its multipart file part.
- An input with an unknown extension, or a stream without a name, still uses audio/mpeg as a safe default.
- Tests assert the correct Content-Type for at least 2-3 file extensions per affected provider.
- Validator + ruff + mypy all clean.
Origin
Follow-up from PR #143 (issue #142) — called out in the 'Follow-ups' section of that PR's description. Same pattern was identified in the existing OpenAI STT provider during review.