feat: add per-request model selection (#11)
Allow API clients to specify which Whisper model to use per request via the new `model` form parameter on `POST /v1/transcript`. When omitted (or empty string), the server default from settings is used (backward compatible). When specified, `ModelManager` loads and caches the requested model separately from the default.

Changes:
- `server.py`: Add `model` Form parameter to `submit_transcript()`
- `transcriber.py`: Add `model` field to the `TranscribeOptions` dataclass
- `model_manager.py`: Accept `model_name` in `get_model()` and `_get_custom_model()`, include model in cache key hash
- `test_server.py`: Add tests for the model param and empty-string fallback

Supported model names: tiny, base, small, medium, large-v1, large-v2, large-v3, large-v3-turbo (any name accepted by faster-whisper).
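Based on the description above, the `TranscribeOptions` change might look like the following sketch. The existing fields of the dataclass are elided here and only the new `model` field (named in the PR) is shown:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TranscribeOptions:
    """Options for a transcription job (sketch; other fields elided)."""
    # New in this PR: per-request model override. None means "use the
    # server default from settings" (backward compatible).
    model: Optional[str] = None

opts = TranscribeOptions(model="base")
print(opts.model)                  # requested model
print(TranscribeOptions().model)   # None -> server default
```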
Summary of Changes (Gemini Code Assist)

This pull request enhances the flexibility of the transcription service by enabling per-request Whisper model selection. Previously, the model was a server-wide setting, limiting client applications that needed to offer diverse model choices (e.g., faster models for previews, more accurate ones for final output). The changes introduce a new API parameter and update the model management logic to dynamically load and cache different models based on client requests, while maintaining full backward compatibility.
Code Review
This pull request introduces a valuable feature for per-request model selection, correctly propagating the new model parameter from the API endpoint down to the model loading logic. However, a critical path traversal vulnerability was identified: the model parameter on the /v1/transcript endpoint is not validated, allowing an attacker to specify arbitrary file paths, which could lead to unauthorized file access and potentially remote code execution. Remediation details for implementing an allow-list for model names are provided in a code comment. Additionally, while the caching changes for the model name in the hash key are correctly implemented, the new tests should be enhanced to verify that the correct model is being used by the background task, not just that the API endpoint accepts the parameter.
- Add `ALLOWED_MODELS` allow-list to prevent path traversal attacks via the `model` parameter (security-critical fix)
- Validate the model name against the allow-list before processing; return 400 with a clear error message for invalid model names
- Improve tests: mock `process_transcription` to verify the model value is correctly propagated to `TranscribeOptions`
- Add a test for path traversal rejection (`../../etc/passwd`)
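A minimal sketch of the allow-list check described above. The model names come from the PR description; the helper name and error type are illustrative (in the actual FastAPI endpoint the rejection would surface as a 400 response rather than a `ValueError`):

```python
# Allow-list of model names accepted per request (from the PR description).
ALLOWED_MODELS = {
    "tiny", "base", "small", "medium",
    "large-v1", "large-v2", "large-v3", "large-v3-turbo",
}

def validate_model(model):
    """Return a sanitized model name, or None to use the server default."""
    if not model:  # None or empty string -> fall back to the server default
        return None
    if model not in ALLOWED_MODELS:
        # In the endpoint this would be an HTTPException(status_code=400, ...).
        raise ValueError(f"invalid model name: {model!r}")
    return model

print(validate_model(""))      # None -> server default
print(validate_model("base"))  # "base"
# validate_model("../../etc/passwd") raises -> path traversal is rejected
```

The key property is that the parameter is matched against a fixed set of names, so it can never be interpreted as a filesystem path.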
Summary
Allow API clients to specify which Whisper model to use per request via the new `model` form parameter on `POST /v1/transcript`.

Problem

Currently, the model is set server-wide via `MURMURAI_MODEL` (defaults to `large-v3-turbo`). Clients have no way to request a different model per transcription job. This is limiting for applications that want to offer model selection to their users (e.g., faster/lighter models for quick previews, larger models for accuracy).

Solution
Add an optional `model` parameter to the transcript submission endpoint:
- When provided (e.g. `base`, `small`, `large-v2`): `ModelManager` loads and caches the requested model separately
- When omitted (or empty string): the server default from settings is used (backward compatible)

The existing cache system (up to 3 custom model configs) naturally handles model variants. The cache key now includes the model name, so `base` with default options and `large-v3-turbo` with default options are cached independently.
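The cache-key behavior described above might be sketched as follows. The exact fields hashed in `model_manager.py` are an assumption; the point is simply that the model name is part of the digest, so identical options under different models yield different keys:

```python
import hashlib
import json

def cache_key(model_name, options):
    # Include the model name alongside the option values so that, e.g.,
    # "base" and "large-v3-turbo" with identical options hash differently.
    payload = json.dumps({"model": model_name, "options": options}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

default_opts = {"beam_size": 5}  # illustrative option set
assert cache_key("base", default_opts) != cache_key("large-v3-turbo", default_opts)
```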
Changes

- `server.py`: Add `model` Form parameter to `submit_transcript()`, sanitize empty string to `None`, pass to `TranscribeOptions`
- `transcriber.py`: Add `model` field to the `TranscribeOptions` dataclass
- `model_manager.py`: Accept `model_name` param in `get_model()` and `_get_custom_model()`, include model in cache key hash, use requested model name in `load_model()` call
- `test_server.py`: Add tests for the model param and empty-string fallback

API Usage
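A client-side sketch of a request with the new parameter. Only the `POST /v1/transcript` path and the `model` form field come from the PR itself; the base URL, the `file` field name, and the placeholder audio bytes are assumptions:

```python
# Hypothetical client call; field name "file" and the URL are assumptions.
files = {"file": ("audio.wav", b"\x00\x01\x02\x03", "audio/wav")}  # placeholder bytes
data = {"model": "base"}  # omit, or send "", to use the server default

# With the 'requests' library this would be sent as:
# resp = requests.post("http://localhost:8000/v1/transcript",
#                      files=files, data=data)

print(data["model"])
```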