First, thanks so much for your great work!
I found that the stt command supports an --output argument:
python -m mlx_audio.stt.generate --help
usage: generate.py [-h] --model MODEL --audio AUDIO --output OUTPUT [--format {txt,srt,vtt,json}] [--verbose]
                   [--max_tokens MAX_TOKENS]
Generate transcriptions from audio files
options:
  -h, --help            show this help message and exit
  --model MODEL         Path to the model
  --audio AUDIO         Path to the audio file
  --output OUTPUT       Path to save the output
  --format {txt,srt,vtt,json}
                        Output format (txt, srt, vtt, or json)
  --verbose             Verbose output
  --max_tokens MAX_TOKENS
                        Maximum number of new tokens to generate
But the tts command does not support it:
python -m mlx_audio.tts.generate --help
usage: generate.py [-h] [--model MODEL] [--max_tokens MAX_TOKENS] [--text TEXT] [--voice VOICE] [--speed SPEED]
                   [--gender GENDER] [--pitch PITCH] [--lang_code LANG_CODE] [--file_prefix FILE_PREFIX] [--verbose]
                   [--join_audio] [--play] [--audio_format AUDIO_FORMAT] [--ref_audio REF_AUDIO] [--ref_text REF_TEXT]
                   [--stt_model STT_MODEL] [--temperature TEMPERATURE] [--top_p TOP_P] [--top_k TOP_K]
                   [--repetition_penalty REPETITION_PENALTY] [--stream] [--streaming_interval STREAMING_INTERVAL]
Generate audio from text using TTS.
options:
  -h, --help            show this help message and exit
  --model MODEL         Path or repo id of the model
  --max_tokens MAX_TOKENS
                        Maximum number of tokens to generate
  --text TEXT           Text to generate (leave blank to input via stdin)
  --voice VOICE         Voice name
  --speed SPEED         Speed of the audio
  --gender GENDER       Gender of the voice [male, female]
  --pitch PITCH         Pitch of the voice
  --lang_code LANG_CODE
                        Language code
  --file_prefix FILE_PREFIX
                        Output file name prefix
  --verbose             Print verbose output
  --join_audio          Join all audio files into one
  --play                Play the output audio
  --audio_format AUDIO_FORMAT
                        Output audio format
  --ref_audio REF_AUDIO
                        Path to reference audio
  --ref_text REF_TEXT   Caption for reference audio
  --stt_model STT_MODEL
                        STT model to use to transcribe reference audio
  --temperature TEMPERATURE
                        Temperature for the model
  --top_p TOP_P         Top-p for the model
  --top_k TOP_K         Top-k for the model
  --repetition_penalty REPETITION_PENALTY
                        Repetition penalty for the model
  --stream              Stream the audio as segments instead of saving to a file
  --streaming_interval STREAMING_INTERVAL
                        The time interval in seconds for streaming segments
Could the tts command support an --output argument too? It would be very helpful if it could.
Or is there a technical reason not to implement it?
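To make the request concrete, here is a minimal sketch of how an --output flag could sit next to the existing --file_prefix and --audio_format options. This is not the actual mlx_audio code: the --output flag, the helper names build_parser and resolve_output, and the default values "audio" and "wav" are all assumptions for illustration. Only --text, --file_prefix and --audio_format come from the help output above.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a cut-down parser with only the flags relevant to this request."""
    parser = argparse.ArgumentParser(description="Generate audio from text using TTS.")
    parser.add_argument("--text", default=None, help="Text to generate")
    # Defaults below are assumed for this sketch, not taken from mlx_audio.
    parser.add_argument("--file_prefix", default="audio", help="Output file name prefix")
    parser.add_argument("--audio_format", default="wav", help="Output audio format")
    # Hypothetical flag: it does not exist in the tts command today.
    parser.add_argument("--output", default=None, help="Path to save the output")
    return parser


def resolve_output(args: argparse.Namespace) -> tuple[str, str]:
    """Map a hypothetical --output path onto (file_prefix, audio_format).

    If --output is not given, the existing flags are used unchanged.
    """
    if args.output is None:
        return args.file_prefix, args.audio_format
    path = Path(args.output)
    # Take the format from the file extension, falling back to --audio_format.
    audio_format = path.suffix.lstrip(".") or args.audio_format
    # The prefix is the path without its extension.
    prefix = path.with_suffix("").as_posix()
    return prefix, audio_format


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_output(args))
```

With something like this, --output would only be a convenience on top of the two existing flags, so the current behaviour would stay the same when it is not passed. How it should interact with --join_audio or --stream (which may produce several segments) is an open question I have not tried to answer here.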