How can the R&D team and the PM implement an instruction-driven audio-to-audio workflow?
Example scenario: the input is Chinese speech. After each sentence ends, translate it into English as the instruction specifies (e.g., with an expanded explanation) and generate a corresponding WAV file.
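A minimal sketch of the per-sentence loop, assuming 16 kHz mono 16-bit PCM input. The energy-based silence splitter is a crude stand-in for a real voice-activity detector. `transcribe`, `translate`, and `synthesize` are hypothetical pluggable callables for whatever ASR, instruction-following translation model, and TTS engine the team chooses. Only the WAV encoding relies on a real library (Python's standard `wave` module).

```python
import io
import struct
import wave
from typing import Callable, List

SAMPLE_RATE = 16_000  # assumption: 16 kHz, mono, 16-bit PCM throughout


def split_on_silence(samples: List[int], threshold: int = 500,
                     min_silence: int = 4_800) -> List[List[int]]:
    """Cut the stream into utterances wherever |amplitude| stays below
    `threshold` for `min_silence` samples (0.3 s at 16 kHz). A crude
    stand-in for a real VAD / end-of-sentence detector."""
    segments: List[List[int]] = []
    current: List[int] = []
    quiet = 0  # length of the trailing run of quiet samples in `current`
    for s in samples:
        current.append(s)
        quiet = quiet + 1 if abs(s) < threshold else 0
        if quiet >= min_silence:
            speech = current[:-quiet]  # drop the trailing silence
            if speech:
                segments.append(speech)
            current, quiet = [], 0
    speech = current[:len(current) - quiet]  # flush the final utterance
    if speech:
        segments.append(speech)
    return segments


def to_wav_bytes(samples: List[int], rate: int = SAMPLE_RATE) -> bytes:
    """Encode 16-bit mono PCM samples as an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


def run_pipeline(samples: List[int], instruction: str,
                 transcribe: Callable[[List[int]], str],
                 translate: Callable[[str, str], str],
                 synthesize: Callable[[str], List[int]]) -> List[bytes]:
    """Return one English WAV per detected Chinese sentence. The three
    callables are hypothetical hooks for the chosen ASR, translation
    model, and TTS engine."""
    outputs: List[bytes] = []
    for segment in split_on_silence(samples):
        zh_text = transcribe(segment)              # Chinese speech -> text
        en_text = translate(zh_text, instruction)  # apply the instruction
        outputs.append(to_wav_bytes(synthesize(en_text)))  # English TTS -> WAV
    return outputs
```

To save the results, write each returned `bytes` object to disk, e.g. `pathlib.Path(f"sentence_{i:03d}.wav").write_bytes(data)`. Injecting the three model stages as callables lets R&D swap engines while the PM iterates on the instruction wording independently.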
Scenario labeling logic: Scene_Language -> Scene_Language + Scene_TargetLanguage + Scene_TargetLanguage_InstructionExplanation
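One way to make the labeling rule concrete, assuming labels are underscore-joined strings such as `meeting_zh`. The `SceneLabel` class, the `parse_source_label` helper, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SceneLabel:
    """One labeled sample. Expands Scene_Language into Scene_Language +
    Scene_TargetLanguage + Scene_TargetLanguage_InstructionExplanation.
    Field names and example values are hypothetical."""
    scene: str        # e.g. "meeting"
    source_lang: str  # e.g. "zh"
    target_lang: str  # e.g. "en"
    instruction: str  # e.g. "expand" for an expanded explanation

    @property
    def source(self) -> str:
        return f"{self.scene}_{self.source_lang}"

    def expanded(self) -> List[str]:
        """The three labels that the source label expands into."""
        target = f"{self.scene}_{self.target_lang}"
        return [self.source, target, f"{target}_{self.instruction}"]


def parse_source_label(label: str) -> Tuple[str, str]:
    """Split 'scene_lang' on its last underscore, so scene names may
    themselves contain underscores (assumed naming convention)."""
    scene, sep, lang = label.rpartition("_")
    if not sep or not scene or not lang:
        raise ValueError(f"expected 'scene_lang', got {label!r}")
    return scene, lang
```

For example, `SceneLabel(*parse_source_label("meeting_zh"), "en", "expand").expanded()` returns `["meeting_zh", "meeting_en", "meeting_en_expand"]`.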