update:add a method to access google cloud voice api #2599
thzjy wants to merge 1 commit into zhayujie:master
@MonkeyCode-AI please help review.
MonkeyCode-AI is analyzing the task...
MonkeyCode-AI
left a comment
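I am MonkeyCode AI, a coding assistant. You can summon me by mentioning @MonkeyCode-AI in a pull request on a GitHub repository.
For task execution details, see: https://monkeycode-ai.com
Code review results
The approach of adding Google Cloud STT/TTS (including long-text synthesis) is workable, but the current implementation carries key risks in credential management, security and compliance, temporary-file cleanup, and the handling of long-audio API parameters and file formats. These should be fixed before merging.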
| 🚨 Critical | ⚠️ Warning | 💡 Suggestion |
|---|---|---|
| 3 | 6 | 2 |
| @@ -0,0 +1 @@ | |||
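| Replace this file with your Google key file. | |||
Caution
🚨 Risk of committing a credentials file to the repository (even a placeholder note makes it easy to commit a real key by mistake)
The PR adds a google-credentials.json file (currently containing only a hint). It strongly suggests that users place a real service account key file at that path, where it can easily be committed to the repository by mistake. The code also hard-binds this relative path at runtime, which further raises the chance that a real key ends up in the project directory. This is a serious security and compliance risk.
Suggestion: Remove the credentials placeholder file from the repository. Read the GOOGLE_APPLICATION_CREDENTIALS environment variable instead, or have the deployment environment inject the credentials through a mount or secret. Also ignore voice/google/google-credentials.json in .gitignore.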
| # Set the Google Cloud credentials and config file paths | ||
| cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json") | ||
| config_path = os.path.join(os.path.dirname(__file__), "config.json") | ||
| os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path | ||
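Caution
🚨 Forcing GOOGLE_APPLICATION_CREDENTIALS at module import time disrupts the runtime environment and may introduce security problems
Running os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path at import is a global side effect. It overrides credentials already configured by the host environment, affects every other Google SDK client in the same process, couples the code tightly to the repository layout, and raises the risk of misusing local file credentials.
Suggestion: Do not write environment variables at import time. Prefer having the deployment inject GOOGLE_APPLICATION_CREDENTIALS, or read the configuration explicitly during initialization and pass credentials in. At a minimum, fall back to the local file only when the environment variable is unset and the file exists, and raise a clear error when the file is missing.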
| # Set the Google Cloud credentials and config file paths | |
| cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json") | |
| config_path = os.path.join(os.path.dirname(__file__), "config.json") | |
| os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path | |
| import os | |
| import time | |
| import uuid | |
| import json | |
| from google.cloud import speech | |
| from google.cloud import texttospeech_v1 as texttospeech | |
| from google.cloud import storage | |
| from google.api_core.exceptions import GoogleAPIError | |
| from pydub import AudioSegment | |
| from bridge.reply import Reply, ReplyType | |
| from common.log import logger | |
| from common.tmp_dir import TmpDir | |
| from voice.voice import Voice | |
| from common.utils import remove_markdown_symbol | |
| # Set the Google Cloud credentials and config file paths | |
| cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json") | |
| config_path = os.path.join(os.path.dirname(__file__), "config.json") | |
| # Do not override externally configured credentials; fall back only when unset and the local file exists | |
| if "GOOGLE_APPLICATION_CREDENTIALS" not in os.environ and os.path.exists(cred_path): | |
| os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path |
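The suggested change above still writes to os.environ and stays silent when neither source is available. A minimal sketch of the "clear error" part of the suggestion, assuming a hypothetical helper named resolve_credentials_path that is called during initialization rather than at import, might look like this:

```python
import os


def resolve_credentials_path(env, fallback_path):
    """Return the credentials path to use, without mutating the environment.

    `env` is any mapping (os.environ in real use). `fallback_path` is a
    hypothetical local file, consulted only when the variable is unset.
    """
    configured = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if configured:
        # Respect whatever the host environment already configured.
        return configured
    if os.path.isfile(fallback_path):
        return fallback_path
    raise FileNotFoundError(
        "GOOGLE_APPLICATION_CREDENTIALS is not set and no credentials file "
        f"was found at {fallback_path}"
    )
```

Passing the mapping in as a parameter keeps the function free of global side effects and easy to test.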
| # Get project_id from google-credentials.json | ||
| try: | ||
| with open(cred_path, 'r') as f: | ||
| credentials = json.load(f) | ||
| self.project_id = credentials.get('project_id') | ||
| if not self.project_id: | ||
| raise ValueError("project_id not found in google-credentials.json") | ||
| logger.debug(f"Got project_id from JSON: {self.project_id}") | ||
| except Exception as e: | ||
| logger.error(f"Failed to read project_id: {e}") | ||
| raise | ||
| # Get bucket_name from config.json | ||
| try: | ||
| with open(config_path, 'r') as f: | ||
| config = json.load(f) | ||
| self.bucket_name = config.get('gcs_bucket_name') | ||
| if not self.bucket_name: | ||
| raise ValueError("gcs_bucket_name not found in config.json") | ||
| logger.debug(f"Got bucket_name from config.json: {self.bucket_name}") | ||
| except Exception as e: | ||
| logger.error(f"Failed to read config.json: {e}") | ||
| raise |
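Warning
The code reads project_id from the local cred_path, but the credentials actually in effect may come from an environment variable or ADC. In that case the project_id may not match, or the file may not exist at all. The project can usually be resolved by the client or the default configuration, so parsing the JSON by hand is not recommended.
Suggestion: If project_id is needed, prefer supplying it through explicit configuration or an environment variable (such as GOOGLE_CLOUD_PROJECT), or rely on the client's default project inference. Avoid a hard dependency on the cred_path file existing.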
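One way to express that lookup order is sketched below. The helper name resolve_project_id and the infer_default callback are hypothetical and not part of the PR.

```python
def resolve_project_id(env, infer_default=None):
    """Pick a project id: explicit environment value first, then inference.

    `env` is any mapping (os.environ in real use). `infer_default` is an
    optional zero-argument callable returning a project id or None.
    """
    explicit = env.get("GOOGLE_CLOUD_PROJECT")
    if explicit:
        return explicit
    if infer_default is not None:
        inferred = infer_default()
        if inferred:
            return inferred
    raise RuntimeError(
        "No project id available: set GOOGLE_CLOUD_PROJECT or configure "
        "default credentials that carry a project"
    )
```

In real use, infer_default could wrap google.auth.default(), which returns a (credentials, project_id) tuple; the exact behavior should be checked against the installed google-auth version.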
| def convert_audio_to_wav(self, input_file_path, output_file_path="temp_audio.wav"): | ||
| """ | ||
| Convert an AMR or MP3 file to WAV format | ||
| Args: | ||
| input_file_path: path to the input audio file (AMR or MP3) | ||
| output_file_path: path to the output WAV file | ||
| Returns: | ||
| The path of the converted WAV file and its sample rate | ||
| """ | ||
| try: | ||
| audio = AudioSegment.from_file(input_file_path) | ||
| sample_rate = audio.frame_rate | ||
| duration_ms = len(audio) | ||
| logger.debug(f"Input audio: {input_file_path}, sample rate: {sample_rate}Hz, duration: {duration_ms/1000}s") | ||
| if duration_ms < 100: | ||
| logger.error("Audio file is too short to process") | ||
| return None, None | ||
| audio = audio.set_channels(1).set_sample_width(2) | ||
| audio.export(output_file_path, format="wav", codec="pcm_s16le") | ||
| return output_file_path, sample_rate |
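Warning
convert_audio_to_wav defaults to output_file_path="temp_audio.wav", so concurrent calls overwrite each other's output. The current call site passes a uuid-based file name, but the helper can still be misused.
Suggestion: Remove the fixed default file name, or default to a uuid-based temporary file. Place temporary files in the directory managed by TmpDir().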
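A minimal sketch of a collision-free default is shown below. It assumes a hypothetical helper make_temp_wav_path and uses the system temporary directory as a stand-in for the project's TmpDir().

```python
import os
import tempfile
import uuid


def make_temp_wav_path(tmp_dir=None):
    """Build a unique WAV path so concurrent calls never share a file name.

    `tmp_dir` stands in for the directory that TmpDir() manages in the
    project; the system temp directory is used when it is omitted.
    """
    base = tmp_dir or tempfile.gettempdir()
    return os.path.join(base, f"temp_audio_{uuid.uuid4().hex}.wav")
```

convert_audio_to_wav could then default output_file_path to None and call this helper when no path is given.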
| file_ext = os.path.splitext(voice_file)[1].lower() | ||
| if file_ext in [".amr", ".mp3"]: | ||
| temp_wav_file = f"temp_audio_{uuid.uuid4().hex}.wav" | ||
| voice_file, sample_rate = self.convert_audio_to_wav(voice_file, temp_wav_file) | ||
| if not voice_file: | ||
| logger.error("Audio conversion failed") | ||
| return Reply(ReplyType.ERROR, "Audio conversion failed") | ||
| elif file_ext == ".wav": | ||
| audio = AudioSegment.from_wav(voice_file) | ||
| sample_rate = audio.frame_rate | ||
| duration_ms = len(audio) | ||
| logger.debug(f"WAV audio: {voice_file}, sample rate: {sample_rate}Hz, duration: {duration_ms/1000}s") | ||
| if duration_ms < 100: | ||
| logger.error("Audio file is too short to process") | ||
| return Reply(ReplyType.ERROR, "Audio file is too short to process") | ||
| else: | ||
| logger.error("Unsupported audio format; only AMR, MP3, and WAV are supported") | ||
| return Reply(ReplyType.ERROR, "Unsupported audio format; only AMR, MP3, and WAV are supported") | ||
| with open(voice_file, "rb") as audio_file: | ||
| audio_content = audio_file.read() | ||
| audio = speech.RecognitionAudio(content=audio_content) | ||
| config = speech.RecognitionConfig( | ||
| encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16, | ||
| sample_rate_hertz=sample_rate, | ||
| language_code="cmn-CN", | ||
| ) | ||
| response = self.speech_client.recognize(config=config, audio=audio) | ||
| transcript = "" | ||
| for result in response.results: | ||
| transcript += result.alternatives[0].transcript + " " | ||
| transcript = transcript.strip() | ||
| if not transcript: | ||
| logger.error("Speech recognition failed: could not understand the audio content") | ||
| return Reply(ReplyType.ERROR, "Sorry, I could not understand that") | ||
| logger.info(f"[Google] voiceToText text={transcript} voice file name={voice_file}") | ||
| reply = Reply(ReplyType.TEXT, transcript) | ||
| if file_ext in [".amr", ".mp3"] and os.path.exists(voice_file): | ||
| os.remove(voice_file) | ||
| return reply |
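Warning
When the input is amr/mp3, voice_file is overwritten with the converted wav, and deletion is later decided by file_ext. On an exception the temporary wav may never be deleted; reusing the variable hurts readability and invites accidental-deletion bugs; and because TmpDir() is not used, temporary files end up scattered.
Suggestion: Keep temp_wav_file in its own variable and clean it up in a finally block. Put temporary files under the TmpDir() directory so they are managed in one place.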
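The finally-based cleanup could be wrapped in a small context manager, roughly as sketched here. The name removed_on_exit is hypothetical.

```python
import contextlib
import os


@contextlib.contextmanager
def removed_on_exit(path):
    """Yield `path` and delete the file afterwards, even if the body raises."""
    try:
        yield path
    finally:
        # The file may never have been created if conversion failed early.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```

In voiceToText, the conversion and recognition steps would run inside `with removed_on_exit(temp_wav_file):`, leaving the original voice_file variable untouched.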
| language_code="cmn-CN", | ||
| ) | ||
| response = self.speech_client.recognize(config=config, audio=audio) |
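Warning
speech_client.recognize is a synchronous interface suited to short audio. Long audio usually needs long_running_recognize (especially via GCS). The current code has no long-audio path.
Suggestion: Choose between recognize and long_running_recognize based on duration or size. For long audio, upload to GCS first and then recognize, with timeout and error handling.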
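The selection step could be isolated in a pure function, as in this sketch. The two thresholds are assumptions (roughly one minute and 10 MB for synchronous requests) and should be checked against the current Speech-to-Text quota documentation; the function name is hypothetical.

```python
# Assumed limits for synchronous recognition; verify against the current
# Speech-to-Text quotas before relying on them.
SYNC_MAX_DURATION_MS = 60_000
SYNC_MAX_BYTES = 10 * 1024 * 1024


def choose_recognition_method(duration_ms, size_bytes):
    """Return the name of the client method that fits the audio."""
    if duration_ms <= SYNC_MAX_DURATION_MS and size_bytes <= SYNC_MAX_BYTES:
        return "recognize"
    return "long_running_recognize"
```

The caller would already have duration_ms from pydub (len(audio)) and could take size_bytes from os.path.getsize on the converted file.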
| request = texttospeech.SynthesizeLongAudioRequest( | ||
| parent=parent, | ||
| input=synthesis_input, | ||
| audio_config=audio_config, | ||
| voice=voice, | ||
| output_gcs_uri=output_gcs_uri, | ||
| ) | ||
| operation = self.tts_long_client.synthesize_long_audio(request=request) | ||
| result = operation.result(timeout=600) # Wait for long audio synthesis to finish (at most 10 minutes) | ||
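Caution
🚨 The Long Audio TTS request may use the wrong Request type or field names, and result is unused
The code builds a texttospeech.SynthesizeLongAudioRequest and calls synthesize_long_audio with request=.... Message types and fields may differ between SDK versions. The code does not check whether the operation succeeded, and result is never used. The fixed 600 s timeout may be too short.
Suggestion: Verify the correct long audio API usage against the project's pinned dependency version, add operation status and exception checks (exception()/done), make the timeout configurable, and report a clear timeout error.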
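A configurable wait with an explicit timeout error might be sketched as follows. It assumes only that the operation object exposes result(timeout=...), as the PR code already relies on, and that a timeout surfaces as TimeoutError or concurrent.futures.TimeoutError. The helper and exception names are hypothetical.

```python
import concurrent.futures


class LongAudioTimeout(RuntimeError):
    """Raised when long audio synthesis does not finish in time."""


def wait_for_operation(operation, timeout_s):
    """Block on a long-running operation and report timeouts clearly.

    Errors other than a timeout propagate unchanged so the caller can
    log or handle API failures separately.
    """
    try:
        return operation.result(timeout=timeout_s)
    except (TimeoutError, concurrent.futures.TimeoutError) as exc:
        raise LongAudioTimeout(
            f"Long audio synthesis did not finish within {timeout_s}s"
        ) from exc
```

timeout_s would come from configuration rather than the hard-coded 600.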
| gcs_output_path = f"output-{unique_id}.wav" # Long Audio uses WAV | ||
| # Configure voice parameters (Mandarin Chinese) | ||
| voice = texttospeech.VoiceSelectionParams( | ||
| language_code="cmn-CN", | ||
| name="cmn-CN-Wavenet-A", | ||
| ) | ||
| if byte_length <= 5000: | ||
| # Use the standard Text-to-Speech API (short text, MP3 output) | ||
| audio_config = texttospeech.AudioConfig( | ||
| audio_encoding=texttospeech.AudioEncoding.MP3 | ||
| ) | ||
| synthesis_input = texttospeech.SynthesisInput(text=cleaned_text) | ||
| response = self.tts_client.synthesize_speech( | ||
| input=synthesis_input, voice=voice, audio_config=audio_config | ||
| ) | ||
| with open(mp3_file, "wb") as out: | ||
| out.write(response.audio_content) | ||
| logger.info(f"[Google] textToVoice (standard) text={cleaned_text[:50]}... voice file name={mp3_file}") | ||
| return Reply(ReplyType.VOICE, mp3_file) | ||
| else: | ||
| # Use the Long Audio API (long text, LINEAR16/WAV output) | ||
| audio_config = texttospeech.AudioConfig( | ||
| audio_encoding=texttospeech.AudioEncoding.LINEAR16 | ||
| ) | ||
| parent = f"projects/{self.project_id}/locations/global" | ||
| synthesis_input = texttospeech.SynthesisInput(text=cleaned_text) | ||
| output_gcs_uri = f"gs://{self.bucket_name}/{gcs_output_path}" | ||
| request = texttospeech.SynthesizeLongAudioRequest( | ||
| parent=parent, | ||
| input=synthesis_input, | ||
| audio_config=audio_config, | ||
| voice=voice, | ||
| output_gcs_uri=output_gcs_uri, | ||
| ) | ||
| operation = self.tts_long_client.synthesize_long_audio(request=request) | ||
| result = operation.result(timeout=600) # Wait for long audio synthesis to finish (at most 10 minutes) | ||
| # Download the WAV file from GCS | ||
| temp_wav_file = f"{TmpDir().path()}temp_wav_{unique_id}.wav" | ||
| bucket = self.storage_client.bucket(self.bucket_name) | ||
| blob = bucket.blob(gcs_output_path) | ||
| blob.download_to_filename(temp_wav_file) | ||
| logger.debug(f"Downloaded WAV file from GCS: {temp_wav_file}") | ||
| # Convert to MP3 | ||
| audio = AudioSegment.from_wav(temp_wav_file) | ||
| audio.export(mp3_file, format="mp3") | ||
| logger.info(f"[Google] textToVoice (long audio) text={cleaned_text[:50]}... voice file name={mp3_file}") | ||
| # Clean up temporary files | ||
| os.remove(temp_wav_file) | ||
| blob.delete() | ||
| return Reply(ReplyType.VOICE, mp3_file) |
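Warning
AudioSegment.export(mp3) does not set bitrate or related parameters, so quality and file size are uncontrolled. If any of the steps fails (downloading the wav, transcoding, deleting the local file, deleting the GCS blob), a local temporary file or a GCS object may be left behind, which can incur cost or leak data.
Suggestion: Use try/finally to guarantee cleanup of both the local temporary file and the GCS blob. Set transcoding parameters explicitly (for example bitrate="128k"), and log the blob URI to ease troubleshooting.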
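The cleanup guarantee could be sketched as below. The helper fetch_and_convert is hypothetical; it relies only on the blob methods the PR already calls (download_to_filename and delete) and takes the transcoding step as a callable so the sketch stays independent of pydub.

```python
import contextlib
import logging
import os

log = logging.getLogger(__name__)


def fetch_and_convert(blob, local_wav, convert):
    """Download `blob` to `local_wav`, run `convert(local_wav)`, always clean up.

    Both the local file and the remote object are removed whether or not
    the download or the conversion succeeds.
    """
    try:
        blob.download_to_filename(local_wav)
        return convert(local_wav)
    finally:
        with contextlib.suppress(FileNotFoundError):
            os.remove(local_wav)
        try:
            blob.delete()
        except Exception:
            # Log rather than raise, so a cleanup failure does not mask the
            # original error from the download or conversion step.
            log.exception("Failed to delete the GCS object after synthesis")
```

In the PR, convert would wrap AudioSegment.from_wav(...).export(mp3_file, format="mp3", bitrate="128k").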
| @@ -0,0 +1,3 @@ | |||
| { | |||
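Tip
💡 Configuration files should not be hard-coded in the repository; provide an example file rather than a live one
config.json hard-codes the bucket name, and the placeholder value is easy to misuse. It is better to provide config.example.json and add the real config.json to .gitignore, or to use the existing configuration system or environment variables.
Suggestion: Switch to config.example.json plus documentation, and read the bucket name from an environment variable at runtime.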
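Reading the bucket name at runtime might look like this sketch. The variable name GCS_BUCKET_NAME and the helper load_bucket_name are assumptions for illustration; the PR defines neither.

```python
def load_bucket_name(env, var_name="GCS_BUCKET_NAME"):
    """Read the bucket name from `env` (os.environ in real use).

    `var_name` is a hypothetical variable name; empty or whitespace-only
    values are treated as missing.
    """
    value = (env.get(var_name) or "").strip()
    if not value:
        raise ValueError(
            f"{var_name} is not set; see config.example.json for the expected value"
        )
    return value
```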
| """ | ||
| Language code: yue-HK | ||
| Name: yue-HK-Standard-A, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: yue-HK-Standard-B, Gender: MALE, Sample rate: 24000Hz | ||
| Name: yue-HK-Standard-C, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: yue-HK-Standard-D, Gender: MALE, Sample rate: 24000Hz | ||
| Language code: cmn-CN | ||
| Name: cmn-CN-Chirp3-HD-Achernar, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Achird, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Algenib, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Algieba, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Alnilam, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Aoede, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Autonoe, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Callirrhoe, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Charon, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Despina, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Enceladus, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Erinome, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Fenrir, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Gacrux, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Iapetus, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Kore, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Laomedeia, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Leda, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Orus, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Puck, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Pulcherrima, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Rasalgethi, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Sadachbia, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Sadaltager, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Schedar, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Sulafat, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Umbriel, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Vindemiatrix, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Zephyr, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Chirp3-HD-Zubenelgenubi, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Standard-A, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Standard-B, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Standard-C, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Standard-D, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Wavenet-A, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Wavenet-B, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Wavenet-C, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-CN-Wavenet-D, Gender: FEMALE, Sample rate: 24000Hz | ||
| Language code: cmn-TW | ||
| Name: cmn-TW-Standard-A, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-TW-Standard-B, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-TW-Standard-C, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-TW-Wavenet-A, Gender: FEMALE, Sample rate: 24000Hz | ||
| Name: cmn-TW-Wavenet-B, Gender: MALE, Sample rate: 24000Hz | ||
| Name: cmn-TW-Wavenet-C, Gender: MALE, Sample rate: 24000Hz | ||
| """ No newline at end of file |
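Tip
💡 Consider moving the large commented-out voice list to documentation or a constants file
The large triple-quoted block at the end of the file adds noise and hurts readability, and the diff flags No newline at end of file.
Suggestion: Move the voice list to the README, documentation, or a separate config file, and add a trailing newline at the end of the file.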
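TTS adds long-text handling: up to 5,000 bytes for short text and up to 1 MB for long text. STT now adapts to the audio parameters of the input for speech recognition.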
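The byte-based split described above can be illustrated with a small sketch. It assumes the length is measured on the UTF-8 encoding (the PR excerpt does not show how byte_length is computed) and treats "1M bytes" as 1,000,000; the function name is hypothetical.

```python
STANDARD_MAX_BYTES = 5000      # threshold used in the PR for the standard API
LONG_MAX_BYTES = 1_000_000     # "1M bytes" from the description; exact value assumed


def choose_tts_path(text):
    """Return (path_name, byte_length) for the given text.

    The length is taken on the UTF-8 encoding, so most Chinese characters
    count as three bytes each.
    """
    byte_length = len(text.encode("utf-8"))
    if byte_length <= STANDARD_MAX_BYTES:
        return "standard", byte_length
    if byte_length <= LONG_MAX_BYTES:
        return "long_audio", byte_length
    raise ValueError(f"Text is too long for synthesis: {byte_length} bytes")
```

Because the limit is in bytes rather than characters, 1,667 Chinese characters (5,001 bytes) already fall on the long-audio side.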