
update: add a method to access google cloud voice api #2599

Open
thzjy wants to merge 1 commit into zhayujie:master from
thzjy:update/google-cloud-voice

@thzjy
Contributor

@thzjy thzjy commented May 29, 2025

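TTS: added long-text handling (short text up to 5,000 bytes, long text up to 1 MB). STT: recognition now adapts to the parameters of the input audio.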
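The limits are counted in bytes, not characters, so the routing has to measure the UTF-8 encoding; a Chinese character takes 3 bytes, so about 1,667 characters already exceed the 5,000-byte cap. A minimal sketch of that decision, assuming the thresholds above and reading "1 MB" as 1,000,000 bytes (choose_tts_path and the constant names are hypothetical, not taken from the PR):

```python
# Hypothetical helper, not part of the PR: pick a TTS path by UTF-8 byte length.
STANDARD_LIMIT_BYTES = 5000         # short-text limit from the PR description
LONG_AUDIO_LIMIT_BYTES = 1_000_000  # "1 MB" from the PR description, read as 10^6 bytes


def choose_tts_path(text: str) -> str:
    """Return "standard" or "long_audio" depending on the encoded size of text."""
    byte_length = len(text.encode("utf-8"))
    if byte_length <= STANDARD_LIMIT_BYTES:
        return "standard"
    if byte_length <= LONG_AUDIO_LIMIT_BYTES:
        return "long_audio"
    raise ValueError(f"text is {byte_length} bytes, above the long-audio limit")


if __name__ == "__main__":
    # Each CJK character is 3 bytes in UTF-8.
    print(choose_tts_path("你好" * 10))  # standard (60 bytes)
    print(choose_tts_path("好" * 1667))  # long_audio (5001 bytes)
```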

@safe1ine

safe1ine commented Feb 5, 2026

@MonkeyCode-AI please review

@MonkeyCode-AI

MonkeyCode-AI is analyzing the task...


@MonkeyCode-AI MonkeyCode-AI left a comment


I am the MonkeyCode AI coding assistant. You can summon me by mentioning @MonkeyCode-AI in a pull request on a GitHub repository.

For details of the task execution, see: https://monkeycode-ai.com

Code review results

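The overall approach of adding Google Cloud STT/TTS (including long text) is workable, but the current implementation carries key risks in credential management, security and compliance, temporary-file cleanup, and long-audio API parameters and file-format handling. We recommend fixing these before merging.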

🚨 Critical: 3 ⚠️ Warning: 6 💡 Suggestion: 2

@@ -0,0 +1 @@
Replace this file with your Google key file.

Caution

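🚨 Risk of keeping a credentials file in the repository (even a placeholder note makes it easy to commit a real key by mistake)

The PR adds a google-credentials.json file (currently containing only a prompt). This strongly implies that users should put a real service-account key file at that path, which makes an accidental commit to the repository very likely. The code also hard-binds to this relative path at runtime, further raising the chance that real keys end up in the project directory. This is a serious security/compliance risk.

Suggestion: Delete the credentials placeholder file from the repository. Instead, read the GOOGLE_APPLICATION_CREDENTIALS environment variable or have the deployment environment inject the key via a mount/secret, and add voice/google/google-credentials.json to .gitignore.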

Comment on lines +16 to +20
# Set the Google Cloud credentials and config file paths
cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json")
config_path = os.path.join(os.path.dirname(__file__), "config.json")
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path


Caution

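🚨 Forcibly setting GOOGLE_APPLICATION_CREDENTIALS at module import time breaks the runtime environment and may introduce security issues

Running os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path at import time is a global side effect: it overrides credentials already configured on the host, affects other Google SDK clients in the same process, tightly couples the code to the repository layout, and increases the risk of relying on a local credentials file.

Suggestion: Do not write environment variables at import time. Prefer having the deployment inject GOOGLE_APPLICATION_CREDENTIALS, or read the configuration explicitly during initialization and pass in the credentials. At a minimum, fall back to the local file only when the environment variable is unset and the file exists, and raise a clear error when the file does not exist.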

Suggested change
# Set the Google Cloud credentials and config file paths
cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json")
config_path = os.path.join(os.path.dirname(__file__), "config.json")
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path
import os
import time
import uuid
import json
from google.cloud import speech
from google.cloud import texttospeech_v1 as texttospeech
from google.cloud import storage
from google.api_core.exceptions import GoogleAPIError
from pydub import AudioSegment
from bridge.reply import Reply, ReplyType
from common.log import logger
from common.tmp_dir import TmpDir
from voice.voice import Voice
from common.utils import remove_markdown_symbol
# Set the Google Cloud credentials and config file paths
cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json")
config_path = os.path.join(os.path.dirname(__file__), "config.json")
# Do not override externally configured credentials; fall back only if unset and the local file exists
if "GOOGLE_APPLICATION_CREDENTIALS" not in os.environ and os.path.exists(cred_path):
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path
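The suggested change stops overriding existing credentials, but it still writes to os.environ at import time and silently does nothing when neither source exists. A minimal sketch of the stricter variant the comment asks for, resolving the path during initialization and failing with a clear error; resolve_credentials_path is a hypothetical helper name, not part of the PR:

```python
import os
from typing import Mapping, Optional


def resolve_credentials_path(
    local_fallback: str, environ: Optional[Mapping[str, str]] = None
) -> str:
    """Return the service-account key path without writing to os.environ.

    Order: GOOGLE_APPLICATION_CREDENTIALS if set, otherwise local_fallback
    if that file exists, otherwise raise a clear error.
    """
    env = os.environ if environ is None else environ
    configured = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if configured:
        # Respect the externally configured value, but fail early if it is stale.
        if not os.path.isfile(configured):
            raise FileNotFoundError(
                f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {configured}"
            )
        return configured
    if os.path.isfile(local_fallback):
        return local_fallback
    raise FileNotFoundError(
        "No Google credentials found: set GOOGLE_APPLICATION_CREDENTIALS "
        f"or provide {local_fallback}"
    )
```

The returned path could then be handed to a client explicitly, for example speech.SpeechClient.from_service_account_file(path), so no process-wide state changes. One trade-off: this rejects setups that rely on other Application Default Credentials sources (gcloud user credentials, the metadata server); a deployment using those would need a branch that simply lets the clients use ADC.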

Comment on lines +28 to +49
# Get project_id from google-credentials.json
try:
    with open(cred_path, 'r') as f:
        credentials = json.load(f)
    self.project_id = credentials.get('project_id')
    if not self.project_id:
        raise ValueError("project_id not found in google-credentials.json")
    logger.debug(f"Got project_id from JSON: {self.project_id}")
except Exception as e:
    logger.error(f"Unable to read project_id: {e}")
    raise
# Get bucket_name from config.json
try:
    with open(config_path, 'r') as f:
        config = json.load(f)
    self.bucket_name = config.get('gcs_bucket_name')
    if not self.bucket_name:
        raise ValueError("gcs_bucket_name not found in config.json")
    logger.debug(f"Got bucket_name from config.json: {self.bucket_name}")
except Exception as e:
    logger.error(f"Unable to read config.json: {e}")
    raise

Warning

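⚠️ Reading project_id by hand from the service-account JSON is fragile and may not match the credentials actually in effect at runtime

The code reads project_id from the local cred_path, but the credentials actually in effect may come from the environment variable or ADC. In that case project_id may not match, or the file may not exist at all. The project can usually be resolved by the client or the default configuration, so parsing the JSON by hand is not recommended.

Suggestion: If project_id is needed, supply it through explicit configuration or an environment variable (such as GOOGLE_CLOUD_PROJECT), or use the client's default project inference; avoid a hard dependency on the cred_path file existing.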
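A minimal sketch of the suggested order, assuming the project is supplied either explicitly (for example from the bot's existing config) or via GOOGLE_CLOUD_PROJECT; resolve_project_id is a hypothetical name. The google-auth library's google.auth.default() also returns a (credentials, project_id) pair and is another option when ADC is configured.

```python
import os
from typing import Mapping, Optional


def resolve_project_id(
    explicit: Optional[str] = None, environ: Optional[Mapping[str, str]] = None
) -> str:
    """Prefer an explicitly configured project, then GOOGLE_CLOUD_PROJECT."""
    env = os.environ if environ is None else environ
    project_id = explicit or env.get("GOOGLE_CLOUD_PROJECT")
    if not project_id:
        # No silent fallback to parsing the key file: make the gap visible.
        raise ValueError(
            "project_id is not configured: pass it explicitly or set GOOGLE_CLOUD_PROJECT"
        )
    return project_id
```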

Comment on lines +51 to +70
def convert_audio_to_wav(self, input_file_path, output_file_path="temp_audio.wav"):
    """
    Convert an AMR or MP3 file to WAV format
    Args:
        input_file_path: path of the input audio file (AMR or MP3)
        output_file_path: path of the output WAV file
    Returns:
        the path of the converted WAV file and its sample rate
    """
    try:
        audio = AudioSegment.from_file(input_file_path)
        sample_rate = audio.frame_rate
        duration_ms = len(audio)
        logger.debug(f"Input audio: {input_file_path}, sample rate: {sample_rate}Hz, duration: {duration_ms/1000}s")
        if duration_ms < 100:
            logger.error("Audio file is too short to process")
            return None, None
        audio = audio.set_channels(1).set_sample_width(2)
        audio.export(output_file_path, format="wav", codec="pcm_s16le")
        return output_file_path, sample_rate

Warning

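⚠️ convert_audio_to_wav uses a fixed default output file name, so concurrent calls can overwrite each other

convert_audio_to_wav defaults to output_file_path="temp_audio.wav", so concurrent calls will overwrite one another. Although the current call site passes a uuid-based file name, the utility function can still be misused.

Suggestion: Remove the fixed default file name, or default to a uuid-based temporary file; place temporary files in the directory managed by TmpDir().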
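A minimal sketch of a collision-resistant default, assuming the caller passes the TmpDir().path() directory; make_temp_wav_path is a hypothetical helper:

```python
import os
import uuid


def make_temp_wav_path(tmp_dir: str, prefix: str = "temp_audio_") -> str:
    """Build a unique WAV path inside tmp_dir (for example TmpDir().path())."""
    # uuid4().hex gives 32 random hex digits, so concurrent calls do not collide.
    # os.path.join copes with tmp_dir values that do or do not end in a separator.
    return os.path.join(tmp_dir, f"{prefix}{uuid.uuid4().hex}.wav")
```

convert_audio_to_wav could then default output_file_path to None and call make_temp_wav_path(TmpDir().path()) when no path is given, so no two calls share a file name.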

Comment on lines +88 to +134
file_ext = os.path.splitext(voice_file)[1].lower()
if file_ext in [".amr", ".mp3"]:
    temp_wav_file = f"temp_audio_{uuid.uuid4().hex}.wav"
    voice_file, sample_rate = self.convert_audio_to_wav(voice_file, temp_wav_file)
    if not voice_file:
        logger.error("Audio conversion failed")
        return Reply(ReplyType.ERROR, "Audio conversion failed")
elif file_ext == ".wav":
    audio = AudioSegment.from_wav(voice_file)
    sample_rate = audio.frame_rate
    duration_ms = len(audio)
    logger.debug(f"WAV audio: {voice_file}, sample rate: {sample_rate}Hz, duration: {duration_ms/1000}s")
    if duration_ms < 100:
        logger.error("Audio file is too short to process")
        return Reply(ReplyType.ERROR, "Audio file is too short to process")
else:
    logger.error("Unsupported audio format; only AMR, MP3 and WAV are supported")
    return Reply(ReplyType.ERROR, "Unsupported audio format; only AMR, MP3 and WAV are supported")

with open(voice_file, "rb") as audio_file:
    audio_content = audio_file.read()

audio = speech.RecognitionAudio(content=audio_content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=sample_rate,
    language_code="cmn-CN",
)

response = self.speech_client.recognize(config=config, audio=audio)

transcript = ""
for result in response.results:
    transcript += result.alternatives[0].transcript + " "

transcript = transcript.strip()
if not transcript:
    logger.error("Speech recognition failed: could not understand the audio")
    return Reply(ReplyType.ERROR, "Sorry, I couldn't understand that")

logger.info(f"[Google] voiceToText text={transcript} voice file name={voice_file}")
reply = Reply(ReplyType.TEXT, transcript)

if file_ext in [".amr", ".mp3"] and os.path.exists(voice_file):
    os.remove(voice_file)

return reply

Warning

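⚠️ The STT temporary-file cleanup can delete the wrong file or miss files, and error paths may leave temporary files behind

When the input is amr/mp3, voice_file is overwritten with the path of the converted wav and later deleted based on file_ext. If an exception is raised, the temporary wav may never be deleted; reusing the variable hurts readability and makes accidental-deletion bugs easy to introduce; and because TmpDir() is not used, temporary files end up scattered.

Suggestion: Keep temp_wav_file in its own variable and clean it up in a finally block; put temporary files under the TmpDir() directory so they are managed in one place.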
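A minimal sketch of the suggested structure, assuming the pydub conversion and the Speech-to-Text call are wrapped in two callables (convert_to_wav and recognize_wav are hypothetical stand-ins, and recognize_wav is assumed to read the sample rate itself, as the PR's .wav branch does). The temporary WAV gets its own variable, and only that file is ever deleted:

```python
import os
import uuid
from typing import Callable, Optional


def transcribe_file(
    voice_file: str,
    tmp_dir: str,
    convert_to_wav: Callable[[str, str], bool],
    recognize_wav: Callable[[str], str],
) -> str:
    """Transcribe voice_file, always deleting the temporary WAV this call creates.

    convert_to_wav(src, dst) writes a WAV to dst and returns True on success;
    recognize_wav(path) returns the transcript. Both are hypothetical stand-ins.
    """
    ext = os.path.splitext(voice_file)[1].lower()
    temp_wav_file: Optional[str] = None  # set only when this call creates a file
    try:
        if ext in (".amr", ".mp3"):
            temp_wav_file = os.path.join(tmp_dir, f"temp_audio_{uuid.uuid4().hex}.wav")
            if not convert_to_wav(voice_file, temp_wav_file):
                raise RuntimeError("audio conversion failed")
            wav_path = temp_wav_file
        elif ext == ".wav":
            wav_path = voice_file  # the caller's file is never deleted
        else:
            raise ValueError("unsupported audio format; only AMR, MP3 and WAV are supported")
        return recognize_wav(wav_path)
    finally:
        # Runs on success and on every error path above.
        if temp_wav_file and os.path.exists(temp_wav_file):
            os.remove(temp_wav_file)
```

Errors surface as exceptions here; the voiceToText caller would map them to Reply(ReplyType.ERROR, ...) as the PR does today.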

    language_code="cmn-CN",
)

response = self.speech_client.recognize(config=config, audio=audio)

Warning

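⚠️ STT uses synchronous recognize, which may fail on long audio (duration limit/timeout)

speech_client.recognize is a synchronous API suited to short audio; long audio usually requires long_running_recognize (especially via GCS). The current code has no path for long audio.

Suggestion: Choose between recognize and long_running_recognize based on duration/size; upload long audio to GCS before recognition, and add timeout and error handling.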
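A minimal sketch of the selection step. The one-minute threshold is an assumption based on Google's documented limit for synchronous recognition and should be checked against the current quotas page; choose_recognition_method is a hypothetical name:

```python
# Assumed limit: synchronous recognize is documented for audio up to about one
# minute; confirm against the current Speech-to-Text quotas before relying on it.
SYNC_RECOGNIZE_MAX_MS = 60_000


def choose_recognition_method(duration_ms: int, limit_ms: int = SYNC_RECOGNIZE_MAX_MS) -> str:
    """Return the SpeechClient method name suited to a clip of duration_ms."""
    if duration_ms < 100:
        # Mirrors the PR's existing 100 ms minimum.
        raise ValueError("audio is too short to process")
    return "recognize" if duration_ms <= limit_ms else "long_running_recognize"
```

For the long path, the audio would be uploaded to GCS, passed as speech.RecognitionAudio(uri="gs://..."), and sent with speech_client.long_running_recognize(config=config, audio=audio); the returned operation is then awaited with operation.result(timeout=...).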

Comment on lines +195 to +203
request = texttospeech.SynthesizeLongAudioRequest(
    parent=parent,
    input=synthesis_input,
    audio_config=audio_config,
    voice=voice,
    output_gcs_uri=output_gcs_uri,
)
operation = self.tts_long_client.synthesize_long_audio(request=request)
result = operation.result(timeout=600)  # Wait for long-audio synthesis to finish (up to 10 minutes)

Caution

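🚨 The Long Audio TTS request may use the wrong Request type/field names, and result is never used

The code builds a texttospeech.SynthesizeLongAudioRequest and passes request=... to synthesize_long_audio; message types and fields may differ across SDK versions. It also never checks whether the operation succeeded, result is never used, and the fixed 600 s timeout may not be enough.

Suggestion: Verify the long-audio API usage against the SDK version the project depends on; add checks on the operation's status and errors (exception()/done()); make the timeout configurable and return a clear error message on timeout.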
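A minimal sketch of the waiting step, assuming operation behaves like google.api_core.operation.Operation, whose result() blocks, re-raises the operation's error if it failed, and raises concurrent.futures.TimeoutError when the timeout expires (worth confirming against the pinned google-api-core version). The environment variable GOOGLE_TTS_LONG_AUDIO_TIMEOUT is a hypothetical name:

```python
import concurrent.futures
import os
from typing import Any, Optional


def wait_for_long_audio(operation: Any, timeout_s: Optional[float] = None) -> Any:
    """Wait for a long-running operation with a configurable timeout and a clear error."""
    if timeout_s is None:
        # Hypothetical setting; 600 s matches the PR's current hard-coded value.
        timeout_s = float(os.environ.get("GOOGLE_TTS_LONG_AUDIO_TIMEOUT", "600"))
    try:
        # result() returns the operation's response, or raises its error on failure.
        return operation.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as exc:
        raise TimeoutError(
            f"long-audio synthesis did not finish within {timeout_s:.0f}s"
        ) from exc
```

The returned response can then be logged (together with the output GCS URI) instead of being assigned to an unused variable.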

Comment on lines +166 to +221
gcs_output_path = f"output-{unique_id}.wav"  # Long Audio uses WAV

# Configure voice parameters (Mandarin Chinese)
voice = texttospeech.VoiceSelectionParams(
    language_code="cmn-CN",
    name="cmn-CN-Wavenet-A",
)

if byte_length <= 5000:
    # Use the standard Text-to-Speech API (short text, MP3 output)
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    synthesis_input = texttospeech.SynthesisInput(text=cleaned_text)
    response = self.tts_client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )
    with open(mp3_file, "wb") as out:
        out.write(response.audio_content)
    logger.info(f"[Google] textToVoice (standard) text={cleaned_text[:50]}... voice file name={mp3_file}")
    return Reply(ReplyType.VOICE, mp3_file)
else:
    # Use the Long Audio API (long text, LINEAR16/WAV output)
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )
    parent = f"projects/{self.project_id}/locations/global"
    synthesis_input = texttospeech.SynthesisInput(text=cleaned_text)
    output_gcs_uri = f"gs://{self.bucket_name}/{gcs_output_path}"
    request = texttospeech.SynthesizeLongAudioRequest(
        parent=parent,
        input=synthesis_input,
        audio_config=audio_config,
        voice=voice,
        output_gcs_uri=output_gcs_uri,
    )
    operation = self.tts_long_client.synthesize_long_audio(request=request)
    result = operation.result(timeout=600)  # Wait for long-audio synthesis to finish (up to 10 minutes)

    # Download the WAV file from GCS
    temp_wav_file = f"{TmpDir().path()}temp_wav_{unique_id}.wav"
    bucket = self.storage_client.bucket(self.bucket_name)
    blob = bucket.blob(gcs_output_path)
    blob.download_to_filename(temp_wav_file)
    logger.debug(f"Downloaded WAV file from GCS: {temp_wav_file}")

    # Convert to MP3
    audio = AudioSegment.from_wav(temp_wav_file)
    audio.export(mp3_file, format="mp3")
    logger.info(f"[Google] textToVoice (long audio) text={cleaned_text[:50]}... voice file name={mp3_file}")

    # Clean up temporary files
    os.remove(temp_wav_file)
    blob.delete()

    return Reply(ReplyType.VOICE, mp3_file)

Warning

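⚠️ Long-text synthesis writes WAV and then converts it to MP3: no sample rate or bitrate is specified, and cleanup on failure is not guaranteed

AudioSegment.export(mp3) does not specify bitrate or similar parameters, so output quality and size are uncontrolled. If any step (downloading the wav, transcoding, deleting the local file, deleting the GCS blob) fails, local temporary files or GCS objects may be left behind, incurring costs or leaking data.

Suggestion: Use try/finally to guarantee cleanup of the local temporary file and the GCS blob; set transcoding parameters explicitly (e.g. bitrate="128k"); and log or output the blob URI to aid troubleshooting.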
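A minimal sketch of the cleanup guarantee, with the GCS and pydub calls injected as callables so the control flow is easy to follow (fetch_and_transcode and its parameters are hypothetical). In the real code, download would be blob.download_to_filename, delete_blob would be blob.delete, and transcode would be something like AudioSegment.from_wav(src).export(dst, format="mp3", bitrate="128k"), since pydub's export accepts a bitrate argument:

```python
import logging
import os
from typing import Callable

logger = logging.getLogger(__name__)


def fetch_and_transcode(
    blob_uri: str,
    temp_wav_file: str,
    mp3_file: str,
    download: Callable[[str], None],
    transcode: Callable[[str, str], None],
    delete_blob: Callable[[], None],
) -> str:
    """Download the synthesized WAV, convert it to MP3, and always clean up."""
    try:
        download(temp_wav_file)
        transcode(temp_wav_file, mp3_file)
        return mp3_file
    except Exception:
        # Record which GCS object was involved before re-raising.
        logger.exception("long-audio post-processing failed for %s", blob_uri)
        raise
    finally:
        # Each cleanup step is guarded so one failure does not skip the other.
        if os.path.exists(temp_wav_file):
            try:
                os.remove(temp_wav_file)
            except OSError:
                logger.warning("could not remove %s", temp_wav_file)
        try:
            delete_blob()
        except Exception:
            logger.warning("could not delete %s; remove it manually", blob_uri)
```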

Comment thread voice/google/config.json
@@ -0,0 +1,3 @@
{

Tip

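💡 Configuration files should not be hard-coded in the repository; provide an example file rather than a live one

config.json hard-codes the bucket name, and the placeholder value is easy to misuse. A better approach is to ship config.example.json and add the real config.json to .gitignore, or to use the existing configuration system or environment variables.

Suggestion: Switch to config.example.json plus documentation; read the bucket name from an environment variable at runtime.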
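A minimal sketch of the suggested lookup, assuming a hypothetical GCS_BUCKET_NAME environment variable, with an untracked config.json (copied from config.example.json) as the fallback; load_bucket_name is also a hypothetical name:

```python
import json
import os
from typing import Mapping, Optional


def load_bucket_name(
    config_path: Optional[str] = None, environ: Optional[Mapping[str, str]] = None
) -> str:
    """Read the GCS bucket from GCS_BUCKET_NAME, else from an untracked config file."""
    env = os.environ if environ is None else environ
    bucket = env.get("GCS_BUCKET_NAME")  # hypothetical variable name
    if not bucket and config_path and os.path.isfile(config_path):
        with open(config_path, "r", encoding="utf-8") as f:
            bucket = json.load(f).get("gcs_bucket_name")
    if not bucket:
        raise ValueError(
            "GCS bucket not configured: set GCS_BUCKET_NAME or copy "
            "config.example.json to config.json and fill in gcs_bucket_name"
        )
    return bucket
```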

Comment on lines +230 to +284
"""
Language code: yue-HK
Name: yue-HK-Standard-A, Gender: FEMALE, Sample rate: 24000Hz
Name: yue-HK-Standard-B, Gender: MALE, Sample rate: 24000Hz
Name: yue-HK-Standard-C, Gender: FEMALE, Sample rate: 24000Hz
Name: yue-HK-Standard-D, Gender: MALE, Sample rate: 24000Hz

Language code: cmn-CN
Name: cmn-CN-Chirp3-HD-Achernar, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Achird, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Algenib, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Algieba, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Alnilam, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Aoede, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Autonoe, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Callirrhoe, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Charon, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Despina, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Enceladus, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Erinome, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Fenrir, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Gacrux, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Iapetus, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Kore, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Laomedeia, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Leda, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Orus, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Puck, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Pulcherrima, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Rasalgethi, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Sadachbia, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Sadaltager, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Schedar, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Sulafat, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Umbriel, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Vindemiatrix, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Zephyr, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Chirp3-HD-Zubenelgenubi, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Standard-A, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Standard-B, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Standard-C, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Standard-D, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Wavenet-A, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-CN-Wavenet-B, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Wavenet-C, Gender: MALE, Sample rate: 24000Hz
Name: cmn-CN-Wavenet-D, Gender: FEMALE, Sample rate: 24000Hz

Language code: cmn-TW
Name: cmn-TW-Standard-A, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-TW-Standard-B, Gender: MALE, Sample rate: 24000Hz
Name: cmn-TW-Standard-C, Gender: MALE, Sample rate: 24000Hz
Name: cmn-TW-Wavenet-A, Gender: FEMALE, Sample rate: 24000Hz
Name: cmn-TW-Wavenet-B, Gender: MALE, Sample rate: 24000Hz
Name: cmn-TW-Wavenet-C, Gender: MALE, Sample rate: 24000Hz
"""
No newline at end of file

Tip

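💡 Move the large commented-out voice list into documentation or a constants file

The large triple-quoted block at the end of the file adds noise, hurts readability, and the file also triggers the "No newline at end of file" warning.

Suggestion: Move the voice list into the README/documentation or a separate config file, and add a trailing newline at the end of the file.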
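A minimal sketch of keeping the voices as data in a small constants module instead of a trailing docstring; the module layout and pick_voice are hypothetical, and only a few names from the list are copied:

```python
from typing import Dict, Optional, Tuple

# Hypothetical voice/google/voices.py: an excerpt of the voice list, kept as data.
VOICES: Dict[str, Tuple[str, ...]] = {
    "cmn-CN": ("cmn-CN-Wavenet-A", "cmn-CN-Wavenet-B", "cmn-CN-Standard-A"),
    "cmn-TW": ("cmn-TW-Wavenet-A", "cmn-TW-Standard-A"),
    "yue-HK": ("yue-HK-Standard-A", "yue-HK-Standard-B"),
}


def pick_voice(language_code: str, name: Optional[str] = None) -> Tuple[str, str]:
    """Return (language_code, voice_name), defaulting to the first listed voice."""
    try:
        candidates = VOICES[language_code]
    except KeyError:
        raise ValueError(f"unsupported language code: {language_code}") from None
    if name is None:
        return language_code, candidates[0]
    if name not in candidates:
        raise ValueError(f"{name} is not a known voice for {language_code}")
    return language_code, name
```

The result maps directly onto texttospeech.VoiceSelectionParams(language_code=..., name=...). Alternatively, TextToSpeechClient.list_voices(language_code=...) can fetch the current list at runtime instead of hard-coding it.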


Labels

None yet

Projects

None yet


3 participants