update:add a method to access google cloud voice api by thzjy · Pull Request #2599 · zhayujie/CowAgent

thzjy · 2025-05-29T08:19:30Z

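TTS: adds long-text handling (short text up to 5,000 bytes, long text up to 1 MB). STT: now adapts to the input audio's parameters for speech recognition.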
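The short/long split is measured in UTF-8 bytes, not characters, which matters for Chinese text. A minimal sketch of that routing rule, assuming the thresholds stated above (with "1 MB" read as 1,000,000 bytes); `select_tts_route` is a hypothetical helper name, not code from the PR:

```python
# Hypothetical helper: route TTS requests by UTF-8 byte length, using the
# thresholds from the PR description (5,000 bytes standard, ~1 MB long audio).
STANDARD_TTS_MAX_BYTES = 5000          # per the PR description
LONG_AUDIO_TTS_MAX_BYTES = 1_000_000   # "1 MB" per the PR description; assumed decimal


def select_tts_route(text: str) -> str:
    """Return "standard" or "long_audio"; raise if the text is too large for either."""
    byte_length = len(text.encode("utf-8"))
    if byte_length <= STANDARD_TTS_MAX_BYTES:
        return "standard"
    if byte_length <= LONG_AUDIO_TTS_MAX_BYTES:
        return "long_audio"
    raise ValueError(f"text is {byte_length} bytes; limit is {LONG_AUDIO_TTS_MAX_BYTES}")


# Common CJK characters take 3 bytes each in UTF-8.
print(select_tts_route("你" * 1666))  # standard (4,998 bytes)
print(select_tts_route("你" * 1667))  # long_audio (5,001 bytes)
```

So the standard path covers only about 1,666 Chinese characters, far fewer than 5,000.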

safe1ine · 2026-02-05T04:07:53Z

@MonkeyCode-AI please review

MonkeyCode-AI · 2026-02-05T04:08:27Z

MonkeyCode-AI is analyzing the task...

MonkeyCode-AI

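I'm the MonkeyCode AI coding assistant. You can summon me by mentioning @MonkeyCode-AI in a pull request on a GitHub repository.

For task execution details, see: https://monkeycode-ai.com

Code review results

Adding Google Cloud STT/TTS (including long text) is a workable approach, but the current implementation has key risks in credential management, security/compliance, temporary-file cleanup, and Long Audio API parameters and file-format handling. These should be fixed before merging.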

🚨 Critical: 3 | ⚠️ Warning: 6 | 💡 Suggestion: 2

MonkeyCode-AI · 2026-02-05T04:12:04Z

@@ -0,0 +1 @@
+Replace this file with your Google credentials file.


Caution
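🚨 Risk of keeping a credentials file in the repository (even a placeholder note makes it easy to accidentally commit a real key)

The PR adds a google-credentials.json file (currently containing only an instruction), which strongly suggests that users drop a real service-account key at that path, making an accidental commit very likely. The code also hard-binds to this relative path at runtime, further raising the chance that a real key ends up in the project directory. This is a serious security/compliance risk.

Suggestion: Delete the credentials placeholder file from the repository; read GOOGLE_APPLICATION_CREDENTIALS from the environment instead, or have the deployment inject the key via a mount/secret; and add voice/google/google-credentials.json to .gitignore.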

MonkeyCode-AI · 2026-02-05T04:12:04Z

+# Set Google Cloud credentials and config file paths
+cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json")
+config_path = os.path.join(os.path.dirname(__file__), "config.json")
+os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path
+


Caution
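🚨 Setting GOOGLE_APPLICATION_CREDENTIALS at module import time breaks the runtime environment and may introduce security issues

Running os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path at import time is a global side effect: it overrides credentials already configured by the host environment, affects every other Google SDK client in the same process, couples the code tightly to the repository layout, and increases the risk of relying on a local key file.

Suggestion: Don't write environment variables at import time. Prefer having the deployment inject GOOGLE_APPLICATION_CREDENTIALS, or read the configuration explicitly during initialization and pass the credentials in. At a minimum, fall back to the local file only when the environment variable is unset and the file exists, and raise a clear error when the file is missing.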

Suggested change

-# Set Google Cloud credentials and config file paths
-cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json")
-config_path = os.path.join(os.path.dirname(__file__), "config.json")
-os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path
+import os
+import time
+import uuid
+import json
+from google.cloud import speech
+from google.cloud import texttospeech_v1 as texttospeech
+from google.cloud import storage
+from google.api_core.exceptions import GoogleAPIError
+from pydub import AudioSegment
+from bridge.reply import Reply, ReplyType
+from common.log import logger
+from common.tmp_dir import TmpDir
+from voice.voice import Voice
+from common.utils import remove_markdown_symbol
+# Set Google Cloud credentials and config file paths
+cred_path = os.path.join(os.path.dirname(__file__), "google-credentials.json")
+config_path = os.path.join(os.path.dirname(__file__), "config.json")
+# Don't override externally configured credentials; fall back only if unset and the local file exists
+if "GOOGLE_APPLICATION_CREDENTIALS" not in os.environ and os.path.exists(cred_path):
+    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path
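The suggested change still writes to os.environ on the fallback path. A side-effect-free alternative is to resolve the key path once during initialization and hand explicit credentials to each client. A minimal sketch, assuming the clients are then built with `credentials=`; `resolve_credentials_path` is a hypothetical helper:

```python
import os
from typing import Optional


def resolve_credentials_path(local_fallback: Optional[str] = None) -> str:
    """Return a service-account key path without mutating os.environ.

    Order: GOOGLE_APPLICATION_CREDENTIALS first, then an optional local
    fallback (only if the file exists); otherwise fail with a clear error.
    """
    env_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if env_path:
        return env_path
    if local_fallback and os.path.isfile(local_fallback):
        return local_fallback
    raise FileNotFoundError(
        "No Google credentials found: set GOOGLE_APPLICATION_CREDENTIALS "
        f"or provide a key file at {local_fallback!r}"
    )


# Usage (requires google-auth and google-cloud-speech to be installed):
#   from google.oauth2 import service_account
#   creds = service_account.Credentials.from_service_account_file(
#       resolve_credentials_path(cred_path))
#   speech_client = speech.SpeechClient(credentials=creds)
```

Because nothing global is modified, other Google SDK clients in the same process keep whatever credentials the host configured.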

MonkeyCode-AI · 2026-02-05T04:12:04Z

+        # Get project_id from google-credentials.json
+        try:
+            with open(cred_path, 'r') as f:
+                credentials = json.load(f)
+                self.project_id = credentials.get('project_id')
+                if not self.project_id:
+                    raise ValueError("project_id not found in google-credentials.json")
+                logger.debug(f"Got project_id from JSON: {self.project_id}")
+        except Exception as e:
+            logger.error(f"Unable to read project_id: {e}")
+            raise
+        # Get bucket_name from config.json
+        try:
+            with open(config_path, 'r') as f:
+                config = json.load(f)
+                self.bucket_name = config.get('gcs_bucket_name')
+                if not self.bucket_name:
+                    raise ValueError("gcs_bucket_name not found in config.json")
+                logger.debug(f"Got bucket_name from config.json: {self.bucket_name}")
+        except Exception as e:
+            logger.error(f"Unable to read config.json: {e}")
+            raise


Warning
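⚠️ Reading project_id by hand from the service-account JSON is fragile and may not match the credentials actually in effect

The code reads project_id from the local cred_path, but the credentials in effect at runtime may come from an environment variable or ADC (Application Default Credentials). In that case project_id may not match, or the file may not exist at all. The client libraries and default configuration can usually resolve the project, so parsing the JSON manually is not recommended.

Suggestion: If project_id is needed, supply it through explicit configuration or an environment variable (such as GOOGLE_CLOUD_PROJECT), or rely on the client's default project inference; avoid a hard dependency on the cred_path file existing.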
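A minimal sketch of resolving the project without opening the key file, assuming the value is either passed in from the project's config or set as GOOGLE_CLOUD_PROJECT; `resolve_project_id` is a hypothetical name:

```python
import os
from typing import Optional


def resolve_project_id(configured: Optional[str] = None) -> str:
    """Return the GCP project for Long Audio requests without reading cred_path.

    Order: explicit configuration, then the GOOGLE_CLOUD_PROJECT env var.
    """
    project_id = configured or os.environ.get("GOOGLE_CLOUD_PROJECT")
    if project_id:
        return project_id
    # With google-auth installed, `_, project_id = google.auth.default()` is a
    # further option: it infers the project from the active credentials/ADC.
    raise ValueError("project_id not configured: pass it explicitly or set GOOGLE_CLOUD_PROJECT")
```

The resulting value would then feed `parent = f"projects/{project_id}/locations/global"` as in the current code.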

MonkeyCode-AI · 2026-02-05T04:12:04Z

+    def convert_audio_to_wav(self, input_file_path, output_file_path="temp_audio.wav"):
+        """
+        Convert an AMR or MP3 file to WAV format
+        Args:
+            input_file_path: path of the input audio file (AMR or MP3)
+            output_file_path: path of the output WAV file
+        Returns:
+            path of the converted WAV file and its sample rate
+        """
+        try:
+            audio = AudioSegment.from_file(input_file_path)
+            sample_rate = audio.frame_rate
+            duration_ms = len(audio)
+            logger.debug(f"Input audio: {input_file_path}, sample rate: {sample_rate}Hz, duration: {duration_ms/1000}s")
+            if duration_ms < 100:
+                logger.error("Audio file too short to process")
+                return None, None
+            audio = audio.set_channels(1).set_sample_width(2)
+            audio.export(output_file_path, format="wav", codec="pcm_s16le")
+            return output_file_path, sample_rate


Warning
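⚠️ convert_audio_to_wav uses a fixed default output filename, so concurrent calls can overwrite each other

convert_audio_to_wav defaults to output_file_path="temp_audio.wav", so concurrent calls overwrite each other's output. The current call site does pass a uuid-based filename, but the utility function can still be misused.

Suggestion: Remove the fixed default filename or default to a uuid-based temp file; place temp files in the directory managed by TmpDir().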
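A minimal sketch of a collision-free default path, assuming the managed directory is passed in (in this project presumably `TmpDir().path()`); `make_temp_wav_path` is a hypothetical helper:

```python
import os
import tempfile
import uuid
from typing import Optional


def make_temp_wav_path(tmp_dir: Optional[str] = None) -> str:
    """Return a unique .wav path inside a managed temp directory.

    tmp_dir would presumably come from TmpDir().path() in this project; it
    falls back to the system temp dir here so the sketch runs on its own.
    """
    base = tmp_dir or tempfile.gettempdir()
    os.makedirs(base, exist_ok=True)
    # uuid4 gives each call a distinct name, so concurrent conversions don't collide.
    return os.path.join(base, f"temp_audio_{uuid.uuid4().hex}.wav")
```

convert_audio_to_wav could then take `output_file_path=None` and start with `output_file_path = output_file_path or make_temp_wav_path(TmpDir().path())`. `os.path.join` also copes with TmpDir paths that already end in a slash.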

MonkeyCode-AI · 2026-02-05T04:12:04Z

+            file_ext = os.path.splitext(voice_file)[1].lower()
+            if file_ext in [".amr", ".mp3"]:
+                temp_wav_file = f"temp_audio_{uuid.uuid4().hex}.wav"
+                voice_file, sample_rate = self.convert_audio_to_wav(voice_file, temp_wav_file)
+                if not voice_file:
+                logger.error("Audio conversion failed")
+                return Reply(ReplyType.ERROR, "Audio conversion failed")
+            elif file_ext == ".wav":
+                audio = AudioSegment.from_wav(voice_file)
+                sample_rate = audio.frame_rate
+                duration_ms = len(audio)
+                logger.debug(f"WAV audio: {voice_file}, sample rate: {sample_rate}Hz, duration: {duration_ms/1000}s")
+                if duration_ms < 100:
+                    logger.error("Audio file too short to process")
+                    return Reply(ReplyType.ERROR, "Audio file too short to process")
+            else:
+                logger.error("Unsupported audio format; only AMR, MP3, and WAV are supported")
+                return Reply(ReplyType.ERROR, "Unsupported audio format; only AMR, MP3, and WAV are supported")
+
+            with open(voice_file, "rb") as audio_file:
+                audio_content = audio_file.read()
+
+            audio = speech.RecognitionAudio(content=audio_content)
+            config = speech.RecognitionConfig(
+                encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
+                sample_rate_hertz=sample_rate,
+                language_code="cmn-CN",
+            )
+
+            response = self.speech_client.recognize(config=config, audio=audio)
+
+            transcript = ""
+            for result in response.results:
+                transcript += result.alternatives[0].transcript + " "
+
+            transcript = transcript.strip()
+            if not transcript:
+                logger.error("Speech recognition failed: could not understand the audio")
+                return Reply(ReplyType.ERROR, "Sorry, I couldn't understand that")
+
+            logger.info(f"[Google] voiceToText text={transcript} voice file name={voice_file}")
+            reply = Reply(ReplyType.TEXT, transcript)
+
+            if file_ext in [".amr", ".mp3"] and os.path.exists(voice_file):
+                os.remove(voice_file)
+
+            return reply


Warning
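⚠️ STT temp-file cleanup can delete the wrong file or miss files; exception paths may leave temp files behind

For amr/mp3 input, voice_file is overwritten with the converted WAV path and later deleted based on file_ext. If an exception occurs, the temporary WAV may never be deleted; reusing the variable hurts readability and invites accidental-deletion bugs; and not using TmpDir() leaves temp files scattered around.

Suggestion: Keep temp_wav_file in its own variable and clean it up in a finally block; put temp files under the TmpDir() directory so they are managed in one place.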
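A minimal sketch of that structure, assuming the conversion and the `speech_client.recognize()` call are injected as callables so the cleanup logic can be shown (and tested) on its own. `transcribe_file`, `convert_to_wav` and `recognize_wav` are hypothetical names, and the stdlib `wave` module stands in for pydub when reading a PCM WAV's sample rate:

```python
import os
import wave
from typing import Callable, Optional, Tuple


def transcribe_file(
    voice_file: str,
    convert_to_wav: Callable[[str], Tuple[Optional[str], Optional[int]]],
    recognize_wav: Callable[[str, int], str],
) -> str:
    """Recognize AMR/MP3/WAV input, always deleting any temp WAV this call created."""
    temp_wav_file: Optional[str] = None  # set only for files this function owns
    try:
        ext = os.path.splitext(voice_file)[1].lower()
        if ext in (".amr", ".mp3"):
            temp_wav_file, sample_rate = convert_to_wav(voice_file)
            if not temp_wav_file or not sample_rate:
                raise ValueError("audio conversion failed")
            wav_path = temp_wav_file
        elif ext == ".wav":
            # Read the sample rate from the PCM header; the caller's file is never deleted.
            with wave.open(voice_file, "rb") as w:
                sample_rate = w.getframerate()
            wav_path = voice_file
        else:
            raise ValueError(f"unsupported audio format: {ext}")
        return recognize_wav(wav_path, sample_rate)
    finally:
        # Runs on success and on every exception path.
        if temp_wav_file and os.path.exists(temp_wav_file):
            os.remove(temp_wav_file)
```

Keeping `voice_file` untouched means the cleanup can never remove the user's original upload.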

MonkeyCode-AI · 2026-02-05T04:12:04Z

+                language_code="cmn-CN",
+            )
+
+            response = self.speech_client.recognize(config=config, audio=audio)


Warning
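⚠️ STT uses the synchronous recognize call, which may fail on long audio (duration limits/timeouts)

speech_client.recognize is a synchronous API meant for short audio; long audio usually requires long_running_recognize (typically with the audio stored in GCS). The current code has no path for long audio.

Suggestion: Choose between recognize and long_running_recognize based on duration/size; upload long audio to GCS before recognition, and add timeout and error handling.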
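A minimal dispatch sketch, assuming a 60-second cutoff for synchronous recognition (roughly what Google documents for `recognize`; check the current quotas) and a `SpeechClient`-like object passed in. `choose_recognition_mode` and `run_recognition` are hypothetical helpers; for the long path the caller is assumed to have uploaded the audio and built `speech.RecognitionAudio(uri="gs://...")`:

```python
from typing import Any

# Assumption: synchronous recognize() handles audio up to about 1 minute;
# longer audio goes through long_running_recognize() with a gs:// URI.
SYNC_MAX_SECONDS = 60.0


def choose_recognition_mode(duration_s: float) -> str:
    return "sync" if duration_s <= SYNC_MAX_SECONDS else "long_running"


def run_recognition(client: Any, config: Any, audio: Any,
                    duration_s: float, timeout_s: float = 300.0) -> Any:
    """Call recognize() or long_running_recognize() on a SpeechClient-like object."""
    if choose_recognition_mode(duration_s) == "sync":
        return client.recognize(config=config, audio=audio)
    # long_running_recognize returns an operation; result() blocks up to timeout_s.
    operation = client.long_running_recognize(config=config, audio=audio)
    return operation.result(timeout=timeout_s)
```

Both paths return a response with `.results`, so the existing transcript-joining loop can stay unchanged.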

MonkeyCode-AI · 2026-02-05T04:12:05Z

+                request = texttospeech.SynthesizeLongAudioRequest(
+                    parent=parent,
+                    input=synthesis_input,
+                    audio_config=audio_config,
+                    voice=voice,
+                    output_gcs_uri=output_gcs_uri,
+                )
+                operation = self.tts_long_client.synthesize_long_audio(request=request)
+                result = operation.result(timeout=600)  # wait for long-audio synthesis to finish (up to 10 minutes)


Caution
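🚨 The Long Audio TTS request may use the wrong Request type or field names, and result is never used

The code builds texttospeech.SynthesizeLongAudioRequest and passes it to synthesize_long_audio as request=...; message types and fields may differ between SDK versions. The code doesn't check whether the operation succeeded, and result is never used. The fixed 600 s timeout may also be too short.

Suggestion: Verify the long-audio API usage against the SDK version the project depends on, add operation status/exception checks (exception()/done()), make the timeout configurable, and raise a clear error message on timeout.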
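A minimal sketch of waiting on the long-audio operation with a configurable timeout and a clear error, assuming a google.api_core-style operation: its `result()` re-raises the operation's own error if synthesis failed and raises `concurrent.futures.TimeoutError` when the deadline passes. `GOOGLE_TTS_LONG_AUDIO_TIMEOUT` and `wait_for_long_audio` are hypothetical names:

```python
import concurrent.futures
import os
from typing import Any, Optional


def wait_for_long_audio(operation: Any, output_gcs_uri: str,
                        timeout_s: Optional[float] = None) -> Any:
    """Block on a long-running TTS operation; convert timeouts into a clear error."""
    if timeout_s is None:
        # Hypothetical setting name; read at call time, not at import time.
        timeout_s = float(os.environ.get("GOOGLE_TTS_LONG_AUDIO_TIMEOUT", "600"))
    try:
        # Also re-raises the operation's error if synthesis itself failed.
        return operation.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as exc:
        raise TimeoutError(
            f"Long audio synthesis to {output_gcs_uri} did not finish within "
            f"{timeout_s:.0f}s; the output object may still appear later"
        ) from exc
```

Including the output URI in the message makes it possible to locate (and later clean up) a late-arriving object.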

MonkeyCode-AI · 2026-02-05T04:12:05Z

+            gcs_output_path = f"output-{unique_id}.wav"  # Long Audio outputs WAV
+
+            # Configure voice parameters (Mandarin Chinese)
+            voice = texttospeech.VoiceSelectionParams(
+                language_code="cmn-CN",
+                name="cmn-CN-Wavenet-A",
+            )
+
+            if byte_length <= 5000:
+                # Use the standard Text-to-Speech API (short text, MP3 output)
+                audio_config = texttospeech.AudioConfig(
+                    audio_encoding=texttospeech.AudioEncoding.MP3
+                )
+                synthesis_input = texttospeech.SynthesisInput(text=cleaned_text)
+                response = self.tts_client.synthesize_speech(
+                    input=synthesis_input, voice=voice, audio_config=audio_config
+                )
+                with open(mp3_file, "wb") as out:
+                    out.write(response.audio_content)
+                    logger.info(f"[Google] textToVoice (standard) text={cleaned_text[:50]}... voice file name={mp3_file}")
+                return Reply(ReplyType.VOICE, mp3_file)
+            else:
+                # Use the Long Audio API (long text, LINEAR16/WAV output)
+                audio_config = texttospeech.AudioConfig(
+                    audio_encoding=texttospeech.AudioEncoding.LINEAR16
+                )
+                parent = f"projects/{self.project_id}/locations/global"
+                synthesis_input = texttospeech.SynthesisInput(text=cleaned_text)
+                output_gcs_uri = f"gs://{self.bucket_name}/{gcs_output_path}"
+                request = texttospeech.SynthesizeLongAudioRequest(
+                    parent=parent,
+                    input=synthesis_input,
+                    audio_config=audio_config,
+                    voice=voice,
+                    output_gcs_uri=output_gcs_uri,
+                )
+                operation = self.tts_long_client.synthesize_long_audio(request=request)
+                result = operation.result(timeout=600)  # wait for long-audio synthesis to finish (up to 10 minutes)
+
+                # Download the WAV file from GCS
+                temp_wav_file = f"{TmpDir().path()}temp_wav_{unique_id}.wav"
+                bucket = self.storage_client.bucket(self.bucket_name)
+                blob = bucket.blob(gcs_output_path)
+                blob.download_to_filename(temp_wav_file)
+                logger.debug(f"Downloaded WAV file from GCS: {temp_wav_file}")
+
+                # Convert to MP3
+                audio = AudioSegment.from_wav(temp_wav_file)
+                audio.export(mp3_file, format="mp3")
+                logger.info(f"[Google] textToVoice (long audio) text={cleaned_text[:50]}... voice file name={mp3_file}")
+
+                # Clean up temporary files
+                os.remove(temp_wav_file)
+                blob.delete()
+
+                return Reply(ReplyType.VOICE, mp3_file)


Warning
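⚠️ Long text is synthesized to WAV and then converted to MP3: no sample rate or bitrate is specified, and cleanup is not guaranteed on failure

AudioSegment.export(mp3) sets no bitrate or similar parameters, so quality and file size are uncontrolled. A failure at any step (downloading the WAV, transcoding, deleting the local file, deleting the GCS blob) can leave local temp files or GCS objects behind (incurring storage costs or leaking data).

Suggestion: Use try/finally to guarantee cleanup of the local temp file and the GCS blob; set transcoding parameters explicitly (e.g. bitrate="128k"); and log the blob URI to aid troubleshooting.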
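A minimal sketch of the download/transcode/cleanup sequence with guaranteed cleanup, assuming `blob` behaves like a `google.cloud.storage.Blob` (`download_to_filename`, `delete`, `name`) and the transcoder is injected so its settings stay explicit. `fetch_long_audio_as_mp3` is a hypothetical helper:

```python
import logging
import os
from typing import Any, Callable

log = logging.getLogger(__name__)


def fetch_long_audio_as_mp3(blob: Any, temp_wav_file: str, mp3_file: str,
                            wav_to_mp3: Callable[[str, str], Any]) -> str:
    """Download the synthesized WAV, transcode it to MP3, and always clean up.

    wav_to_mp3 could be, for example (pydub):
        lambda src, dst: AudioSegment.from_wav(src).export(dst, format="mp3", bitrate="128k")
    """
    try:
        blob.download_to_filename(temp_wav_file)
        wav_to_mp3(temp_wav_file, mp3_file)
        return mp3_file
    finally:
        # Local temp file: removed whether or not download/transcode succeeded.
        if os.path.exists(temp_wav_file):
            os.remove(temp_wav_file)
        # GCS object: delete to avoid storage costs; log instead of masking the original error.
        try:
            blob.delete()
        except Exception:
            log.warning("Failed to delete GCS object %s; remove it manually", blob.name, exc_info=True)
```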

MonkeyCode-AI · 2026-02-05T04:12:05Z

@@ -0,0 +1,3 @@
+{


Tip
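💡 Config files shouldn't be hard-coded in the repository; ship an example file rather than a live one

config.json hard-codes the bucket name, and the placeholder value is easy to misuse. It's better to provide config.example.json and add the real config.json to .gitignore, or to use the project's existing config system or environment variables throughout.

Suggestion: Switch to config.example.json plus documentation; read the bucket name from an environment variable at runtime.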
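A minimal sketch of that lookup order, assuming a hypothetical `GCS_BUCKET_NAME` environment variable and an untracked config.json that uses the PR's `gcs_bucket_name` key; `load_bucket_name` is a hypothetical helper:

```python
import json
import os
from typing import Optional


def load_bucket_name(config_path: Optional[str] = None) -> str:
    """Environment variable first (hypothetical GCS_BUCKET_NAME), then an untracked config.json."""
    bucket = os.environ.get("GCS_BUCKET_NAME")
    if bucket:
        return bucket
    if config_path and os.path.isfile(config_path):
        with open(config_path, encoding="utf-8") as f:
            bucket = json.load(f).get("gcs_bucket_name")
    if not bucket:
        raise ValueError(
            "Set GCS_BUCKET_NAME, or copy config.example.json to config.json "
            "and fill in gcs_bucket_name"
        )
    return bucket
```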

MonkeyCode-AI · 2026-02-05T04:12:05Z

+"""
+Language code: yue-HK
+  Name: yue-HK-Standard-A, Gender: FEMALE, Sample rate: 24000Hz
+  Name: yue-HK-Standard-B, Gender: MALE, Sample rate: 24000Hz
+  Name: yue-HK-Standard-C, Gender: FEMALE, Sample rate: 24000Hz
+  Name: yue-HK-Standard-D, Gender: MALE, Sample rate: 24000Hz
+
+  Language code: cmn-CN
+  Name: cmn-CN-Chirp3-HD-Achernar, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Achird, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Algenib, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Algieba, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Alnilam, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Aoede, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Autonoe, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Callirrhoe, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Charon, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Despina, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Enceladus, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Erinome, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Fenrir, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Gacrux, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Iapetus, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Kore, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Laomedeia, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Leda, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Orus, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Puck, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Pulcherrima, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Rasalgethi, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Sadachbia, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Sadaltager, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Schedar, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Sulafat, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Umbriel, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Vindemiatrix, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Zephyr, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Chirp3-HD-Zubenelgenubi, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Standard-A, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Standard-B, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Standard-C, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Standard-D, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Wavenet-A, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-CN-Wavenet-B, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Wavenet-C, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-CN-Wavenet-D, Gender: FEMALE, Sample rate: 24000Hz
+
+Language code: cmn-TW
+  Name: cmn-TW-Standard-A, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-TW-Standard-B, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-TW-Standard-C, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-TW-Wavenet-A, Gender: FEMALE, Sample rate: 24000Hz
+  Name: cmn-TW-Wavenet-B, Gender: MALE, Sample rate: 24000Hz
+  Name: cmn-TW-Wavenet-C, Gender: MALE, Sample rate: 24000Hz
+"""


Tip
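💡 Move the large commented-out voice list into documentation or a constants file

The large triple-quoted block at the end of the file adds noise and hurts readability, and the diff also reports "No newline at end of file".

Suggestion: Move the voice list into the README/docs or a separate config file, and add a trailing newline at the end of the file.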
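One possible shape for that constants file, assuming a plain dict is enough (only a subset of the names from the list above is shown). A side benefit: the TTS code currently hard-codes both `language_code="cmn-CN"` and `name="cmn-CN-Wavenet-A"`, and deriving one from the other avoids a mismatch. `VOICES`, `DEFAULT_VOICE` and `language_code_for` are hypothetical names:

```python
from typing import Dict, List

# Subset of the voices listed in the PR; extend as needed.
VOICES: Dict[str, List[str]] = {
    "cmn-CN": ["cmn-CN-Wavenet-A", "cmn-CN-Wavenet-B", "cmn-CN-Standard-A", "cmn-CN-Chirp3-HD-Kore"],
    "cmn-TW": ["cmn-TW-Wavenet-A", "cmn-TW-Standard-A"],
    "yue-HK": ["yue-HK-Standard-A", "yue-HK-Standard-B"],
}
DEFAULT_VOICE = "cmn-CN-Wavenet-A"  # the voice the PR currently hard-codes


def language_code_for(voice_name: str) -> str:
    """Look up the language code for a known voice, e.g. cmn-TW-Wavenet-A -> cmn-TW."""
    for code, names in VOICES.items():
        if voice_name in names:
            return code
    raise ValueError(f"unknown voice: {voice_name}")
```

The call site would then build `VoiceSelectionParams(language_code=language_code_for(name), name=name)`.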

update:add a method to access google cloud voice api

0059a0c


3 participants