feat: add lip-sync feature / add danmaku saving and replay #23
## Reviewer's Guide
This PR refactors and extends plugins to support lip-sync across VTube Studio and TTS pipelines, and enriches the BiliDanmakuSeleniumPlugin with file-based danmaku saving and replay, improved configuration handling, and robust WebDriver management.
#### Sequence Diagram: Lip Sync Audio Processing for VTube Studio
```mermaid
sequenceDiagram
participant User as End User
participant TTSPlugin as TTS Plugin (e.g., EdgeTTS, GPT-SoVITS)
participant AmaidesuCore as Amaidesu Core
participant VTubeStudioPlugin as VTubeStudio Plugin
participant VTSApp as VTube Studio Application
User->>AmaidesuCore: Initiate Action (e.g., send message)
AmaidesuCore->>TTSPlugin: speak(text_to_synthesize)
TTSPlugin->>AmaidesuCore: Get Service ("vts_lip_sync")
AmaidesuCore-->>TTSPlugin: vts_lip_sync_service (VTubeStudioPlugin)
TTSPlugin->>VTubeStudioPlugin: start_lip_sync_session(text_to_synthesize)
loop Audio Stream Processing
TTSPlugin->>TTSPlugin: Generate audio_chunk
TTSPlugin->>VTubeStudioPlugin: process_tts_audio(audio_chunk, sample_rate)
VTubeStudioPlugin->>VTubeStudioPlugin: analyze_audio_chunk(audio_chunk)
VTubeStudioPlugin->>VTSApp: Set Lip Sync Parameters (VoiceVolume, MouthOpen, VoiceA, etc.)
end
TTSPlugin->>VTubeStudioPlugin: stop_lip_sync_session()
```
#### Sequence Diagram: Danmaku Replay from File in BiliDanmakuSeleniumPlugin
```mermaid
sequenceDiagram
participant Plugin as BiliDanmakuSeleniumPlugin
participant Filesystem
participant AmaidesuCore
Note over Plugin: In file_only_mode
Plugin->>Plugin: setup() called
Plugin->>Plugin: _load_danmaku_from_file()
Plugin->>Filesystem: Read danmaku_load_file (e.g., danmaku_ROOMID.jsonl)
Filesystem-->>Plugin: JSONL data lines
loop For each line in JSONL data
Plugin->>Plugin: Parse JSON to MessageBase object
Plugin->>Plugin: Add MessageBase to loaded_danmaku_queue
end
Plugin->>Plugin: _run_file_replay_loop() starts
loop For each message_base in loaded_danmaku_queue
Plugin->>Plugin: Calculate wait_time (based on message_base.message_info.time)
Plugin->>Plugin: await asyncio.wait_for(stop_event, timeout=wait_time)
Plugin->>Plugin: message_cache_service.cache_message(message_base)
Plugin->>AmaidesuCore: send_to_maicore(message_base)
end
```
#### Sequence Diagram: Live Danmaku Saving in BiliDanmakuSeleniumPlugin
```mermaid
sequenceDiagram
participant Plugin as BiliDanmakuSeleniumPlugin
participant WebDriver
participant Filesystem
participant AmaidesuCore
Note over Plugin: Live danmaku monitoring
Plugin->>WebDriver: Fetch raw danmaku elements from Bilibili page
WebDriver-->>Plugin: Raw danmaku elements
Plugin->>Plugin: Parse elements into DanmakuMessage objects
loop For each DanmakuMessage
Plugin->>Plugin: _create_message_base(danmaku_message)
Plugin->>Plugin: message_base created
Plugin->>Plugin: message_cache_service.cache_message(message_base)
opt enable_danmaku_save is true
Plugin->>Plugin: _save_danmaku_to_file(message_base)
Plugin->>Filesystem: Write MessageBase as JSON to danmaku_save_file
end
Plugin->>AmaidesuCore: send_to_maicore(message_base)
end
```
#### Class Diagram: VTubeStudioPlugin Lip Sync Enhancements
```mermaid
classDiagram
class VTubeStudioPlugin {
+lip_sync_enabled: bool
+volume_threshold: float
+smoothing_factor: float
+vowel_detection_sensitivity: float
+sample_rate: int
+min_accumulation_duration: float
+playback_sync_enabled: bool
-audio_buffer: deque
-current_vowel_values: Dict[str, float]
-current_volume: float
-is_speaking: bool
-audio_analysis_lock: threading.Lock
-accumulated_audio: bytearray
-accumulation_start_time: float
-audio_playback_start_time: float
-vowel_formants: Dict[str, List[int]]
+setup()
+analyze_audio_chunk(audio_data: bytes, sample_rate: int) Dict[str, float]
-_analyze_vowel_features(magnitude: np.ndarray, freqs: np.ndarray) Dict[str, float]
+process_tts_audio(audio_data: bytes, sample_rate: int)
-_update_lip_sync_parameters(analysis_result: Dict[str, float])
+start_lip_sync_session(text: str)
+stop_lip_sync_session()
+reset_playback_timing()
}
VTubeStudioPlugin --|> BasePlugin
VTubeStudioPlugin ..> AmaidesuCore : uses
```
#### Class Diagram: BiliDanmakuSeleniumPlugin File and Configuration Enhancements
```mermaid
classDiagram
class BiliDanmakuSeleniumPlugin {
+config: Dict[str, Any]
+enabled: bool
+enable_danmaku_save: bool
+danmaku_save_file: str
+save_file_path: Path
+enable_danmaku_load: bool
+danmaku_load_file: str
+load_file_path: Path
+skip_initial_danmaku: bool
+data_dir: Path
-is_initial_load: bool
-initial_load_complete: bool
-loaded_danmaku_queue: List[MessageBase]
-loaded_danmaku_index: int
-file_only_mode: bool
-shutdown_timeout: int
-cleanup_lock: threading.Lock
-is_shutting_down: bool
+__init__(core: AmaidesuCore, config: Dict[str, Any])
-_setup_signal_handlers()
-_graceful_shutdown()
+setup()
+cleanup()
-_create_webdriver()
-_run_monitoring_loop()
-_run_file_replay_loop()
-_run_live_monitoring_loop()
-_fetch_and_process_messages()
-_load_danmaku_from_file()
-_save_danmaku_to_file(message_base: MessageBase)
-_send_loaded_danmaku()
}
BiliDanmakuSeleniumPlugin --|> BasePlugin
BiliDanmakuSeleniumPlugin ..> AmaidesuCore : uses
BiliDanmakuSeleniumPlugin o-- MessageCacheService : uses
```
#### Class Diagram: TTS Plugins Integration with VTube Studio Lip Sync Service
```mermaid
classDiagram
class IVTubeStudioLipSyncService {
<<Interface>>
+start_lip_sync_session(text: str)
+process_tts_audio(audio_data: bytes, sample_rate: int)
+stop_lip_sync_session()
}
class TTSPlugin {
-_speak(text: str)
-_play_with_lip_sync(audio_data: np.ndarray, samplerate: int, vts_lip_sync_service: IVTubeStudioLipSyncService)
}
class GPTSoVitsTTSPlugin {
-decode_and_buffer(wav_chunk)
-_speak(text: str)
}
TTSPlugin ..> IVTubeStudioLipSyncService : uses
GPTSoVitsTTSPlugin ..> IVTubeStudioLipSyncService : uses
TTSPlugin --|> BasePlugin
GPTSoVitsTTSPlugin --|> BasePlugin
```
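The save and replay flow in the danmaku sequence diagrams can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the plugin's actual code: plain dicts with a top-level `time` field stand in for `MessageBase` (the diagram places the real timestamp at `message_info.time`), and `save_jsonl`, `load_jsonl`, `replay`, the `send` callback, and the `speed` parameter are hypothetical names. The wait uses `stop_event.wait()` because `asyncio.wait_for` expects an awaitable rather than the event object itself.

```python
import asyncio
import json
from pathlib import Path
from typing import Awaitable, Callable, Dict, List


def save_jsonl(path: Path, message: Dict) -> None:
    """Append one message to the file as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(message, ensure_ascii=False) + "\n")


def load_jsonl(path: Path) -> List[Dict]:
    """Read messages back, skipping blank lines, ordered by timestamp."""
    messages = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                messages.append(json.loads(line))
    return sorted(messages, key=lambda m: m["time"])


async def replay(
    messages: List[Dict],
    send: Callable[[Dict], Awaitable[None]],
    stop_event: asyncio.Event,
    speed: float = 1.0,
) -> int:
    """Send messages with their original spacing; return how many were sent."""
    sent = 0
    previous_time = None
    for message in messages:
        if previous_time is not None:
            # Gap between consecutive timestamps, scaled by the replay speed
            wait_time = max(0.0, (message["time"] - previous_time) / speed)
            try:
                await asyncio.wait_for(stop_event.wait(), timeout=wait_time)
                break  # stop was requested while waiting
            except asyncio.TimeoutError:
                pass  # the gap elapsed normally, so keep replaying
        previous_time = message["time"]
        await send(message)
        sent += 1
    return sent
```

Waiting on the stop event instead of calling `asyncio.sleep` lets a shutdown request interrupt a long gap between two recorded messages.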
Summary of Changes
Hello @tcmofashi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the Bilibili Selenium plugin with danmaku saving and replaying capabilities, including an offline replay mode. Additionally, it significantly upgrades the VTube Studio plugin by adding real-time lip synchronization driven by the audio output from TTS plugins, providing a more dynamic and responsive avatar experience.
Highlights
- VTube Studio Lip Sync: Adds real-time lip synchronization functionality to the VTube Studio plugin. It integrates with TTS plugins (like Edge TTS and GPT-SoVITS) to analyze the audio output during speech and control VTS parameters such as `VoiceVolume`, `VoiceSilence`, and individual vowel parameters (`VoiceA`, `I`, `U`, `E`, `O`) for more realistic avatar mouth movements. Includes configuration options for sensitivity, smoothing, and playback time synchronization.
- Bilibili Danmaku Save/Replay: Introduces the ability to save incoming danmaku messages from the Selenium plugin to a JSONL file. It also adds a feature to load and replay danmaku from a saved file, including a 'pure file mode' that operates offline and attempts to synchronize the replay timing based on the original message timestamps.
- Bilibili Danmaku Initial Skip: Adds a configuration option (`skip_initial_danmaku`) to the Bilibili Selenium plugin to ignore messages that are already present in the chat when the plugin starts, focusing only on newly arriving messages.
- Configuration Updates: Adds new configuration options in `config-template.toml` for both the Bilibili Selenium plugin (danmaku save/load paths, skip initial, chromedriver path) and the VTube Studio plugin (detailed lip sync parameters).
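To make the lip-sync highlight more concrete, here is a minimal sketch of how a volume-driven mouth parameter could be derived from one audio chunk. It is an illustration only: it assumes 16-bit little-endian mono PCM input, the names `volume_threshold` and `smoothing_factor` are borrowed from the plugin's configuration, and the functions `rms_volume`, `smooth`, and `mouth_open`, the `gain` parameter, and the formulas themselves are assumptions rather than the PR's implementation. Vowel detection is left out.

```python
import numpy as np


def rms_volume(audio_data: bytes) -> float:
    """Return RMS loudness in [0, 1] for 16-bit little-endian mono PCM.

    Assumes the buffer holds whole samples (an even number of bytes).
    """
    samples = np.frombuffer(audio_data, dtype="<i2").astype(np.float64) / 32768.0
    if samples.size == 0:
        return 0.0
    return float(np.sqrt(np.mean(samples ** 2)))


def smooth(previous: float, target: float, smoothing_factor: float) -> float:
    """Exponential smoothing: a higher factor keeps more of the previous value."""
    return smoothing_factor * previous + (1.0 - smoothing_factor) * target


def mouth_open(
    previous: float,
    audio_data: bytes,
    volume_threshold: float = 0.01,
    smoothing_factor: float = 0.5,
    gain: float = 4.0,
) -> float:
    """Map chunk loudness to a smoothed mouth-open value in [0, 1]."""
    volume = rms_volume(audio_data)
    # Below the threshold the chunk is treated as silence and the mouth closes
    target = 0.0 if volume < volume_threshold else min(1.0, volume * gain)
    return smooth(previous, target, smoothing_factor)
```

Smoothing trades responsiveness for stability: with a factor of 0.5 the mouth covers half the remaining distance to the target on each chunk, which avoids jitter between loud and quiet chunks.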
Code Review
This pull request introduces significant enhancements, notably file-based danmaku saving/replaying for the Bilibili plugin and real-time lip-sync for VTube Studio, integrated with TTS plugins. The documentation and configuration templates have been thoroughly updated.
Key points for review include:
- A potential issue in `bili_danmaku_selenium/plugin.py` where live danmaku might not be processed by `MaiCore` if `enable_danmaku_load` is false.
- In `tts/plugin.py`, the method for scheduling asyncio tasks from the `sounddevice` callback thread should be revised to use `asyncio.run_coroutine_threadsafe`.
- A minor typo in a comment in `vtube_studio/plugin.py`.
Review comment on `bili_danmaku_selenium/plugin.py`:

    await self.core.send_to_maicore(message_base)

    # Send the message
    # await self.core.send_to_maicore(message_base)

The line await self.core.send_to_maicore(message_base) is currently commented out. This means that live messages fetched by _fetch_and_process_messages are cached and potentially saved (if enable_danmaku_save is true), but they are not sent to MaiCore for processing unless enable_danmaku_load is also true and they are subsequently loaded from the file.
If the intention is for live danmaku to be processed in real-time regardless of the load/save settings (which is typical for a live danmaku plugin), this line should likely be uncommented. Otherwise, the documentation should clearly state that live messages are only processed if they are saved and then loaded, or if enable_danmaku_load is active for some other reason.
Could you clarify the intended behavior here? If live messages should always be processed, please uncomment this line.
Review comment on `tts/plugin.py`:

    asyncio.create_task(
        vts_lip_sync_service.process_tts_audio(chunk_bytes, sample_rate=samplerate)
    )

The audio_callback is executed by sounddevice in a separate, non-asyncio thread. Calling asyncio.create_task directly from such a thread might not schedule the coroutine on the main event loop where AmaidesuCore and other asyncio components are running. This can lead to unexpected behavior or errors if the coroutine tries to interact with objects tied to the main loop.
To safely schedule vts_lip_sync_service.process_tts_audio on the main event loop from this callback thread, you should use asyncio.run_coroutine_threadsafe.
Suggested change:

    - asyncio.create_task(
    -     vts_lip_sync_service.process_tts_audio(chunk_bytes, sample_rate=samplerate)
    - )
    + # Get the main event loop once before starting the stream if not already available
    + # main_loop = asyncio.get_running_loop() # Assuming this method has access to the loop
    + coro = vts_lip_sync_service.process_tts_audio(chunk_bytes, sample_rate=samplerate)
    + asyncio.run_coroutine_threadsafe(coro, asyncio.get_running_loop()) # Or pass the specific loop

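One caveat with the suggestion: `asyncio.get_running_loop()` raises `RuntimeError` when called from a thread that has no running event loop, so the "pass the specific loop" variant is the one that works inside the callback. The sketch below shows that pattern under stated assumptions: a plain `threading.Thread` stands in for the `sounddevice` callback thread, and `process_tts_audio`, `make_audio_callback`, and the `received` and `pending` lists are hypothetical stand-ins, not the plugin's code.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Callable, List


async def process_tts_audio(chunk: bytes, received: List[bytes]) -> None:
    """Stand-in for the lip-sync coroutine; it only records the chunk."""
    received.append(chunk)


def make_audio_callback(
    loop: asyncio.AbstractEventLoop,
    received: List[bytes],
    pending: List[Future],
) -> Callable[[bytes], None]:
    """Build a callback that is safe to call from a non-asyncio thread."""

    def audio_callback(chunk: bytes) -> None:
        # Hand the coroutine to the loop captured earlier on the asyncio side
        future = asyncio.run_coroutine_threadsafe(
            process_tts_audio(chunk, received), loop
        )
        pending.append(future)

    return audio_callback


async def main() -> List[bytes]:
    received: List[bytes] = []
    pending: List[Future] = []
    loop = asyncio.get_running_loop()  # captured on the event-loop thread
    callback = make_audio_callback(loop, received, pending)

    # Simulate the audio thread delivering two chunks
    worker = threading.Thread(target=lambda: [callback(b"a"), callback(b"b")])
    worker.start()
    await asyncio.to_thread(worker.join)  # join without blocking the loop

    # Wait until every scheduled coroutine has finished
    await asyncio.gather(*(asyncio.wrap_future(f) for f in pending))
    return received
```

Keeping the returned futures is optional in a real callback, but it gives the caller a way to surface exceptions raised inside the coroutine instead of losing them silently.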
Review comment on `bili_danmaku_selenium/plugin.py`:

    danmaku_data = json.loads(line.strip())

    # Convert the dict to a MessageBase object
    from maim_message import MessageBase

The import from maim_message import MessageBase is done locally within the _load_danmaku_from_file method. MessageBase is already imported at the module level (line 37). Rely on module-level imports for consistency and to avoid potential shadowing or repeated import overhead. Remove this local import and use the MessageBase already available in the module's scope.
Summary by Sourcery
Add file-based saving and replay for Bilibili danmaku plugin, plus skip-initial-danmaku and pure file replay modes; implement real-time lip-sync in VTube Studio plugin with audio analysis and integrate lip-sync start/stop into TTS and GPT-SoVITS plugins.
New Features:
Enhancements:
Documentation: