fix multinet model for v2 #1208

78 · 2025-09-16T11:04:57Z

This pull request adds support for configuring custom wake words and multinet speech recognition models through a new JSON-based configuration. The main changes are: parsing a new index.json asset for model and command configuration, updating the build script to generate that configuration from sdkconfig, and refactoring the asset build pipeline to support both wakenet and multinet models. As a result, speech recognition features can be customized without code changes.

Custom Wake Word and Multinet Model Configuration:

Added a ParseWakenetModelConfig method in CustomWakeWord to load language, duration, threshold, and commands from index.json at runtime, allowing dynamic configuration of wake word and commands. (main/audio/wake_words/custom_wake_word.cc, main/audio/wake_words/custom_wake_word.h)
Extended the internal Command structure and related logic to support multiple commands and actions, not just a single wake word. (main/audio/wake_words/custom_wake_word.h, main/audio/wake_words/custom_wake_word.cc)
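The runtime lookup described above can be sketched roughly as follows. This is an illustration in Python rather than the firmware's C++, and it is not the code from this PR: the `multinet_model` key, the field names inside each command entry, and the default values are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Command:
    # Field names are hypothetical; the PR only says commands carry actions.
    command: str  # string handed to the recognizer
    text: str     # human-readable label
    action: str   # e.g. "wake" for the wake word itself


@dataclass
class MultinetConfig:
    # Defaults are assumptions for this sketch, not values from the PR.
    language: str = "cn"
    duration_ms: int = 3000
    threshold: float = 0.2
    commands: list = field(default_factory=list)


def parse_multinet_config(index_json: str, key: str = "multinet_model") -> MultinetConfig:
    """Read language, duration, threshold and commands, falling back to defaults."""
    root = json.loads(index_json)
    node = root.get(key)
    if not isinstance(node, dict):
        # No model section in index.json: keep the built-in defaults.
        return MultinetConfig()
    cfg = MultinetConfig(
        language=node.get("language", "cn"),
        duration_ms=int(node.get("duration", 3000)),
        threshold=float(node.get("threshold", 0.2)),
    )
    for item in node.get("commands", []):
        cfg.commands.append(
            Command(item["command"], item.get("text", ""), item.get("action", ""))
        )
    return cfg
```

Keeping every command in a list, each with its own action, is what lets one model serve several phrases instead of a single hard-coded wake word.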

Asset Build Script Enhancements:

Refactored build_default_assets.py to process both wakenet and multinet models, extract their configuration from sdkconfig, and generate a comprehensive index.json containing model and command metadata. (scripts/build_default_assets.py)
Added functions to parse multinet model names, determine language, and extract custom wake word settings from sdkconfig. (scripts/build_default_assets.py)
Modified the asset build process to include multinet_model_info in the generated index.json, enabling the firmware to access all relevant SR model and command configuration at runtime. (scripts/build_default_assets.py)
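A minimal sketch of the build-side flow listed above might look like this. It is not the contents of build_default_assets.py: the `CONFIG_CUSTOM_WAKE_WORD*` option names, the percent-based threshold, the fixed duration, the model-name convention (`mn6_cn`, `mn7_en`), and the output field names are assumptions chosen for illustration.

```python
import json


def parse_sdkconfig(text: str) -> dict:
    """Collect KEY=VALUE pairs from sdkconfig text, skipping comments and blanks."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        values[key] = value.strip().strip('"')
    return values


def language_from_model_name(name: str) -> str:
    """Guess the language from an underscore-separated model name (assumed convention)."""
    parts = name.lower().split("_")
    for lang in ("cn", "en"):
        if lang in parts:
            return lang
    return "cn"  # assumed fallback


def build_multinet_info(cfg: dict, model_name: str) -> dict:
    """Assemble the model and command metadata to embed in index.json."""
    # Hypothetical option: threshold stored as an integer percentage.
    threshold_percent = int(cfg.get("CONFIG_CUSTOM_WAKE_WORD_THRESHOLD", "20"))
    return {
        "language": language_from_model_name(model_name),
        "duration": 3000,  # assumed constant, in milliseconds
        "threshold": threshold_percent / 100.0,
        "commands": [
            {
                "command": cfg.get("CONFIG_CUSTOM_WAKE_WORD", ""),
                "text": cfg.get("CONFIG_CUSTOM_WAKE_WORD_DISPLAY", ""),
                "action": "wake",
            }
        ],
    }
```

The resulting dictionary could then be written with `json.dumps({"multinet_model": info}, ensure_ascii=False)` so that non-ASCII display text survives in the generated file; the top-level key name here is likewise an assumption.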

These changes significantly improve the flexibility of speech recognition model configuration and asset management, making it easier to deploy and update custom wake words and command sets.

* main: (43 commits)
  OTTO left and right legs were swapped (78#1239)
  Bump to 2.0.3 (78#1241)
  fix emote display errors (78#1240)
  fix: some bug fixes for 小智云聊 (78#1238)
  ci: support multiple variants per board (78#1036)
  Add dual-channel audio configuration for 太极派 (78#1235)
  fix multiple wakenet words and custom wake word (78#1226)
  Add Waveshare ESP32-S3-Touch-LCD-3.49 (78#1227)
  New Waveshare ESP32-S3-Touch-LCD-4B third party board, 86 box form. (78#1199)
  feat: add emote style for v2 (78#1217)
  ESP32 Wifi And 4G Merge In All (78#1219)
  fix: Add function to handle local asset file paths Detect wake word model from index.json (78#1211)
  fix: Corrected the inverted touch screen parameter configuration of lichuang_S3_dev, which caused touch offset. (78#1209)
  fix multinet model for v2 (78#1208)
  fix: ESP-HI audio sampling problem (78#1207)
  feat: build default assets instead of downloading and v2 tables for esp-hi, echoear (78#1203)
  regenerate jpeg encoder (78#1198)
  Bump to 2.0.1
  feat: add snapshot mcp tool (78#1196)
  ...

fix multinet model for v2

350926a

78 merged commit d2e99ba into main Sep 16, 2025
82 of 94 checks passed

78 mentioned this pull request Sep 16, 2025

Custom keyword wake-up cannot be enabled #1191

Open

3 tasks

78 deleted the fix_multinet branch September 27, 2025 11:54

Cmdmac pushed a commit to Cmdmac/xiaozhi-esp32 that referenced this pull request Oct 14, 2025

fix multinet model for v2 (78#1208)

8687b01
