@78 (Owner) commented Sep 16, 2025

This pull request adds support for configuring custom wake words and multinet speech recognition models through a new JSON-based configuration. It makes three main changes: the firmware parses a new index.json asset for model and command configuration, the build script extracts that configuration from sdkconfig and generates index.json, and the asset build pipeline is refactored to support both wakenet and multinet models. As a result, speech recognition features can be customized without code changes.

Custom Wake Word and Multinet Model Configuration:

  • Added a ParseWakenetModelConfig method in CustomWakeWord that loads the language, duration, threshold, and commands from index.json at runtime, so the wake word and commands can be configured dynamically. (main/audio/wake_words/custom_wake_word.cc, main/audio/wake_words/custom_wake_word.h)
  • Extended the internal Command structure and related logic to support multiple commands and actions, not just a single wake word. (main/audio/wake_words/custom_wake_word.h, main/audio/wake_words/custom_wake_word.cc)
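
To illustrate the kind of data the runtime parser reads, here is a minimal Python sketch. It is not the firmware code, which is C++ in custom_wake_word.cc. It only models the shape described above: a language, a duration, a threshold, and a list of commands. The multinet_model_info key is mentioned in this PR; the nested field names (command, text, action), the default values, and the class names are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Command:
    # All three field names are assumptions, not confirmed by the PR.
    command: str  # string handed to the recognizer
    text: str     # human-readable label
    action: str   # what to do when the command is detected, e.g. "wake"


@dataclass
class WakeWordConfig:
    # Default values are placeholders chosen for this sketch.
    language: str = "cn"
    duration_ms: int = 3000
    threshold: float = 0.2
    commands: List[Command] = field(default_factory=list)


def parse_wakenet_model_config(index_json: str) -> WakeWordConfig:
    """Read wake word settings from an index.json document (hypothetical layout)."""
    root = json.loads(index_json)
    info = root.get("multinet_model_info")
    config = WakeWordConfig()
    if not isinstance(info, dict):
        # No multinet section: fall back to the defaults.
        return config

    config.language = info.get("language", config.language)
    config.duration_ms = int(info.get("duration", config.duration_ms))
    config.threshold = float(info.get("threshold", config.threshold))

    # Several commands are allowed, not just a single wake word.
    for item in info.get("commands", []):
        config.commands.append(
            Command(
                command=item["command"],
                text=item.get("text", item["command"]),
                action=item.get("action", "wake"),
            )
        )
    return config
```

Falling back to defaults when the section is missing keeps the sketch usable with an index.json that carries no multinet entry; whether the firmware behaves the same way is not stated in this PR.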

Asset Build Script Enhancements:

  • Refactored build_default_assets.py to process both wakenet and multinet models, extract their configuration from sdkconfig, and generate an index.json containing model and command metadata. (scripts/build_default_assets.py)
  • Added functions to parse multinet model names, determine the language, and extract custom wake word settings from sdkconfig. (scripts/build_default_assets.py)
  • Modified the asset build process to include multinet_model_info in the generated index.json, so the firmware can read the SR model and command configuration at runtime. (scripts/build_default_assets.py)
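
The build-side steps above can be sketched roughly as follows. This is not the code in scripts/build_default_assets.py; it is a self-contained illustration of reading sdkconfig, deriving a language from a multinet model name, and emitting a multinet_model_info entry. The CONFIG_* key names, the "mn<version>_<lang>" naming pattern, the percent-based threshold, the fixed duration, and all function names are assumptions.

```python
import json
from typing import Dict, Optional


def parse_sdkconfig(text: str) -> Dict[str, str]:
    """Collect KEY=value pairs from sdkconfig text, skipping comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key] = value.strip().strip('"')
    return values


def multinet_language(model_name: str) -> Optional[str]:
    """Guess the language from a name such as "mn6_cn" (assumed pattern)."""
    parts = model_name.split("_")
    if len(parts) >= 2 and parts[0].startswith("mn") and parts[1] in ("cn", "en"):
        return parts[1]
    return None


def build_multinet_model_info(sdkconfig: Dict[str, str], model_name: str) -> Optional[dict]:
    """Build the multinet section, or None if custom wake word is disabled."""
    # Hypothetical key names; check the project's Kconfig for the real ones.
    if sdkconfig.get("CONFIG_USE_CUSTOM_WAKE_WORD") != "y":
        return None
    wake_word = sdkconfig.get("CONFIG_CUSTOM_WAKE_WORD", "")
    display = sdkconfig.get("CONFIG_CUSTOM_WAKE_WORD_DISPLAY", wake_word)
    # Assumes the threshold is stored as an integer percentage.
    threshold_pct = int(sdkconfig.get("CONFIG_CUSTOM_WAKE_WORD_THRESHOLD", "20"))
    return {
        "language": multinet_language(model_name),
        "duration": 3000,  # placeholder value for this sketch
        "threshold": threshold_pct / 100,
        "commands": [{"command": wake_word, "text": display, "action": "wake"}],
    }


def build_index_json(sdkconfig: Dict[str, str], model_name: str) -> str:
    """Serialize an index.json body that includes multinet_model_info when available."""
    index: dict = {}
    info = build_multinet_model_info(sdkconfig, model_name)
    if info is not None:
        index["multinet_model_info"] = info
    return json.dumps(index, ensure_ascii=False, indent=2)
```

Keeping the sdkconfig parsing separate from the index generation makes each step easy to test with plain strings, without needing a real build tree.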

Together, these changes let custom wake words and command sets be deployed and updated through sdkconfig and the generated index.json, without changes to the firmware source.

@78 merged commit d2e99ba into main Sep 16, 2025
82 of 94 checks passed
@78 mentioned this pull request Sep 16, 2025 (3 tasks)
@78 deleted the fix_multinet branch September 27, 2025 11:54
Wvirgil123 added a commit to Wvirgil123/xiaozhi-esp32 that referenced this pull request Sep 30, 2025
* main: (43 commits)
  OTTO: left and right legs were swapped (78#1239)
  Bump to 2.0.3 (78#1241)
  fix emote display errors (78#1240)
  fix: some bug fixes for 小智云聊 (78#1238)
  ci: support multiple variants per board (78#1036)
  Add dual-channel audio configuration for 太极派 (Taiji Pi) (78#1235)
  fix multiple wakenet words and custom wake word (78#1226)
  Add Waveshare ESP32-S3-Touch-LCD-3.49 (78#1227)
  New Waveshare ESP32-S3-Touch-LCD-4B third party board, 86 box form. (78#1199)
  feat: add emote style for v2 (78#1217)
  ESP32 Wifi And 4G Merge In All (78#1219)
  fix: Add function to handle local asset file paths
  Detect wake word model from index.json (78#1211)
  fix: Corrected the inverted touch screen parameter configuration of lichuang_S3_dev, which caused touch offset. (78#1209)
  fix multinet model for v2 (78#1208)
  fix: ESP-HI audio sampling problem (78#1207)
  feat: build default assets instead of downloading and v2 tables for esp-hi, echoear (78#1203)
  regenerate jpeg encoder (78#1198)
  Bump to 2.0.1
  feat: add snapshot mcp tool (78#1196)
  ...
Cmdmac pushed a commit to Cmdmac/xiaozhi-esp32 that referenced this pull request Oct 14, 2025