ESPHome 2025.5.0 S3 Box 3 reboots on end of voice assistant announcement #159

@chrisdunnname

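Since updating to ESPHome 2025.5.0, the S3 Box 3 reboots at the end of every conversation, at the point where it stops the speaker mixer. The last device log line before the connection drops is "Stopping speaker mixer":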

[01:13:50][D][voice_assistant:626]: Speech recognised as: "Hi."
[01:13:50][D][voice_assistant:456]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[01:13:50][D][voice_assistant:598]: Event Type: 5
[01:13:50][D][voice_assistant:631]: Intent started
[01:13:50][D][i2s_audio.microphone:467]: Task is stopping, attempting to unload the I2S audio driver
[01:13:50][D][i2s_audio.microphone:472]: Task is finished, freeing resources
[01:13:50][D][voice_assistant:598]: Event Type: 6
[01:13:50][D][voice_assistant:598]: Event Type: 7
[01:13:50][D][voice_assistant:656]: Response: "Hello from Home Assistant."
[01:13:50][D][voice_assistant:598]: Event Type: 8
[01:13:50][D][voice_assistant:678]: Response URL: "http://192.168.210.124:8123/api/tts_proxy/N8NZ5AQTlsBHZM689FoQhQ.flac"
[01:13:50][D][voice_assistant:456]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[01:13:50][D][voice_assistant:463]: Desired state set to STREAMING_RESPONSE
[01:13:50][D][media_player:074]: 'Media Player' - Setting
[01:13:50][D][media_player:081]:   Media URL: http://192.168.210.124:8123/api/tts_proxy/N8NZ5AQTlsBHZM689FoQhQ.flac
[01:13:50][D][media_player:087]:  Announcement: yes
[01:13:50][D][speaker_media_player:408]: State changed to ANNOUNCING
[01:13:50][D][voice_assistant:598]: Event Type: 2
[01:13:50][D][voice_assistant:697]: Assist Pipeline ended
[01:13:50][D][sensor:093]: 'S3 Temperature': Sending state 21.09534 °C with 2 decimals of accuracy
[01:13:50][D][sensor:093]: 'S3 Humidity': Sending state 60.02083 % with 2 decimals of accuracy
[01:13:50][D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000
[01:13:50][D][speaker_media_player.pipeline:114]: Reading FLAC file type
[01:13:50][D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[01:13:50][D][ring_buffer:034]: Created ring buffer with size 9600
[01:13:50][D][speaker_mixer:312]: Starting speaker mixer
[01:13:50][D][speaker_mixer:320]: Started speaker mixer
[01:13:50][D][ring_buffer:034][speaker_task]: Created ring buffer with size 48000
[01:13:50][D][i2s_audio.speaker:117]: Starting Speaker
[01:13:50][D][i2s_audio.speaker:122]: Started Speaker
[01:13:52][D][speaker_media_player:408]: State changed to IDLE
[01:13:52][D][voice_assistant:329]: Announcement finished playing
[01:13:52][D][voice_assistant:456]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[01:13:52][D][voice_assistant:463]: Desired state set to RESPONSE_FINISHED
[01:13:52][D][micro_wake_word:360]: Starting wake word detection
[01:13:52][D][micro_wake_word:378]: State changed from STOPPED to STARTING
[01:13:52][D][voice_assistant:456]: State changed from RESPONSE_FINISHED to IDLE
[01:13:52][D][voice_assistant:463]: Desired state set to IDLE
[01:13:52][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[01:13:52][D][micro_wake_word:261]: Inference task has started, attempting to allocate memory for buffers
[01:13:52][D][micro_wake_word:266]: Inference task is running
[01:13:52][D][micro_wake_word:378]: State changed from STARTING to DETECTING_WAKE_WORD
[01:13:52][D][i2s_audio.microphone:455]: Task has started, attempting to setup I2S audio driver
[01:13:52][D][speaker_mixer:325]: Stopping speaker mixer
WARNING esp32-s3-box-3-2e8c34 @ 192.168.210.49: Connection error occurred: [Errno 104] Connection reset by peer
INFO Processing unexpected disconnect from ESPHome API for esp32-s3-box-3-2e8c34 @ 192.168.210.49
WARNING Disconnected from API
INFO Successfully connected to esp32-s3-box-3-2e8c34 @ 192.168.210.49 in 0.009s
INFO Successful handshake with esp32-s3-box-3-2e8c34 @ 192.168.210.49 in 0.165s
[01:14:39][D][sensor:093]: 'WiFi db': Sending state -48.00000 dBm with 0 decimals of accuracy
[01:14:39][D][sensor:093]: 'WiFi Signal': Sending state 100.00000 % with 0 decimals of accuracy
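
For anyone who wants to check whether their own logs show the same pattern, here is a minimal Python sketch that finds the last device log line before the API disconnect. The function name and the regular expression are my own assumptions based on the log format above. It is not part of ESPHome or any official tooling.

```python
import re

# Assumed pattern for ESPHome device log lines such as
# "[01:13:52][D][speaker_mixer:325]: Stopping speaker mixer".
# The optional group covers task tags like "[ann_read]".
DEVICE_LINE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]\[[A-Z]\]\[([^\]:]+):\d+\](?:\[[^\]]+\])?: (.*)$"
)


def last_line_before_disconnect(lines):
    """Return (time, component, message) of the last device log line seen
    before the first API disconnect message, or None if no disconnect occurs."""
    last = None
    for line in lines:
        match = DEVICE_LINE.match(line)
        if match:
            last = match.groups()
        elif "Connection reset by peer" in line or "Disconnected from API" in line:
            return last
    return None


# Three lines copied from the log above.
sample = [
    "[01:13:52][D][i2s_audio.microphone:455]: Task has started, attempting to setup I2S audio driver",
    "[01:13:52][D][speaker_mixer:325]: Stopping speaker mixer",
    "WARNING esp32-s3-box-3-2e8c34 @ 192.168.210.49: Connection error occurred: [Errno 104] Connection reset by peer",
]

print(last_line_before_disconnect(sample))
```

If the component it reports is `speaker_mixer` with the message "Stopping speaker mixer", the log matches what I am seeing here.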

I suspect this is related to the new features released in 2025.5.0:
https://www.esphome.io/changelog/2025.5.0.html

I also noticed that there is a new i2s_audio driver, although I have tested the legacy mode and the issue still reproduces:
esphome/esphome#8703

I can see there are pending updates to the ESPHome wake-word-voice-assistants repository to support the new features, which I suspect would resolve this:
esphome/wake-word-voice-assistants#110

It looks like a refactor of the way the voice assistant is configured, since you can no longer assume that an interaction always follows the order wake word -> STT -> TTS -> done.

I have been trying to change how the end of the announcement is handled, but I haven't been able to mitigate the reboot yet, so I think the media_player and voice_assistant code may need to be updated along the lines of the pending ESPHome update. I will wait until they release their changes.

I wanted to raise this because there appears to be a breaking change in this release, and I expect others will start reporting it soon.
