βββββββββββββββββββββββββββββββββββββββββββββββ
β Camada 5: Conversation Agent β
β OpenClaw (Quasar / Claude) β
βββββββββββββββββββββββββββββββββββββββββββββββ€
β Camada 4: Voice Pipeline β
β Home Assistant Assist β
βββββββββββββββββ¬βββββββββββββββ¬βββββββββββββββ€
β Camada 3a β Camada 3b β Camada 3c β
β STT: Whisper β TTS: Piper β WW: openWW β
β (Wyoming) β (Wyoming) β (Wyoming) β
βββββββββββββββββ΄βββββββββββββββ΄βββββββββββββββ€
β Camada 2: ESPHome Native API β
β ComunicaΓ§Γ£o ESP32 β HA β
βββββββββββββββββββββββββββββββββββββββββββββββ€
β Camada 1: Firmware ESPHome β
β ESP32-S3 (mic + speaker + LED) β
βββββββββββββββββββββββββββββββββββββββββββββββ
| Componente | Tecnologia | FunΓ§Γ£o |
|---|---|---|
| Firmware base | ESPHome | Framework de configuraΓ§Γ£o YAML, OTA, API nativa |
| Voice Assistant | voice_assistant component |
Streaming Γ‘udio β HA pipeline |
| Wake Word | micro_wake_word component |
DetecΓ§Γ£o local no ESP32 (TFLite) |
| Microfone | i2s_audio + microphone |
Captura Γ‘udio I2S do INMP441 |
| Speaker | i2s_audio + speaker |
ReproduΓ§Γ£o Γ‘udio I2S via MAX98357A |
| LED | light + neopixelbus |
Feedback visual WS2812B |
| Wi-Fi | wifi component |
ConexΓ£o Γ rede local |
| Logger | logger component |
Debug via USB serial |
| Componente | Tecnologia | Porta | FunΓ§Γ£o |
|---|---|---|---|
| Home Assistant | HA Core | :8123 | Orquestrador central + Voice Pipeline |
| Whisper | whisper.cpp / faster-whisper | Wyoming | Speech-to-Text local |
| Piper | piper-tts | Wyoming | Text-to-Speech neural local |
| openWakeWord | openWakeWord | Wyoming | Wake word backup (treinamento custom) |
| OpenClaw | Clawdbot | β | Conversation Agent (Claude API) |
| HA Integrations | Wyoming + OpenAI Conv. | β | Cola entre componentes |
- Wyoming Protocol β Conecta Whisper, Piper e openWakeWord
- ESPHome β Conecta os QuasarBox satellites
- OpenAI Conversation (ou custom) β Conversation agent apontando pro OpenClaw
# ConfiguraΓ§Γ£o via UI do HA: Settings β Voice Assistants
# Pipeline "Quasar":
# - STT: Whisper (Wyoming)
# - Conversation Agent: OpenClaw (custom/OpenAI-compatible)
# - TTS: Piper (Wyoming - pt-BR)
# - Wake Word: openWakeWord (Wyoming)O Wyoming protocol roda cada componente como um serviΓ§o TCP independente:
ββββββββββββββββββββββββββββββββββββ
β faster-whisper (Wyoming server) β :10300
β Modelo: small / medium β
β LΓngua: pt-BR β
ββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββ
β piper-tts (Wyoming server) β :10200
β Voz: pt_BR-faber-medium β
ββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββ
β openWakeWord (Wyoming server) β :10400
β Modelo: custom "ei_quasar" β
ββββββββββββββββββββββββββββββββββββ
O HA tem suporte nativo a conversation agents via integraΓ§Γ£o OpenAI Conversation, que aceita qualquer API compatΓvel com o formato OpenAI Chat Completions.
O OpenClaw expΓ΅e (ou pode expor) um endpoint compatΓvel. O fluxo:
Voice Pipeline β STT β texto
β Conversation Agent (OpenClaw API)
β Claude interpreta o comando
β Chama HA API se necessΓ‘rio (tools/function calling)
β Retorna texto de resposta
β TTS β Γ‘udio
β Volta pro ESP32
Vantagens sobre Assist nativo:
- Entende linguagem natural complexa ("tΓ‘ um forno aqui")
- MantΓ©m contexto da conversa
- Pode executar aΓ§Γ΅es compostas ("modo filme")
- Integra com serviΓ§os externos (Γrbita, TV, etc.)
Se a integraΓ§Γ£o nativa nΓ£o for suficiente, existe o Extended OpenAI Conversation via HACS que suporta:
- Function calling (chamar serviΓ§os HA)
- Prompt templates
- Qualquer API OpenAI-compatible
# Whisper (jΓ‘ instalado)
whisper-cpp ou faster-whisper
# Piper TTS
piper-tts
# openWakeWord
openwakeword
# Wyoming servers
wyoming-faster-whisper
wyoming-piper
wyoming-openwakeword
esphome >= 2024.2.0
| Componente | VersΓ£o mΓnima |
|---|---|
| Home Assistant | 2024.2+ (voice pipeline v2) |
| ESPHome | 2024.2+ (voice_assistant v2, micro_wake_word) |
| Whisper | large-v3 / medium (pt-BR) |
| Piper | 1.2+ (pt_BR voices) |
| Python (servidor) | 3.10+ |