fix(audio): global colored logs, arecord exit detection, capture health metrics

tokk-nv · tokk-nv · commit 37bb79919667 · 2026-03-14T12:27:52.000-07:00
- Add _ColorFormatter in cli/main.py for global ANSI-colored log output
  (DEBUG=grey, WARNING=yellow, ERROR=red, CRITICAL=bold red)
- Fix unexpected arecord exit going undetected: check proc.poll() is not None
  after the inner read loop and trigger a retry instead of silently ending capture
- Add CaptureHealth dataclass with drop/recovery counts and outage durations
- Emit capture_dropped/recovered/gave_up timeline events from voice pipeline
- Persist capture_health in session JSON for quantitative USB analysis
- Retry aplay up to 3 times on start failure with 0.3s delay
- Speaker restart cooldown (3s) to avoid log spam during device outage
- Add todo doc for ASR stream stale after mute/unmute investigation

Made-with: Cursor
diff --git a/docs/development/todo_asr_stream_stale_after_mute.md b/docs/development/todo_asr_stream_stale_after_mute.md
@@ -0,0 +1,80 @@
+# ASR stream goes stale after mute/unmute or long silence
+
+After muting then unmuting the mic (or after a prolonged period where no speech reaches Riva), the ASR stream silently stops producing results even though PCM audio is still flowing.
+
+## Observed behavior
+
+- Session `c87be1b2` (2026-03-14): 9 turns completed successfully.
+- Turn 8 triggered a degenerate LLM reasoning loop (10,101 chars, **91.89 s** wall-clock).
+- During that wait, the user muted and later unmuted the mic.
+- After turn 9 completed, no further `asr_final` events appeared for ~1.5 min despite the green **user_amplitude** waveform being visible on the timeline (PCM capture was healthy).
+- Terminal showed no ASR errors; the stream ended normally at session close with `Stream task timeout, cancelling`.
+
+## Why amplitude shows but ASR does not
+
+In `_feed_pcm_to_pipeline`, amplitude is always computed and sent to the client (lines 1005-1024) regardless of `mic_muted`. The ASR send is gated:
+
+```python
+if not mic_muted:
+    await asr.send_audio(pcm_bytes)
+```
+
+So the timeline waveform looks alive, but if the Riva gRPC stream has internally timed out (or VAD state has gone stale after 90+ seconds of silence/mute), newly sent audio produces no results.
+
+## Probable root cause (needs confirmation)
+
+Riva Streaming ASR has internal session limits:
+- **gRPC keepalive / idle timeout**: if no audio is sent for an extended period the server may silently close the stream.
+- **VAD state**: after a long silence gap, the VAD model may reset or require a fresh trigger to start detecting speech again.
+- **Maximum session duration**: Riva may cap single-stream duration; after that, the stream yields no more results even though it stays open.
+
+The exact Riva behavior here is unconfirmed — the stream appeared open (no error logged) but stopped producing finals.
+
+## What is already in place
+
+- `mic_muted` gates `asr.send_audio()` in the classic pipeline (line 1003).
+- On mute, 0.5 s of silence is injected (`b"\x00" * int(16000 * 2 * 0.5)`) to flush any pending VAD partial (lines 1041-1044).
+- On unmute, `mic_muted = False` resumes sending PCM to ASR.
+- No stream-health monitoring or automatic restart exists today.
+
+## Proposed solutions (pick one or combine)
+
+### Option A: Keep-alive noise during mute
+
+While `mic_muted` is True, instead of sending nothing, send **very low amplitude white noise** (e.g., ±10 out of ±32768) at normal cadence. This keeps the gRPC stream active and the VAD model warm without triggering false speech detection.
+
+Pros: Simplest change; no stream lifecycle management. \
+Cons: Assumes the Riva stream itself is still healthy; does not help if the stream has a hard session-duration cap.
+
+### Option B: Restart ASR stream after stale timeout
+
+Monitor elapsed time since the last `asr_final`. If no final arrives within a configurable window (e.g., 60 s while unmuted), tear down the current `RivaASRBackend` stream and create a fresh one.
+
+1. Track `_last_asr_final_time` in the turn executor; update it on every `asr_final`.
+2. In `server_capture_consumer` (or a watchdog task), check `time.time() - _last_asr_final_time > ASR_STALE_TIMEOUT`.
+3. If stale and `not mic_muted`: call `asr.stop()`, then `asr.start()` to open a fresh streaming session.
+4. Log `[asr] Stream restarted after stale timeout` at WARNING level.
+
+Pros: Covers all root causes (idle timeout, VAD reset, session-duration cap). \
+Cons: Slightly more complex; brief gap in ASR coverage during restart (~200 ms).
+
+### Option C: Proactive stream rotation
+
+After every turn (or every N turns), close and re-open the ASR stream. This preempts any session-duration limit and keeps the stream fresh.
+
+Pros: Eliminates stale state entirely. \
+Cons: Adds latency at turn boundaries; may lose a partial if speech is ongoing during rotation.
+
+## Recommendation
+
+**Option A + B combined**: send keep-alive noise during mute (A) to prevent idle timeout, and add a stale-timeout watchdog (B) as a safety net for unexpected stream failures. Option C is heavier and only needed if Riva has a hard session cap that A+B cannot address.
+
+## Diagnosis checklist (before implementing)
+
+- [ ] Confirm Riva Streaming ASR session limits: check `riva_asr` service config for `max_duration_seconds`, keepalive settings, or gRPC deadline.
+- [ ] Add a log line in `RivaASRBackend` when the gRPC response iterator ends (to distinguish "server closed stream" from "no results but stream open").
+- [ ] Reproduce by muting for 60+ s mid-session and verifying ASR stops producing results on unmute.
+
+## Effort
+
+**Small–Medium**: Option A is ~30 min (noise generator in `_feed_pcm_to_pipeline`). Option B is ~1–2 hours (watchdog task + stream restart plumbing + tests).
diff --git a/src/multi_modal_ai_studio/cli/main.py b/src/multi_modal_ai_studio/cli/main.py
@@ -14,6 +14,26 @@
 from pathlib import Path
 
 
+class _ColorFormatter(logging.Formatter):
+    """Logging formatter that adds ANSI colors to the level name."""
+
+    _COLORS = {
+        logging.DEBUG:    "\033[90m",   # grey
+        logging.INFO:     "",           # default
+        logging.WARNING:  "\033[93m",   # yellow
+        logging.ERROR:    "\033[91m",   # red
+        logging.CRITICAL: "\033[91;1m", # bold red
+    }
+    _RESET = "\033[0m"
+
+    def format(self, record: logging.LogRecord) -> str:
+        color = self._COLORS.get(record.levelno, "")
+        msg = super().format(record)
+        if color:
+            return f"{color}{msg}{self._RESET}"
+        return msg
+
+
 def main():
     """Main entry point for CLI."""
     from multi_modal_ai_studio import __version__
@@ -154,11 +174,10 @@ def main():
 
     args = parser.parse_args()
 
-    # Configure logging
-    logging.basicConfig(
-        level=getattr(logging, args.log_level),
-        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
-    )
+    # Configure logging with colored output
+    _handler = logging.StreamHandler()
+    _handler.setFormatter(_ColorFormatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
+    logging.basicConfig(level=getattr(logging, args.log_level), handlers=[_handler])
 
     logger = logging.getLogger(__name__)
     logger.info("Multi-modal AI Studio starting...")
diff --git a/src/multi_modal_ai_studio/core/session.py b/src/multi_modal_ai_studio/core/session.py
@@ -307,6 +307,7 @@ def to_dict(self) -> Dict[str, Any]:
             "audio_amplitude_history": getattr(self, "audio_amplitude_history", None) or [],
             "ttl_bands": getattr(self, "ttl_bands", None) or [],
             "app_version": getattr(self, "app_version", None) or __version__,
+            "capture_health": getattr(self, "capture_health", None),
         }
 
     def save(self, path: Path) -> None:
diff --git a/src/multi_modal_ai_studio/devices/capture.py b/src/multi_modal_ai_studio/devices/capture.py
@@ -11,7 +11,9 @@
 import queue
 import subprocess
 import threading
-from typing import Optional
+import time
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional
 
 logger = logging.getLogger(__name__)
 
@@ -26,20 +28,68 @@
 RETRY_BACKOFF_BASE = 0.5  # seconds; doubles each attempt up to a cap
 RETRY_BACKOFF_MAX = 5.0
 
+# ANSI escape codes for colored terminal output
+_RED = "\033[91m"
+_GREEN = "\033[92m"
+_YELLOW = "\033[93m"
+_RESET = "\033[0m"
+
+# Sentinel dict placed in the queue to signal capture health events.
+# Pipeline code should check `isinstance(item, dict)` before treating as PCM bytes.
+CAPTURE_EVENT_TYPE = "__capture_event__"
+
+
+def _make_capture_event(event: str, **kwargs: Any) -> Dict[str, Any]:
+    """Create a capture health event dict for the queue."""
+    d: Dict[str, Any] = {"__type__": CAPTURE_EVENT_TYPE, "event": event, "ts": time.time()}
+    d.update(kwargs)
+    return d
+
+
+def is_capture_event(item: Any) -> bool:
+    """Return True if item is a capture health event (not PCM bytes)."""
+    return isinstance(item, dict) and item.get("__type__") == CAPTURE_EVENT_TYPE
+
+
+@dataclass
+class CaptureHealth:
+    """Accumulated capture health metrics (thread-safe reads after capture ends)."""
+    device: str = ""
+    total_drops: int = 0
+    total_recoveries: int = 0
+    outages: List[Dict[str, float]] = field(default_factory=list)
+    gave_up: bool = False
+
+    def to_dict(self) -> Dict[str, Any]:
+        total_downtime = sum(o.get("duration_s", 0) for o in self.outages)
+        return {
+            "device": self.device,
+            "total_drops": self.total_drops,
+            "total_recoveries": self.total_recoveries,
+            "total_downtime_s": round(total_downtime, 3),
+            "outages": self.outages,
+            "gave_up": self.gave_up,
+        }
+
 
 def _capture_alsa(
     device: str,
     out_queue: "queue.Queue[Optional[bytes]]",
     stop_event: threading.Event,
     proc_holder: Optional[list] = None,
+    health: Optional[CaptureHealth] = None,
 ) -> None:
     """Capture from ALSA device via arecord; put PCM chunks in out_queue. Runs in thread.
     Uses plughw when device is hw:X,Y so ALSA can do sample-rate conversion (many USB mics only support 48kHz).
     If proc_holder is a list, the subprocess is stored as proc_holder[0] so the caller can terminate it to release the device quickly.
 
     Auto-restarts arecord up to MAX_CAPTURE_RETRIES times when the device
     disappears transiently (e.g. USB bus contention with a camera).
+    Sends capture health events through out_queue so the pipeline can track outages.
     """
+    if health is not None:
+        health.device = device
+
     dev = (device or "default").strip()
     if dev.startswith("hw:") and not dev.startswith("plughw:"):
         dev = "plug" + dev
@@ -48,6 +98,7 @@ def _capture_alsa(
 
     retries = 0
     ever_produced_chunk = False
+    drop_time: Optional[float] = None
 
     while not stop_event.is_set():
         logger.info("ALSA capture starting: %s (device=%s)", " ".join(cmd), device)
@@ -70,6 +121,8 @@ def _capture_alsa(
             logger.warning("Failed to start arecord for %s: %s", device, e)
             if retries >= MAX_CAPTURE_RETRIES:
                 logger.error("ALSA capture giving up after %d retries for %s", retries, device)
+                if health is not None:
+                    health.gave_up = True
                 out_queue.put(None)
                 return
             retries += 1
@@ -87,24 +140,49 @@ def _capture_alsa(
                     try:
                         err = proc.stderr.read().decode("utf-8", errors="replace").strip() if proc.stderr else ""
                         if err:
-                            logger.warning("ALSA capture read empty (device %s). arecord stderr: %s", device, err)
+                            logger.error("%sALSA capture read empty (device %s). arecord stderr: %s%s", _RED, device, err, _RESET)
                         else:
-                            logger.warning("ALSA capture read returned empty (device %s); check device/sample rate", device)
+                            logger.error("%sALSA capture read returned empty (device %s); check device/sample rate%s", _RED, device, _RESET)
                     except Exception:
-                        logger.warning("ALSA capture read returned empty (device %s)", device)
+                        logger.error("%sALSA capture read returned empty (device %s)%s", _RED, device, _RESET)
                     died_unexpectedly = True
                     break
                 if first_chunk_this_run:
                     first_chunk_this_run = False
                     if not ever_produced_chunk:
                         logger.info("ALSA first PCM chunk received from %s (%d bytes); pipeline will get amplitude", device, len(chunk))
                     else:
-                        logger.info("ALSA capture resumed from %s (%d bytes) after retry", device, len(chunk))
+                        recovery_dur = time.time() - drop_time if drop_time else 0
+                        logger.warning(
+                            "%s[capture_health] RECOVERED device %s after %.2fs outage (retry %d)%s",
+                            _GREEN, device, recovery_dur, retries, _RESET,
+                        )
+                        if health is not None:
+                            health.total_recoveries += 1
+                            if health.outages:
+                                health.outages[-1]["duration_s"] = round(recovery_dur, 3)
+                        out_queue.put(_make_capture_event(
+                            "recovered", device=device, outage_s=round(recovery_dur, 3), retry=retries,
+                        ))
+                        drop_time = None
                     retries = 0
                     ever_produced_chunk = True
                 out_queue.put(chunk)
+
+            # arecord exited on its own (proc.poll() != None) while we didn't ask it to stop
+            if not died_unexpectedly and not stop_event.is_set() and proc.poll() is not None:
+                rc = proc.returncode
+                try:
+                    err = proc.stderr.read().decode("utf-8", errors="replace").strip() if proc.stderr else ""
+                except Exception:
+                    err = ""
+                logger.error(
+                    "%sarecord exited unexpectedly for %s (rc=%s): %s%s",
+                    _RED, device, rc, err or "(no stderr)", _RESET,
+                )
+                died_unexpectedly = True
         except Exception as e:
-            logger.warning("ALSA capture read error for %s: %s", device, e)
+            logger.error("%sALSA capture read error for %s: %s%s", _RED, device, e, _RESET)
             died_unexpectedly = True
         finally:
             try:
@@ -122,28 +200,50 @@ def _capture_alsa(
             break
 
         if died_unexpectedly and retries < MAX_CAPTURE_RETRIES:
+            if drop_time is None:
+                drop_time = time.time()
             retries += 1
+            if health is not None:
+                health.total_drops += 1
+                health.outages.append({"drop_ts": round(drop_time, 3), "retry": retries, "duration_s": 0})
             delay = min(RETRY_BACKOFF_BASE * (2 ** (retries - 1)), RETRY_BACKOFF_MAX)
-            logger.warning(
-                "ALSA capture died unexpectedly for %s; retry %d/%d in %.1fs",
-                device, retries, MAX_CAPTURE_RETRIES, delay,
+            logger.error(
+                "%s[capture_health] DROPPED device %s; retry %d/%d in %.1fs%s",
+                _RED, device, retries, MAX_CAPTURE_RETRIES, delay, _RESET,
             )
+            out_queue.put(_make_capture_event(
+                "dropped", device=device, retry=retries, max_retries=MAX_CAPTURE_RETRIES,
+            ))
             stop_event.wait(delay)
             continue
 
         if died_unexpectedly:
-            logger.error("ALSA capture giving up after %d retries for %s", retries, device)
+            logger.error("%s[capture_health] GAVE UP on device %s after %d retries%s", _RED, device, retries, _RESET)
+            if health is not None:
+                health.gave_up = True
+                if health.outages:
+                    health.outages[-1]["duration_s"] = round(time.time() - drop_time, 3) if drop_time else 0
+            out_queue.put(_make_capture_event("gave_up", device=device, retries=retries))
         elif first_chunk_this_run and not ever_produced_chunk:
             try:
                 err = proc.stderr.read().decode("utf-8", errors="replace").strip() if proc.stderr else ""
                 if err:
-                    logger.warning("ALSA capture ended with no chunks (device %s). arecord stderr: %s", device, err)
+                    logger.error("%sALSA capture ended with no chunks (device %s). arecord stderr: %s%s", _RED, device, err, _RESET)
                 else:
-                    logger.warning("ALSA capture ended without sending any chunks (device %s); check arecord -D %s", device, dev)
+                    logger.error("%sALSA capture ended without sending any chunks (device %s); check arecord -D %s%s", _RED, device, dev, _RESET)
             except Exception:
-                logger.warning("ALSA capture ended without sending any chunks (device %s)", device)
+                logger.error("%sALSA capture ended without sending any chunks (device %s)%s", _RED, device, _RESET)
         break
 
+    # Log summary if any drops occurred
+    if health is not None and health.total_drops > 0:
+        summary = health.to_dict()
+        logger.warning(
+            "%s[capture_health] SESSION SUMMARY for %s: drops=%d recoveries=%d downtime=%.2fs gave_up=%s%s",
+            _YELLOW, device, summary["total_drops"], summary["total_recoveries"],
+            summary["total_downtime_s"], summary["gave_up"], _RESET,
+        )
+
     out_queue.put(None)
 
 
@@ -213,6 +313,7 @@ def start_server_mic_capture(
     out_queue: "queue.Queue[Optional[bytes]]",
     stop_event: threading.Event,
     proc_holder: Optional[list] = None,
+    health_out: Optional[list] = None,
 ) -> Optional[threading.Thread]:
     """
     Start a thread that captures from the given server audio input device and puts
@@ -224,20 +325,24 @@ def start_server_mic_capture(
         out_queue: queue to put chunks into; None is sentinel when capture ends
         stop_event: when set, capture thread should exit
         proc_holder: optional list; for ALSA, the arecord subprocess is appended so the caller can terminate it to release the device quickly
+        health_out: optional list; if provided, CaptureHealth is appended as health_out[0] for retrieval after thread exits
 
     Returns:
         The started thread, or None if capture could not be started.
     """
     if not device:
         return None
+    health = CaptureHealth(device=device or "")
     if source == "alsa":
         target = _capture_alsa
-        args = (device, out_queue, stop_event, proc_holder)
+        args = (device, out_queue, stop_event, proc_holder, health)
     elif source == "usb":
         target = _capture_pyaudio
         args = (device, out_queue, stop_event)
     else:
         return None
+    if health_out is not None:
+        health_out.append(health)
     t = threading.Thread(target=target, args=args, name="server-mic-capture", daemon=True)
     t.start()
     logger.info("Server mic capture started: %s device %s", source, device)
diff --git a/src/multi_modal_ai_studio/devices/playback.py b/src/multi_modal_ai_studio/devices/playback.py
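
No hunk is shown for playback.py, so the following is only a minimal sketch of what the commit message describes (aplay start retried up to 3 times with a 0.3 s delay, plus a 3 s restart cooldown). The names `start_with_retry` and `RestartCooldown` are hypothetical and not taken from the repository, and catching `OSError` on start failure is an assumption.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def start_with_retry(
    start: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 0.3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Hypothetical helper: call start() up to `attempts` times, pausing after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return start()
        except OSError:
            # Assumed failure mode: the player process could not be started.
            if attempt < attempts:
                sleep(delay_s)
    return None


class RestartCooldown:
    """Hypothetical gate: allow at most one restart attempt per cooldown window."""

    def __init__(self, cooldown_s: float = 3.0, clock: Callable[[], float] = time.monotonic):
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._last: Optional[float] = None

    def allow(self) -> bool:
        now = self._clock()
        if self._last is not None and now - self._last < self._cooldown_s:
            return False  # still cooling down; skip the restart (and its log line)
        self._last = now
        return True


# Demo with a fake starter that fails twice, then succeeds.
calls = []


def flaky_start() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise OSError("device busy")
    return "aplay-proc"


result = start_with_retry(flaky_start, sleep=lambda s: None)

# Demo of the cooldown with a controllable clock.
fake_now = [0.0]
gate = RestartCooldown(3.0, clock=lambda: fake_now[0])
first = gate.allow()
fake_now[0] = 1.0
second = gate.allow()
fake_now[0] = 3.5
third = gate.allow()
```

Injecting `sleep` and `clock` keeps the sketch testable without real delays; the actual change may structure this differently.
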
diff --git a/src/multi_modal_ai_studio/webui/voice_pipeline.py b/src/multi_modal_ai_studio/webui/voice_pipeline.py
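
No hunk is shown for voice_pipeline.py either. Since capture.py now places event dicts in the same queue as PCM bytes, a consumer has to separate the two. The sketch below shows one way this might look; `drain_capture_queue` is a hypothetical name, and the `capture_` prefix on timeline event names is inferred from the commit message rather than from the pipeline code.

```python
import queue
import time
from typing import Any, Dict, List, Tuple

CAPTURE_EVENT_TYPE = "__capture_event__"  # same marker value as in devices/capture.py


def is_capture_event(item: Any) -> bool:
    """Mirror of capture.is_capture_event: True for health-event dicts."""
    return isinstance(item, dict) and item.get("__type__") == CAPTURE_EVENT_TYPE


def drain_capture_queue(q: "queue.Queue[Any]") -> Tuple[List[bytes], List[Dict[str, Any]]]:
    """Hypothetical consumer: read until the None sentinel, separating PCM from events."""
    pcm_chunks: List[bytes] = []
    timeline: List[Dict[str, Any]] = []
    while True:
        item = q.get()
        if item is None:  # capture thread finished
            break
        if is_capture_event(item):
            # Assumed naming: "dropped" becomes "capture_dropped", and so on.
            timeline.append({
                "event": f"capture_{item['event']}",
                "ts": item["ts"],
                "device": item.get("device"),
            })
            continue
        pcm_chunks.append(item)  # plain PCM bytes
    return pcm_chunks, timeline


# Demo: two PCM chunks with a drop/recovery pair in between.
q: "queue.Queue[Any]" = queue.Queue()
q.put(b"\x00\x01")
q.put({"__type__": CAPTURE_EVENT_TYPE, "event": "dropped", "ts": time.time(),
       "device": "hw:1,0", "retry": 1})
q.put({"__type__": CAPTURE_EVENT_TYPE, "event": "recovered", "ts": time.time(),
       "device": "hw:1,0", "outage_s": 0.5})
q.put(b"\x02\x03")
q.put(None)

pcm, events = drain_capture_queue(q)
```

Checking `is_capture_event` before any byte-level processing matters because amplitude or ASR code that assumes `bytes` would otherwise fail on the dict items.
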
