You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix(audio): global colored logs, arecord exit detection, capture health metrics
- Add _ColorFormatter in cli/main.py for global ANSI-colored log output
(DEBUG=grey, WARNING=yellow, ERROR=red, CRITICAL=bold red)
- Fix arecord unexpected exit going undetected: detect proc.poll() != None
after inner read loop and trigger retry instead of silently ending capture
- Add CaptureHealth dataclass with drop/recovery counts and outage durations
- Emit capture_dropped/recovered/gave_up timeline events from voice pipeline
- Persist capture_health in session JSON for quantitative USB analysis
- Retry aplay up to 3 times on start failure with 0.3s delay
- Speaker restart cooldown (3s) to avoid log spam during device outage
- Add todo doc for ASR stream stale after mute/unmute investigation
Made-with: Cursor
# ASR stream goes stale after mute/unmute or long silence
2
+
3
+
After muting then unmuting the mic (or after a prolonged period where no speech reaches Riva), the ASR stream silently stops producing results even though PCM audio is still flowing.
- During that wait, the user muted and later unmuted the mic.
10
+
- After turn 9 completed, no further `asr_final` events appeared for ~1.5 min despite the green **user_amplitude** waveform being visible on the timeline (PCM capture was healthy).
11
+
- Terminal showed no ASR errors; the stream ended normally at session close with `Stream task timeout, cancelling`.
12
+
13
+
## Why amplitude shows but ASR does not
14
+
15
+
In `_feed_pcm_to_pipeline`, amplitude is always computed and sent to the client (lines 1005-1024) regardless of `mic_muted`. The ASR send is gated:
16
+
17
+
```python
18
+
ifnot mic_muted:
19
+
await asr.send_audio(pcm_bytes)
20
+
```
21
+
22
+
So the timeline waveform looks alive, but if the Riva gRPC stream has internally timed out (or VAD state has gone stale after 90+ seconds of silence/mute), newly sent audio produces no results.
23
+
24
+
## Probable root cause (needs confirmation)
25
+
26
+
Riva Streaming ASR has internal session limits:
27
+
-**gRPC keepalive / idle timeout**: if no audio is sent for an extended period the server may silently close the stream.
28
+
-**VAD state**: after a long silence gap, the VAD model may reset or require a fresh trigger to start detecting speech again.
29
+
-**Maximum session duration**: Riva may cap single-stream duration; after that, the stream yields no more results even though it stays open.
30
+
31
+
The exact Riva behavior here is unconfirmed — the stream appeared open (no error logged) but stopped producing finals.
32
+
33
+
## What is already in place
34
+
35
+
-`mic_muted` gates `asr.send_audio()` in the classic pipeline (line 1003).
36
+
- On mute, 0.5 s of silence is injected (`b"\x00" * int(16000 * 2 * 0.5)`) to flush any pending VAD partial (line 1041-1044).
37
+
- On unmute, `mic_muted = False` resumes sending PCM to ASR.
38
+
- No stream-health monitoring or automatic restart exists today.
39
+
40
+
## Proposed solutions (pick one or combine)
41
+
42
+
### Option A: Keep-alive noise during mute
43
+
44
+
While `mic_muted` is True, instead of sending nothing, send **very low amplitude white noise** (e.g., ±10 out of ±32768) at normal cadence. This keeps the gRPC stream active and the VAD model warm without triggering false speech detection.
45
+
46
+
Pros: Simplest change; no stream lifecycle management. \
47
+
Cons: Assumes the Riva stream itself is still healthy; does not help if the stream has a hard session-duration cap.
48
+
49
+
### Option B: Restart ASR stream after stale timeout
50
+
51
+
Monitor elapsed time since the last `asr_final`. If no final arrives within a configurable window (e.g., 60 s while unmuted), tear down the current `RivaASRBackend` stream and create a fresh one.
52
+
53
+
1. Track `_last_asr_final_time` in the turn executor; update it on every `asr_final`.
54
+
2. In `server_capture_consumer` (or a watchdog task), check `time.time() - _last_asr_final_time > ASR_STALE_TIMEOUT`.
55
+
3. If stale and `not mic_muted`: call `asr.stop()`, then `asr.start()` to open a fresh streaming session.
56
+
4. Log `[asr] Stream restarted after stale timeout` at WARNING level.
57
+
58
+
Pros: Covers all root causes (idle timeout, VAD reset, session-duration cap). \
59
+
Cons: Slightly more complex; brief gap in ASR coverage during restart (~200 ms).
60
+
61
+
### Option C: Proactive stream rotation
62
+
63
+
After every turn (or every N turns), close and re-open the ASR stream. This preempts any session-duration limit and keeps the stream fresh.
64
+
65
+
Pros: Eliminates stale state entirely. \
66
+
Cons: Adds latency at turn boundaries; may lose a partial if speech is ongoing during rotation.
67
+
68
+
## Recommendation
69
+
70
+
**Option A + B combined**: send keep-alive noise during mute (A) to prevent idle timeout, and add a stale-timeout watchdog (B) as a safety net for unexpected stream failures. Option C is heavier and only needed if Riva has a hard session cap that A+B cannot address.
71
+
72
+
## Diagnosis checklist (before implementing)
73
+
74
+
-[ ] Confirm Riva Streaming ASR session limits: check `riva_asr` service config for `max_duration_seconds`, keepalive settings, or gRPC deadline.
75
+
-[ ] Add a log line in `RivaASRBackend` when the gRPC response iterator ends (to distinguish "server closed stream" from "no results but stream open").
76
+
-[ ] Reproduce by muting for 60+ s mid-session and verifying ASR stops producing results on unmute.
77
+
78
+
## Effort
79
+
80
+
**Small–Medium**: Option A is ~30 min (noise generator in `_feed_pcm_to_pipeline`). Option B is ~1–2 hours (watchdog task + stream restart plumbing + tests).
0 commit comments