|
| 1 | +# ASR stream goes stale after mute/unmute or long silence |
| 2 | + |
| 3 | +After muting then unmuting the mic (or after a prolonged period where no speech reaches Riva), the ASR stream silently stops producing results even though PCM audio is still flowing. |
| 4 | + |
| 5 | +## Observed behavior |
| 6 | + |
| 7 | +- Session `c87be1b2` (2026-03-14): 9 turns completed successfully. |
| 8 | +- Turn 8 triggered a degenerate LLM reasoning loop (10,101 chars, **91.89 s** wall-clock). |
| 9 | +- During that wait, the user muted and later unmuted the mic. |
| 10 | +- After turn 9 completed, no further `asr_final` events appeared for ~1.5 min despite the green **user_amplitude** waveform being visible on the timeline (PCM capture was healthy). |
| 11 | +- Terminal showed no ASR errors; the stream ended normally at session close with `Stream task timeout, cancelling`. |
| 12 | + |
| 13 | +## Why amplitude shows but ASR does not |
| 14 | + |
| 15 | +In `_feed_pcm_to_pipeline`, amplitude is always computed and sent to the client (lines 1005-1024) regardless of `mic_muted`. The ASR send is gated: |
| 16 | + |
| 17 | +```python |
| 18 | +if not mic_muted: |
| 19 | + await asr.send_audio(pcm_bytes) |
| 20 | +``` |
| 21 | + |
| 22 | +So the timeline waveform looks alive, but if the Riva gRPC stream has internally timed out (or VAD state has gone stale after 90+ seconds of silence/mute), newly sent audio produces no results. |
| 23 | + |
| 24 | +## Probable root cause (needs confirmation) |
| 25 | + |
| 26 | +Riva Streaming ASR has internal session limits: |
| 27 | +- **gRPC keepalive / idle timeout**: if no audio is sent for an extended period the server may silently close the stream. |
| 28 | +- **VAD state**: after a long silence gap, the VAD model may reset or require a fresh trigger to start detecting speech again. |
| 29 | +- **Maximum session duration**: Riva may cap single-stream duration; after that, the stream yields no more results even though it stays open. |
| 30 | + |
| 31 | +The exact Riva behavior here is unconfirmed — the stream appeared open (no error logged) but stopped producing finals. |
| 32 | + |
| 33 | +## What is already in place |
| 34 | + |
| 35 | +- `mic_muted` gates `asr.send_audio()` in the classic pipeline (line 1003). |
| 36 | +- On mute, 0.5 s of silence is injected (`b"\x00" * int(16000 * 2 * 0.5)`) to flush any pending VAD partial (line 1041-1044). |
| 37 | +- On unmute, `mic_muted = False` resumes sending PCM to ASR. |
| 38 | +- No stream-health monitoring or automatic restart exists today. |
| 39 | + |
| 40 | +## Proposed solutions (pick one or combine) |
| 41 | + |
| 42 | +### Option A: Keep-alive noise during mute |
| 43 | + |
| 44 | +While `mic_muted` is True, instead of sending nothing, send **very low amplitude white noise** (e.g., ±10 out of ±32768) at normal cadence. This keeps the gRPC stream active and the VAD model warm without triggering false speech detection. |
| 45 | + |
| 46 | +Pros: Simplest change; no stream lifecycle management. \ |
| 47 | +Cons: Assumes the Riva stream itself is still healthy; does not help if the stream has a hard session-duration cap. |
| 48 | + |
| 49 | +### Option B: Restart ASR stream after stale timeout |
| 50 | + |
| 51 | +Monitor elapsed time since the last `asr_final`. If no final arrives within a configurable window (e.g., 60 s while unmuted), tear down the current `RivaASRBackend` stream and create a fresh one. |
| 52 | + |
| 53 | +1. Track `_last_asr_final_time` in the turn executor; update it on every `asr_final`. |
| 54 | +2. In `server_capture_consumer` (or a watchdog task), check `time.time() - _last_asr_final_time > ASR_STALE_TIMEOUT`. |
| 55 | +3. If stale and `not mic_muted`: call `asr.stop()`, then `asr.start()` to open a fresh streaming session. |
| 56 | +4. Log `[asr] Stream restarted after stale timeout` at WARNING level. |
| 57 | + |
| 58 | +Pros: Covers all root causes (idle timeout, VAD reset, session-duration cap). \ |
| 59 | +Cons: Slightly more complex; brief gap in ASR coverage during restart (~200 ms). |
| 60 | + |
| 61 | +### Option C: Proactive stream rotation |
| 62 | + |
| 63 | +After every turn (or every N turns), close and re-open the ASR stream. This preempts any session-duration limit and keeps the stream fresh. |
| 64 | + |
| 65 | +Pros: Eliminates stale state entirely. \ |
| 66 | +Cons: Adds latency at turn boundaries; may lose a partial if speech is ongoing during rotation. |
| 67 | + |
| 68 | +## Recommendation |
| 69 | + |
| 70 | +**Option A + B combined**: send keep-alive noise during mute (A) to prevent idle timeout, and add a stale-timeout watchdog (B) as a safety net for unexpected stream failures. Option C is heavier and only needed if Riva has a hard session cap that A+B cannot address. |
| 71 | + |
| 72 | +## Diagnosis checklist (before implementing) |
| 73 | + |
| 74 | +- [ ] Confirm Riva Streaming ASR session limits: check `riva_asr` service config for `max_duration_seconds`, keepalive settings, or gRPC deadline. |
| 75 | +- [ ] Add a log line in `RivaASRBackend` when the gRPC response iterator ends (to distinguish "server closed stream" from "no results but stream open"). |
| 76 | +- [ ] Reproduce by muting for 60+ s mid-session and verifying ASR stops producing results on unmute. |
| 77 | + |
| 78 | +## Effort |
| 79 | + |
| 80 | +**Small–Medium**: Option A is ~30 min (noise generator in `_feed_pcm_to_pipeline`). Option B is ~1–2 hours (watchdog task + stream restart plumbing + tests). |
0 commit comments