Skip to content

Commit 04e5b84

Browse files
authored
Merge pull request #34 from tokk-nv/fix/audio-retries-timeline
fix(audio): ALSA retries with capture health metrics, global colored logs, arecord exit detection, and timeline event matching
2 parents d08e3ba + 37bb799 commit 04e5b84

7 files changed

Lines changed: 446 additions & 93 deletions

File tree

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
# ASR stream goes stale after mute/unmute or long silence
2+
3+
After muting then unmuting the mic (or after a prolonged period where no speech reaches Riva), the ASR stream silently stops producing results even though PCM audio is still flowing.
4+
5+
## Observed behavior
6+
7+
- Session `c87be1b2` (2026-03-14): 9 turns completed successfully.
8+
- Turn 8 triggered a degenerate LLM reasoning loop (10,101 chars, **91.89 s** wall-clock).
9+
- During that wait, the user muted and later unmuted the mic.
10+
- After turn 9 completed, no further `asr_final` events appeared for ~1.5 min despite the green **user_amplitude** waveform being visible on the timeline (PCM capture was healthy).
11+
- Terminal showed no ASR errors; the stream ended normally at session close with `Stream task timeout, cancelling`.
12+
13+
## Why amplitude shows but ASR does not
14+
15+
In `_feed_pcm_to_pipeline`, amplitude is always computed and sent to the client (lines 1005-1024) regardless of `mic_muted`. The ASR send is gated:
16+
17+
```python
18+
if not mic_muted:
19+
await asr.send_audio(pcm_bytes)
20+
```
21+
22+
So the timeline waveform looks alive, but if the Riva gRPC stream has internally timed out (or VAD state has gone stale after 90+ seconds of silence/mute), newly sent audio produces no results.
23+
24+
## Probable root cause (needs confirmation)
25+
26+
Riva Streaming ASR has internal session limits:
27+
- **gRPC keepalive / idle timeout**: if no audio is sent for an extended period the server may silently close the stream.
28+
- **VAD state**: after a long silence gap, the VAD model may reset or require a fresh trigger to start detecting speech again.
29+
- **Maximum session duration**: Riva may cap single-stream duration; after that, the stream yields no more results even though it stays open.
30+
31+
The exact Riva behavior here is unconfirmed — the stream appeared open (no error logged) but stopped producing finals.
32+
33+
## What is already in place
34+
35+
- `mic_muted` gates `asr.send_audio()` in the classic pipeline (line 1003).
36+
- On mute, 0.5 s of silence is injected (`b"\x00" * int(16000 * 2 * 0.5)`) to flush any pending VAD partial (line 1041-1044).
37+
- On unmute, `mic_muted = False` resumes sending PCM to ASR.
38+
- No stream-health monitoring or automatic restart exists today.
39+
40+
## Proposed solutions (pick one or combine)
41+
42+
### Option A: Keep-alive noise during mute
43+
44+
While `mic_muted` is True, instead of sending nothing, send **very low amplitude white noise** (e.g., ±10 out of ±32768) at normal cadence. This keeps the gRPC stream active and the VAD model warm without triggering false speech detection.
45+
46+
Pros: Simplest change; no stream lifecycle management. \
47+
Cons: Assumes the Riva stream itself is still healthy; does not help if the stream has a hard session-duration cap.
48+
49+
### Option B: Restart ASR stream after stale timeout
50+
51+
Monitor elapsed time since the last `asr_final`. If no final arrives within a configurable window (e.g., 60 s while unmuted), tear down the current `RivaASRBackend` stream and create a fresh one.
52+
53+
1. Track `_last_asr_final_time` in the turn executor; update it on every `asr_final`.
54+
2. In `server_capture_consumer` (or a watchdog task), check `time.time() - _last_asr_final_time > ASR_STALE_TIMEOUT`.
55+
3. If stale and `not mic_muted`: call `asr.stop()`, then `asr.start()` to open a fresh streaming session.
56+
4. Log `[asr] Stream restarted after stale timeout` at WARNING level.
57+
58+
Pros: Covers all root causes (idle timeout, VAD reset, session-duration cap). \
59+
Cons: Slightly more complex; brief gap in ASR coverage during restart (~200 ms).
60+
61+
### Option C: Proactive stream rotation
62+
63+
After every turn (or every N turns), close and re-open the ASR stream. This preempts any session-duration limit and keeps the stream fresh.
64+
65+
Pros: Eliminates stale state entirely. \
66+
Cons: Adds latency at turn boundaries; may lose a partial if speech is ongoing during rotation.
67+
68+
## Recommendation
69+
70+
**Option A + B combined**: send keep-alive noise during mute (A) to prevent idle timeout, and add a stale-timeout watchdog (B) as a safety net for unexpected stream failures. Option C is heavier and only needed if Riva has a hard session cap that A+B cannot address.
71+
72+
## Diagnosis checklist (before implementing)
73+
74+
- [ ] Confirm Riva Streaming ASR session limits: check `riva_asr` service config for `max_duration_seconds`, keepalive settings, or gRPC deadline.
75+
- [ ] Add a log line in `RivaASRBackend` when the gRPC response iterator ends (to distinguish "server closed stream" from "no results but stream open").
76+
- [ ] Reproduce by muting for 60+ s mid-session and verifying ASR stops producing results on unmute.
77+
78+
## Effort
79+
80+
**Small–Medium**: Option A is ~30 min (noise generator in `_feed_pcm_to_pipeline`). Option B is ~1–2 hours (watchdog task + stream restart plumbing + tests).

src/multi_modal_ai_studio/cli/main.py

Lines changed: 24 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,26 @@
1414
from pathlib import Path
1515

1616

17+
class _ColorFormatter(logging.Formatter):
18+
"""Logging formatter that adds ANSI colors to the level name."""
19+
20+
_COLORS = {
21+
logging.DEBUG: "\033[90m", # grey
22+
logging.INFO: "", # default
23+
logging.WARNING: "\033[93m", # yellow
24+
logging.ERROR: "\033[91m", # red
25+
logging.CRITICAL: "\033[91;1m", # bold red
26+
}
27+
_RESET = "\033[0m"
28+
29+
def format(self, record: logging.LogRecord) -> str:
30+
color = self._COLORS.get(record.levelno, "")
31+
msg = super().format(record)
32+
if color:
33+
return f"{color}{msg}{self._RESET}"
34+
return msg
35+
36+
1737
def main():
1838
"""Main entry point for CLI."""
1939
from multi_modal_ai_studio import __version__
@@ -154,11 +174,10 @@ def main():
154174

155175
args = parser.parse_args()
156176

157-
# Configure logging
158-
logging.basicConfig(
159-
level=getattr(logging, args.log_level),
160-
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
161-
)
177+
# Configure logging with colored output
178+
_handler = logging.StreamHandler()
179+
_handler.setFormatter(_ColorFormatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
180+
logging.basicConfig(level=getattr(logging, args.log_level), handlers=[_handler])
162181

163182
logger = logging.getLogger(__name__)
164183
logger.info("Multi-modal AI Studio starting...")

src/multi_modal_ai_studio/core/session.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -307,6 +307,7 @@ def to_dict(self) -> Dict[str, Any]:
307307
"audio_amplitude_history": getattr(self, "audio_amplitude_history", None) or [],
308308
"ttl_bands": getattr(self, "ttl_bands", None) or [],
309309
"app_version": getattr(self, "app_version", None) or __version__,
310+
"capture_health": getattr(self, "capture_health", None),
310311
}
311312

312313
def save(self, path: Path) -> None:

0 commit comments

Comments
 (0)