Summary
Continuous low-level background audio (an office-ambience bed mixed under an AI agent's voice) is delivered intact to WebRTC subscribers but arrives on the PSTN leg with 55–64% of its frames replaced by digital silence, in a metronomic pattern (~100–150 ms of audio passing every ~600–750 ms). Agent speech is never affected; only the quiet continuous bed between utterances is. Callers hear the background "pumping" in and out whenever the agent stops talking.
Setup
- `TrackPublishOptions` with `dtx: false` defaults; track carries TTS speech + a constant ambience bed (~-38 dBFS effective, measured never below -45 dBFS in any 50 ms window)
- `krisp_enabled: false`, no media encryption
- `dispatch_rule_individual`
Evidence (same-call dual tap)
For a single PSTN call we recorded two taps simultaneously (Tap A and Tap B).
Analysis uses 50 ms RMS windows, with "bed" = -60..-30 dBFS and "dropout" = below -60 dBFS.
Tap A contains unbroken 5+ second stretches of continuous bed. Tap B never has more than ~150 ms of bed at a time outside agent speech; the rest is digital silence (-90 dBFS), alternating with the bed in a regular ~750 ms cycle. Levels confirm Tap A is the real bed (median -41.8 dBFS, tracking the asset's dynamics), not decoder comfort noise.
Three separate probe calls reproduce the same pattern at the same magnitude.
Codec-independent
Forcing the SIP leg to PCMU via the dispatch rule's `media` config (`only_listed_codecs: true`, `codecs: [PCMU/8000]`) changed nothing (61% vs 64% dropout), so this is not Opus DTX on the SIP leg.
Where it seems to come from
The behavior matches the mixer input state machine in media-sdk (`mixer/mixer.go`):

```go
func (i *Input) readSample(bufMin int, out msdk.PCM16Sample) (int, error) {
	if i.buffering {
		if i.buf.Len() < bufMin {
			return 0, nil // keep buffering -> mixer emits silence for this input
		}
		i.buffering = false
	}
	n, err := i.buf.Read(out)
	if n == 0 {
		i.buffering = true // starving; pause the input and start buffering again
	}
	...
}
```

A momentary starvation mutes the input until `bufMin` re-accumulates, then it plays briefly and starves again. In other words, a short producer hiccup is amplified into a repeating mute/burst cycle on the phone leg. During TTS speech the upstream buffers are full (TTS delivers faster than real time), which would explain why speech never chops while the just-in-time-paced bed does.
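To make the amplification step concrete, here is a minimal sketch of the buffering logic quoted above. It is not the media-sdk code: the `input` type, `simulate`, `countSilent`, and all parameter values are hypothetical, and it assumes a producer that writes exactly one frame per mixer tick with no spare buffer margin and skips a single tick. It only illustrates how one missing frame turns into a mute lasting `bufMin` ticks; it does not attempt to reproduce the measured dropout percentages.

```go
package main

import "fmt"

// input keeps only the two fields the quoted state machine depends on.
type input struct {
	buffered  int  // frames queued for this input
	buffering bool // true while waiting for bufMin frames
}

// read mimics one mixer tick; it reports whether a frame was played.
func (in *input) read(bufMin int) bool {
	if in.buffering {
		if in.buffered < bufMin {
			return false // still buffering: the mixer emits silence
		}
		in.buffering = false
	}
	if in.buffered == 0 {
		in.buffering = true // starved: pause until bufMin frames are queued
		return false
	}
	in.buffered--
	return true
}

// simulate runs `ticks` mixer ticks. The hypothetical producer writes one
// frame per tick, just in time, except at tick hiccupAt, where it writes none.
// The result marks played ticks with '#' and silent ticks with '.'.
func simulate(ticks, bufMin, hiccupAt int) string {
	in := &input{}
	timeline := make([]byte, 0, ticks)
	for t := 0; t < ticks; t++ {
		if t != hiccupAt {
			in.buffered++
		}
		if in.read(bufMin) {
			timeline = append(timeline, '#')
		} else {
			timeline = append(timeline, '.')
		}
	}
	return string(timeline)
}

// countSilent returns the number of silent ticks in a timeline.
func countSilent(timeline string) int {
	n := 0
	for i := 0; i < len(timeline); i++ {
		if timeline[i] == '.' {
			n++
		}
	}
	return n
}

func main() {
	for _, bufMin := range []int{1, 5} {
		tl := simulate(30, bufMin, 10)
		fmt.Printf("bufMin=%d %s silent=%d\n", bufMin, tl, countSilent(tl))
	}
}
```

In this toy model a single skipped frame costs one silent tick when `bufMin` is 1 and five silent ticks when `bufMin` is 5. If a tick were 20 ms (an assumption, not a value taken from media-sdk), that would be a 20 ms gap stretched into a 100 ms mute.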
Questions