Summary
On budget H.265 + G.711 (PCMA/8000) cameras relayed by go2rtc over RTSP, a long-running RTSP consumer of the relayed stream progressively loses its audio: AAC/PCMA packets per 10 s of output slide from ~78 down to ~1 over several minutes and then stay at ~1 (effectively silent) for hours, while video is unaffected. A freshly-connected consumer on the same stream is fine; a consumer reading the camera directly (bypassing go2rtc) is also unaffected.
I haven't proven the internal cause yet (instrumented measurement in progress — see below), so I've kept this report to what's observable and reproducible and put the mechanism as a hypothesis. Happy to gather whatever would help.
Environment
- go2rtc 1.9.10 (as bundled in Frigate 0.17.1), linux/amd64, Docker on a WSL2 host.
- ~12 identical budget cams: 4K H.265 video + G.711 A-law (PCMA, 8 kHz mono) audio, rtsp://user:pass@ip:554/ch0_0.h265.
- go2rtc stream is a pure relay (no transcode): cam_raw: rtsp://user:pass@ip:554/ch0_0.h265; consumers read rtsp://127.0.0.1:8554/cam_raw?video&audio.
- Consumers are long-running ffmpeg -c:v copy -c:a aac -f segment recorders (Frigate's recorder + our own test recorders).
What's observed (reproducible)
- It's the long-running consumer, not the camera. While a recording consumer is stuck at ~1 audio pkt/segment, the go2rtc API shows the producer's PCMA receiver byte/packet counters still advancing normally, and a fresh short ffmpeg/ffprobe probe of the same relayed URL decodes real, audible audio (~48 kHz·s of samples in an 8 s probe, mean ≈ −37 dB). So the relay is still receiving and can still serve audio; only the established consumer's audio has collapsed.
- Independent consumers collapse, and recover, in lockstep. Two separate ffmpeg recorders of the same relayed stream (different ffmpeg versions, different process ages) slid 78→1 within the same ~1–2 minute window, and on another occasion both returned to ~78 in the same minute without reconnecting. A per-consumer clock-drift explanation doesn't fit a simultaneous spontaneous recovery; it points at something shared on the relay/producer side.
- A camera-direct consumer is immune. The identical ffmpeg -c:v copy -c:a aac pipeline reading the camera's RTSP directly (not via go2rtc) ran 4 h through several of these windows with steady ~78 audio pkts/segment. A direct ffmpeg client re-bases RTP timing from the camera's RTCP Sender Reports; the go2rtc-relayed consumers do not appear to get that benefit.
- Frequency: across one 48 h span we logged 62 decay episodes over ~12 cameras (segment audio-packet count ≤5), median duration ~3.3 h (max ~15 h). So it's frequent and long-lived, not a rare blip.
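For anyone who wants to reproduce the per-segment numbers above, here is a rough sketch of how the counting could be done. It is not the exact script behind the figures in this report: the function names are made up for illustration, and the ~78 baseline assumes the AAC encoder emits 1024-sample frames at 8 kHz (8000 × 10 / 1024 ≈ 78 packets per 10 s segment). The ≤5 threshold is the one used for the episode count above.

```python
import subprocess
from typing import List, Tuple

# Assumption: AAC-LC frames carry 1024 samples, so a 10 s segment at 8 kHz
# should hold about 8000 * 10 / 1024 = 78.125 audio packets.
EXPECTED_PKTS_PER_SEGMENT = 8000 * 10 / 1024


def count_audio_packets(segment_path: str) -> int:
    """Count packets in the first audio stream of one segment file via ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-count_packets", "-show_entries", "stream=nb_read_packets",
         "-of", "csv=p=0", segment_path],
        capture_output=True, text=True, check=True,
    )
    lines = result.stdout.strip().splitlines()
    if not lines:
        return 0  # no audio stream present in this segment
    return int(lines[0].split(",")[0])


def decay_episodes(counts: List[int], threshold: int = 5) -> List[Tuple[int, int]]:
    """Return (start_index, length) for every run of segments at or below threshold."""
    episodes = []
    start = None
    for i, n in enumerate(counts):
        if n <= threshold:
            if start is None:
                start = i  # a new low-audio run begins here
        elif start is not None:
            episodes.append((start, i - start))
            start = None
    if start is not None:  # run still open at the end of the data
        episodes.append((start, len(counts) - start))
    return episodes
```

Feeding decay_episodes the counts from consecutive segments of one camera gives the episode starts and lengths (in segments); multiplying the length by the 10 s segment duration gives the episode duration.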
Hypothesis (not yet proven — would value your read)
These cheap cameras occasionally emit an audio RTP timestamp discontinuity. A direct ffmpeg client absorbs it (RTCP SR re-basing); the relay appears to pass the broken audio/video timestamp relationship through to its already-attached consumers, whose muxers then progressively starve the audio once the A/V gap exceeds their interleave tolerance — while newly-attached consumers negotiate fresh timing and are fine.
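To make "timestamp discontinuity" concrete, this is roughly how a captured sequence of audio RTP timestamps could be scanned for jumps. It is only a sketch under my own assumptions: the function names are hypothetical, the 1 s threshold (8000 ticks at the 8 kHz PCMA clock) is arbitrary, and none of this is go2rtc code. The one standard part is that RTP timestamps are 32-bit and wrap, so the difference has to be taken modulo 2^32.

```python
from typing import List, Tuple


def rtp_delta(prev_ts: int, ts: int) -> int:
    """Signed difference ts - prev_ts modulo 2**32, in [-2**31, 2**31)."""
    d = (ts - prev_ts) & 0xFFFFFFFF
    return d - (1 << 32) if d >= (1 << 31) else d


def find_discontinuities(timestamps: List[int],
                         max_jump: int = 8000) -> List[Tuple[int, int]]:
    """Return (index, delta) wherever the timestamp goes backwards
    or advances by more than max_jump ticks between consecutive packets."""
    found = []
    for i in range(1, len(timestamps)):
        d = rtp_delta(timestamps[i - 1], timestamps[i])
        if d < 0 or d > max_jump:
            found.append((i, d))
    return found
```

A normal 32-bit wraparound is not flagged, because the modular difference stays small; only a genuine jump in the camera's audio timeline would show up.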
From a read of the v1.9.10 source, the RTSP consumer path (pkg/rtsp/consumer.go packetWriter) copies packet.Timestamp straight through, and the RTSP server side doesn't appear to emit RTCP Sender Reports to consumers — so a consumer has nothing to re-base against when the producer's audio timeline jumps. I want to confirm this with direct instrumentation before claiming it.
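For reference, this is the mapping an RTCP Sender Report makes possible on the receiving side, as I understand RFC 3550: each SR pairs a 64-bit NTP wallclock time with the RTP timestamp of the same instant, so a receiver can place audio and video packets on one common timeline and re-anchor a stream when a later SR arrives. The sketch below is illustrative only; the function names are hypothetical and it is not how ffmpeg or go2rtc implement this.

```python
def ntp_to_seconds(ntp64: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to seconds."""
    return (ntp64 >> 32) + (ntp64 & 0xFFFFFFFF) / 2**32


def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int,
                     sr_ntp_seconds: float, clock_rate: int) -> float:
    """Map an RTP timestamp to wallclock seconds using the latest Sender Report.

    sr_rtp_ts and sr_ntp_seconds are the RTP/NTP pair from that SR;
    clock_rate is the stream's RTP clock (8000 for PCMA, 90000 for H.265 video).
    """
    d = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if d >= (1 << 31):  # packet is slightly older than the SR, or the counter wrapped
        d -= (1 << 32)
    return sr_ntp_seconds + d / clock_rate
```

With this mapping, a jump in the audio RTP timeline is absorbed as soon as the next SR reports the new RTP/NTP pair. Without SRs, a consumer can only compare raw audio and video timestamps, which fits the behaviour described above.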
What I'm doing / what would help
I'm building an instrumented v1.9.10 to log the actual RTP timestamps go2rtc receives from the camera vs. what it relays to each consumer, captured across a real decay window — I'll follow up here with that data. If there's a preferred place to add that logging, or if this is a known/duplicate area, I'd appreciate a pointer. If the fix is "the relay should re-base/regenerate timestamps (or emit RTCP SRs) per consumer," I'm happy to put up a PR once I've confirmed the mechanism.