[exporter/prometheusremotewriteexporter] Retry transient WAL export errors and add configurable segment cache size by charanck9 · Pull Request #49383 · open-telemetry/opentelemetry-collector-contrib

charanck9 · 2026-06-30T13:33:13Z


Transient errors caused data loss. When a WAL export failed, the error was handled the same way regardless of cause. Transient backend failures (5xx, network errors)
were treated like permanent ones and never retried against the buffered WAL data, so data that should have been redelivered once the backend recovered was effectively
dropped.
Tight retry loop on persistent WAL errors. When continuallyPopWALThenExport returned an error, run() immediately restarted the WAL with no delay. If the error was
persistent, this spun in a hot loop, burning CPU and flooding logs.
No way to cap WAL memory usage. The WAL always kept buffer_size segments cached in memory, with no option to lower that count. On large backlogs this drove high
memory consumption.

Classify and retry transient errors. exportThenFrontTruncateWAL now retries a failed export indefinitely (waiting 5s between attempts, cancellable via context/stop)
until the backend recovers, instead of dropping the data. Permanent errors (e.g. 4xx, detected via consumererror.IsPermanent) are skipped and their data is truncated,
since retrying can't help. While retrying, no new data is read from the WAL, which also bounds memory growth.
Backoff before WAL restart. Added a 5s backoff (cancellable via context/stopChan) before run() restarts the WAL after a processing error, preventing the tight
retry loop.
Configurable segment_cache_size. New option (default = buffer_size) controlling how many WAL segments are cached in memory. Lower values cut memory usage at the
cost of extra disk reads during replay; set to 2 for a minimal footprint. Documented in the README.
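
For reviewers, the retry and backoff behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: permanentError, isPermanent, waitOrStop, exportWithRetry and runDemo are hypothetical names, and the local permanentError type stands in for the check the exporter performs with consumererror.IsPermanent. The wait is shortened to 10ms so the example finishes quickly; the PR uses 5s.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// permanentError is a local stand-in (assumption) for the collector's
// permanent-error wrapper; the PR itself uses consumererror.IsPermanent.
type permanentError struct{ err error }

func (p permanentError) Error() string { return p.err.Error() }

// isPermanent reports whether err wraps a permanentError.
func isPermanent(err error) bool {
	var p permanentError
	return errors.As(err, &p)
}

// waitOrStop blocks for d unless ctx is cancelled or stop is closed first.
// It returns true only if the full wait elapsed.
func waitOrStop(ctx context.Context, stop <-chan struct{}, d time.Duration) bool {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-t.C:
		return true
	case <-ctx.Done():
		return false
	case <-stop:
		return false
	}
}

// exportWithRetry (hypothetical) keeps calling export until it succeeds.
// Permanent errors are skipped (false, nil) so the caller can truncate.
// Transient errors are retried after a cancellable wait; if the wait is
// interrupted, the last transient error is returned.
func exportWithRetry(ctx context.Context, stop <-chan struct{}, wait time.Duration,
	export func(context.Context) error) (bool, error) {
	for {
		err := export(ctx)
		if err == nil {
			return true, nil
		}
		if isPermanent(err) {
			return false, nil // retrying cannot help
		}
		if !waitOrStop(ctx, stop, wait) {
			return false, err
		}
	}
}

// runDemo exercises both paths: a backend that recovers on the third
// attempt, and a backend that rejects the request permanently.
func runDemo() (deliveredAfterRetry bool, attempts int, deliveredPermanent bool) {
	stop := make(chan struct{})
	flaky := func(context.Context) error {
		attempts++
		if attempts < 3 {
			return errors.New("503 service unavailable") // transient
		}
		return nil
	}
	deliveredAfterRetry, _ = exportWithRetry(context.Background(), stop, 10*time.Millisecond, flaky)

	rejected := func(context.Context) error {
		return permanentError{errors.New("400 bad request")}
	}
	deliveredPermanent, _ = exportWithRetry(context.Background(), stop, 10*time.Millisecond, rejected)
	return
}

func main() {
	retried, attempts, permanent := runDemo()
	fmt.Println(retried, attempts, permanent) // prints: true 3 false
}
```

The same cancellable wait is what a restart backoff in run() would need: sleeping on a timer inside a select keeps shutdown responsive, whereas a plain time.Sleep would delay stop by up to the full backoff.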


Fix WAL buffered data stall: retry transient errors, add segment_cach…

f96de25

…e_size and restart backoff

charanck9 requested review from a team, ArthurSens and dashpole as code owners June 30, 2026 13:33

github-actions Bot assigned VihasMakwana Jun 30, 2026

github-actions Bot requested review from Aneurysm9, rapphil and ywwg June 30, 2026 13:33
