Commit 11591cd

mauri870 authored and mergify[bot] committed
docs(otel): document delivery guarantees for OTel mode (#11560)
When running in OTel mode, delivery guarantees for Beats receivers are only possible with a specific combination of retry settings, the sending queue, and Beats queue options, so document that.

(cherry picked from commit be1ae00)

# Conflicts:
#	docs/hybrid-agent-beats-receivers.md
1 parent 34a381b commit 11591cd

1 file changed, +49 -0 lines changed

docs/hybrid-agent-beats-receivers.md

Lines changed: 49 additions & 0 deletions
@@ -155,8 +155,12 @@ receivers:
           paths:
             - /var/log/*.log
           type: filestream
+<<<<<<< HEAD
     output:
       otelconsumer: {}
+=======
+    queue.mem.flush.timeout: 0s
+>>>>>>> be1ae0059 (docs(otel): document delivery guarantees for OTel mode (#11560))
   metricbeatreceiver:
     metricbeat:
       modules:
@@ -166,8 +170,12 @@ receivers:
           metricsets:
             - cpu
           module: system
+<<<<<<< HEAD
     output:
       otelconsumer: {}
+=======
+    queue.mem.flush.timeout: 0s
+>>>>>>> be1ae0059 (docs(otel): document delivery guarantees for OTel mode (#11560))
 exporters:
   elasticsearch/_agent-component/default:
     api_key: placeholder
@@ -180,6 +188,24 @@ exporters:
       enabled: true
     mapping:
       mode: bodymap
+
+    retry:
+      enabled: true
+      initial_interval: 1s
+      max_interval: 1m0s
+      max_retries: 3
+
+    sending_queue:
+      enabled: true
+      wait_for_result: true
+      block_on_overflow: true
+      num_consumers: 1
+      queue_size: 3200
+      batch:
+        max_size: 1600
+        min_size: 0
+        flush_timeout: 10s
+        sizer: items
 service:
   pipelines:
     logs:
@@ -189,3 +215,26 @@ service:
         - filebeatreceiver
         - metricbeatreceiver
 ```
+
+### Beats receivers delivery guarantees in OTel mode
+
+When Beats receivers are used in OTel mode, event delivery guarantees depend on the configuration of the OpenTelemetry Collector `sending_queue` and `retry` settings.
+Unlike standalone Beats, the EDOT pipeline allows users to customize queue behavior through the Collector configuration.
+This flexibility is useful, but it also means that not every combination of options is compatible with reliable delivery.
+
+Elastic Agent in OTel mode provides an **at least once** delivery guarantee for Beats receivers **only when using the supported `sending_queue`, retry, and Beat queue settings described below**.
+These settings mirror Beats pipeline behavior closely enough to preserve durability expectations.
+
+If users provide arbitrary `sending_queue` or Beat queue overrides, delivery semantics become **undefined** and **at least once delivery cannot be guaranteed**.
+These combinations are not tested and may result in event loss during backpressure or shutdown.
+
+To achieve the intended delivery guarantee, the exporter that receives events from Beats receivers must define a `sending_queue` with the following characteristics:
+
+- `enabled: true`: The queue must be active.
+- `wait_for_result: true`: The pipeline must wait for the exporter response before removing events.
+- `block_on_overflow: true`: The queue must block instead of dropping events when it is full.
+- `batch`: The configuration must set explicit `max_size`, `min_size`, and `flush_timeout` values so that events are grouped and flushed in predictable, controlled batches.
+
+Additionally, retries must be enabled on the exporter with a backoff policy; ideally, the exporter retries until the operation succeeds. By default, `max_retries` is set to 3, which matches how most Beats behave. Standalone Filebeat, however, retries indefinitely. Beats receivers do not support unlimited retries yet; this is tracked at https://github.com/elastic/beats/issues/47892.
+
+Beats receivers also require the Beat-internal memory queue to run in synchronous mode for delivery guarantees. This is enabled by setting `queue.mem.flush.timeout: 0s` in each receiver configuration, as shown in the example above.
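
For readers who want the three requirements from this section in one place (synchronous Beat queue, exporter retries, and the supported `sending_queue`), the fragment below is a minimal sketch assembled from the values in the diff above. Several things here are assumptions rather than facts from the commit: the nesting is reconstructed because the rendered diff lost its indentation (in particular, `sizer` is assumed to belong to `batch`), the filestream input is abbreviated to the fields visible in the diff, the remaining exporter settings are omitted, and the `output.otelconsumer` block is left out because the conflict markers in the diff do not show whether it is still required.

```yaml
receivers:
  filebeatreceiver:
    filebeat:
      inputs:
        # Abbreviated input; only the fields visible in the diff are shown.
        - type: filestream
          paths:
            - /var/log/*.log
    # Run the Beat-internal memory queue in synchronous mode.
    queue.mem.flush.timeout: 0s

exporters:
  elasticsearch/_agent-component/default:
    # Endpoint, credentials, and mapping settings omitted for brevity.
    retry:
      enabled: true
      initial_interval: 1s
      max_interval: 1m0s
      max_retries: 3            # unlimited retries are not supported yet
    sending_queue:
      enabled: true             # the queue must be active
      wait_for_result: true     # keep events until the exporter responds
      block_on_overflow: true   # block instead of dropping when full
      num_consumers: 1
      queue_size: 3200
      batch:
        max_size: 1600
        min_size: 0
        flush_timeout: 10s
        sizer: items            # assumed to sit under batch
```

The same `queue.mem.flush.timeout: 0s` line would be repeated in every other Beats receiver (for example `metricbeatreceiver`), since the text requires it in each receiver configuration.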
