* docs(otel): document delivery guarantees for OTel mode (#11560)
When running in OTel mode, delivery guarantees for Beats receivers are
only possible with a specific combination of retry settings, the sending
queue, and Beats queue options, so document that.
(cherry picked from commit be1ae00)
# Conflicts:
# docs/hybrid-agent-beats-receivers.md
* fix conflicts
---------
Co-authored-by: Mauri de Souza Meneguzzo <[email protected]>
docs/hybrid-agent-beats-receivers.md (43 additions, 0 deletions)
@@ -157,6 +157,7 @@ receivers:
           type: filestream
     output:
       otelconsumer: {}
+    queue.mem.flush.timeout: 0s
   metricbeatreceiver:
     metricbeat:
       modules:
@@ -168,6 +169,7 @@ receivers:
           module: system
     output:
       otelconsumer: {}
+    queue.mem.flush.timeout: 0s
 exporters:
   elasticsearch/_agent-component/default:
     api_key: placeholder
@@ -180,6 +182,24 @@ exporters:
       enabled: true
     mapping:
       mode: bodymap
+
+    retry:
+      enabled: true
+      initial_interval: 1s
+      max_interval: 1m0s
+      max_retries: 3
+
+    sending_queue:
+      enabled: true
+      wait_for_result: true
+      block_on_overflow: true
+      num_consumers: 1
+      queue_size: 3200
+      batch:
+        max_size: 1600
+        min_size: 0
+        flush_timeout: 10s
+        sizer: items
 service:
   pipelines:
     logs:
@@ -189,3 +209,26 @@ service:
         - filebeatreceiver
         - metricbeatreceiver
 ```
+
+### Beats receivers delivery guarantees in OTel mode
+
+When Beats receivers are used in OTel mode, event delivery guarantees depend on the configuration of the OpenTelemetry Collector `sending_queue` and retry settings.
+Unlike standalone Beats, the EDOT pipeline allows users to customize queue behavior through the Collector configuration.
+This flexibility is useful, but it also means that not every combination of options is compatible with reliable delivery.
+
+Elastic Agent in OTel mode provides an **at-least-once** delivery guarantee for Beats receivers **only when using the supported `sending_queue` settings described below**.
+These settings mirror Beats pipeline behavior closely enough to preserve durability expectations.
+
+If users provide arbitrary `sending_queue` or Beat queue overrides, delivery semantics become **undefined** and **at-least-once delivery cannot be guaranteed**.
+These combinations are not tested and may result in event loss during backpressure or shutdown.
+
+To achieve the intended delivery guarantee, the exporter that receives events from Beats receivers must define a `sending_queue` with the following characteristics:
+
+- `enabled: true`: The queue must be active.
+- `wait_for_result: true`: The pipeline must wait for the exporter response before removing events.
+- `block_on_overflow: true`: Prevents event drops when the queue is full.
+- The `batch` configuration must include explicit `max_size`, `min_size`, and `flush_timeout` values to ensure events are grouped and flushed in predictable, controlled batches.
+
+Additionally, retries must be enabled on the exporter, with a backoff policy that keeps retrying until the operation succeeds. `max_retries` defaults to 3, which matches the behavior of most Beats; standalone Filebeat, however, retries indefinitely. Beats receivers do not yet support unlimited retries; this is tracked at https://github.com/elastic/beats/issues/47892.
+
+Beats receivers also require the Beat-internal memory queue to run in synchronous mode for delivery guarantees. This is enabled by setting `queue.mem.flush.timeout: 0s` in each receiver configuration, as shown in the example above.
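
For quick reference, the delivery-relevant settings that this change spreads across several hunks can be collected into one fragment. This is a minimal sketch rather than the full file: the Filebeat input definition and the remaining exporter settings are omitted, the indentation is assumed from the YAML hierarchy, and every value is copied from the example in the diff rather than tuned for any particular deployment.

```yaml
receivers:
  filebeatreceiver:
    # Filebeat input configuration omitted in this sketch.
    output:
      otelconsumer: {}
    # Runs the Beat-internal memory queue in synchronous mode.
    queue.mem.flush.timeout: 0s

exporters:
  elasticsearch/_agent-component/default:
    # Endpoint, credentials, and mapping settings omitted in this sketch.
    retry:
      enabled: true
      initial_interval: 1s
      max_interval: 1m0s
      max_retries: 3            # unlimited retries are not supported yet
    sending_queue:
      enabled: true             # the queue must be active
      wait_for_result: true     # wait for the exporter response before removing events
      block_on_overflow: true   # block instead of dropping events when the queue is full
      num_consumers: 1
      queue_size: 3200
      batch:
        max_size: 1600
        min_size: 0
        flush_timeout: 10s
        sizer: items
```

The same `queue.mem.flush.timeout: 0s` line applies to every Beats receiver in the pipeline, including `metricbeatreceiver` in the example from the diff.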