docs(otel): document delivery guarantees for OTel mode (#11560)
When running in OTel mode, delivery guarantees for Beats receivers are
only possible with a specific combination of retry settings, the sending
queue, and Beats queue options, so document that.
(cherry picked from commit be1ae00)
# Conflicts:
# docs/hybrid-agent-beats-receivers.md
docs/hybrid-agent-beats-receivers.md (49 additions & 0 deletions)
@@ -155,8 +155,12 @@ receivers:
 paths:
 - /var/log/*.log
 type: filestream
+<<<<<<< HEAD
 output:
 otelconsumer: {}
+=======
+queue.mem.flush.timeout: 0s
+>>>>>>> be1ae0059 (docs(otel): document delivery guarantees for OTel mode (#11560))
 metricbeatreceiver:
 metricbeat:
 modules:
@@ -166,8 +170,12 @@ receivers:
 metricsets:
 - cpu
 module: system
+<<<<<<< HEAD
 output:
 otelconsumer: {}
+=======
+queue.mem.flush.timeout: 0s
+>>>>>>> be1ae0059 (docs(otel): document delivery guarantees for OTel mode (#11560))
 exporters:
 elasticsearch/_agent-component/default:
 api_key: placeholder
@@ -180,6 +188,24 @@ exporters:
 enabled: true
 mapping:
 mode: bodymap
+
+retry:
+enabled: true
+initial_interval: 1s
+max_interval: 1m0s
+max_retries: 3
+
+sending_queue:
+enabled: true
+wait_for_result: true
+block_on_overflow: true
+num_consumers: 1
+queue_size: 3200
+batch:
+max_size: 1600
+min_size: 0
+flush_timeout: 10s
+sizer: items
 service:
 pipelines:
 logs:
@@ -189,3 +215,26 @@ service:
 - filebeatreceiver
 - metricbeatreceiver
 ```
+
+### Beats receivers delivery guarantees in OTel mode
+
+When Beat receivers are used in OTel mode, event delivery guarantees depend on the configuration of the OpenTelemetry Collector `sending_queue` and retry settings.
+Unlike standalone Beats, the EDOT pipeline allows users to customize queue behavior through the Collector configuration.
+This flexibility is useful, but it also means that not every option combination is compatible with reliable delivery.
+
+Elastic Agent in OTel mode provides an **at least once** delivery guarantee for Beat receivers **only when using the supported `sending_queue` settings described below**.
+These settings mirror Beats pipeline behavior closely enough to preserve durability expectations.
+
+If users provide arbitrary `sending_queue` or Beat queue overrides, delivery semantics become **undefined** and **at least once delivery cannot be guaranteed**.
+These combinations are not tested and may result in event loss during backpressure or shutdown.
+
+To achieve the intended delivery guarantee, the exporter that receives events from Beat receivers must define a `sending_queue` with the following characteristics:
+
+- `enabled: true`: The queue must be active.
+- `wait_for_result: true`: The pipeline must wait for the exporter response before removing events.
+- `block_on_overflow: true`: Prevents event drops when the queue is full.
+- The `batch` configuration must include explicit `max_size`, `min_size`, and `flush_timeout` values to ensure events are grouped and flushed in predictable, controlled batches.
+
+Additionally, the retry settings must be enabled on the exporter, using a backoff policy that retries until the operation succeeds. By default, `max_retries` is set to 3, which is how most Beats behave. Standalone Filebeat, however, retries indefinitely. Beats receivers don't support unlimited retries yet, and this is being tracked at https://github.com/elastic/beats/issues/47892.
+
+Beat receivers also require the Beat-internal memory queue to run in synchronous mode for delivery guarantees. This is enabled by setting `queue.mem.flush.timeout: 0s` in each receiver configuration, as shown in the example above.
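
For illustration only, the three requirements described in the added section (exporter `retry`, exporter `sending_queue`, and the receiver's `queue.mem.flush.timeout`) might be combined as in the sketch below. The values are the ones that appear in this diff. The diff view does not preserve YAML indentation, so the nesting here is an assumption to verify against the full file. This applies in particular to `queue.mem.flush.timeout` sitting directly under the receiver and to `sizer` sitting under `batch`. Unrelated keys (inputs, endpoint, credentials, pipelines) are omitted.

```yaml
receivers:
  filebeatreceiver:
    # filebeat inputs omitted; see the first hunk of the diff
    # Assumed placement: directly under the receiver, alongside `filebeat:`.
    # A flush timeout of 0s runs the Beat memory queue in synchronous mode.
    queue.mem.flush.timeout: 0s

exporters:
  elasticsearch/_agent-component/default:
    # Retries must be enabled; unlimited retries are not yet supported
    # for Beats receivers, so max_retries stays at the documented 3.
    retry:
      enabled: true
      initial_interval: 1s
      max_interval: 1m0s
      max_retries: 3
    sending_queue:
      enabled: true            # the queue must be active
      wait_for_result: true    # wait for the exporter response before removing events
      block_on_overflow: true  # block instead of dropping when the queue is full
      num_consumers: 1
      queue_size: 3200
      batch:
        max_size: 1600
        min_size: 0
        flush_timeout: 10s
        sizer: items           # assumed to belong to `batch`; indentation is lost in the diff
```

The same `queue.mem.flush.timeout: 0s` line would be repeated under `metricbeatreceiver`, since the documentation asks for it in each receiver configuration.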