Commit 77b7cda
kafka_consumer: bound and refine estimated_consumer_lag (#24167)
* kafka_consumer: bound and refine estimated_consumer_lag
Cap left-extrapolation of the broker timestamp cache so that a consumer offset
older than the oldest cached sample cannot be extrapolated to a timestamp more
than 10 minutes before that sample, keeping estimated_consumer_lag bounded.
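A minimal sketch of the capped lookup. It assumes the cache is a list of (offset, timestamp) pairs sorted by strictly increasing offset, and that lag is `now` minus the estimated production time. `estimate_timestamp`, `estimated_lag_seconds` and `MAX_LEFT_EXTRAPOLATION_S` are hypothetical names, not the check's actual API; only the 10-minute bound comes from this commit.

```python
from bisect import bisect_left
from typing import List, Tuple

Sample = Tuple[int, float]  # (offset, unix timestamp in seconds)

# Hypothetical name; the 10-minute value comes from the commit message.
MAX_LEFT_EXTRAPOLATION_S = 600.0


def estimate_timestamp(samples: List[Sample], offset: int) -> float:
    """Estimate when `offset` was written, from samples sorted by strictly increasing offset."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    offsets = [o for o, _ in samples]
    i = bisect_left(offsets, offset)
    if i == 0:
        # At or below the oldest sample: extrapolate along the two oldest samples,
        # but never more than MAX_LEFT_EXTRAPOLATION_S before the oldest timestamp.
        (o0, t0), (o1, t1) = samples[0], samples[1]
        estimate = t0 + (offset - o0) * (t1 - t0) / (o1 - o0)
        return max(estimate, t0 - MAX_LEFT_EXTRAPOLATION_S)
    if i == len(samples):
        # Past the newest sample (normally the just-collected highwater, which a
        # consumer offset cannot exceed): clamp to the newest timestamp.
        return samples[-1][1]
    # Between two cached samples: linear interpolation.
    (o0, t0), (o1, t1) = samples[i - 1], samples[i]
    return t0 + (offset - o0) * (t1 - t0) / (o1 - o0)


def estimated_lag_seconds(samples: List[Sample], consumer_offset: int, now: float) -> float:
    """Lag in seconds, never negative."""
    return max(0.0, now - estimate_timestamp(samples, consumer_offset))
```

For a cache whose oldest samples are at offsets 1,000,000 and 1,000,010, ten seconds apart, an offset of 0 would extrapolate about 11.6 days back; the cap holds it to 600 seconds before the oldest sample.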
Use max(consumer_offset, low_watermark) as the offset basis for lag-in-time
when cluster monitoring is enabled: messages below the low watermark are out
of retention and unreachable, so they should not inflate the time lag.
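The offset basis reduces to a floor. In this sketch `lag_offset_basis` is a hypothetical helper, and `None` stands in for "cluster monitoring disabled, so no low watermark is available".

```python
from typing import Optional


def lag_offset_basis(consumer_offset: int, low_watermark: Optional[int]) -> int:
    """Offset whose production time is used for lag-in-time.

    low_watermark is None when cluster monitoring is disabled.
    """
    if low_watermark is None:
        return consumer_offset
    # Offsets below the low watermark are out of retention and can never be read,
    # so a consumer stuck below it is treated as sitting at the low watermark.
    return max(consumer_offset, low_watermark)
```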
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: add changelog entry for PR #24167
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: compact and prune the broker-timestamp cache
Replace single-oldest eviction with batch compaction (Visvalingam-Whyatt),
triggered when the cache reaches capacity: keep the oldest and newest samples
and drop the points that least distort the offset/timestamp curve. The cache
then spans a longer history at progressively coarser resolution, so high lag
is interpolated rather than extrapolated.
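A sketch of the compaction under stated assumptions. Classic Visvalingam-Whyatt ranks each interior point by the area of the triangle it forms with its neighbours; this version instead scores a point by the timestamp error introduced if it were dropped and re-interpolated from its neighbours. That matches the name of the `_interpolation_error` helper mentioned later in this commit, but its exact formula is not given here, so treat the metric as an assumption. `capacity` and `target_size` are hypothetical parameters, and the quadratic loop favours clarity over speed (a heap would avoid re-scoring every point).

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (offset, timestamp)


def interpolation_error(prev: Sample, point: Sample, nxt: Sample) -> float:
    """Timestamp error introduced if `point` were dropped and interpolated from its neighbours."""
    (o0, t0), (o, t), (o1, t1) = prev, point, nxt
    interpolated = t0 + (o - o0) * (t1 - t0) / (o1 - o0)
    return abs(t - interpolated)


def visvalingam_whyatt(samples: List[Sample], target_size: int) -> List[Sample]:
    """Drop the least significant interior samples until at most target_size remain.

    The first (oldest) and last (newest) samples are never removed.
    """
    points = list(samples)
    target_size = max(target_size, 2)
    while len(points) > target_size:
        # Re-score after each removal, since dropping a point changes its neighbours' scores.
        idx = min(
            range(1, len(points) - 1),
            key=lambda i: interpolation_error(points[i - 1], points[i], points[i + 1]),
        )
        del points[idx]
    return points


def add_sample(cache: List[Sample], sample: Sample, capacity: int, target_size: int) -> None:
    """Append a sample and batch-compact once the cache reaches capacity."""
    cache.append(sample)
    if len(cache) >= capacity:
        cache[:] = visvalingam_whyatt(cache, target_size)
```

Points lying on a straight segment score zero and are dropped first, while a sample at a change in production rate survives, which is what lets a small cache describe a long history.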
At the same trigger, prune samples below the earliest consumer offset (keeping
one anchor) since no consumer will ever interpolate there.
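Keeping one anchor means the sample at or just below the floor survives, so an offset sitting right at the floor is still interpolated between two samples instead of falling off the left edge of the cache. A sketch, with `prune_below_anchor` as a hypothetical stand-in for the check's own helper:

```python
from bisect import bisect_right
from typing import List, Tuple

Sample = Tuple[int, float]  # (offset, timestamp), sorted by offset


def prune_below_anchor(samples: List[Sample], floor_offset: int) -> List[Sample]:
    """Drop samples below floor_offset, keeping the last sample at or below it as an anchor."""
    offsets = [o for o, _ in samples]
    # Index of the last sample whose offset is <= floor_offset (-1 if none).
    anchor = bisect_right(offsets, floor_offset) - 1
    if anchor <= 0:
        # Nothing strictly older than the anchor to drop.
        return list(samples)
    return samples[anchor:]
```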
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: prune broker-timestamp cache by low watermark
Use the partition low watermark as the prune floor when cluster monitoring is
enabled (the physically meaningful "lowest readable offset"), falling back to
the earliest committed consumer offset otherwise. The low watermark is now
fetched before the cache update and reused for both pruning and the lag-in-time
floor, so there is no extra broker call.
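The floor selection for one partition might look like the following sketch. `prune_floor` and the group-to-offset mapping are hypothetical; the debug line mirrors the fallback logging added in the review-feedback bullet of this commit.

```python
import logging
from typing import Mapping, Optional

log = logging.getLogger(__name__)


def prune_floor(low_watermark: Optional[int], consumer_offsets: Mapping[str, int]) -> Optional[int]:
    """Pick the offset below which cached samples can be pruned for one partition.

    `consumer_offsets` maps consumer group -> committed offset (hypothetical shape).
    """
    if low_watermark is not None:
        # Cluster monitoring on: nothing below the log start is readable.
        return low_watermark
    if not consumer_offsets:
        # No floor available; skip pruning.
        return None
    floor = min(consumer_offsets.values())
    log.debug("low watermark unavailable, falling back to earliest consumer offset %d", floor)
    return floor
```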
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: fetch low watermark offsets once and share them
Previously the log-start (low watermark) offsets were fetched twice per run
when cluster monitoring and data streams were both enabled: once by the
metadata collector for partition.size/topic.size/throughput, and again by the
lag path for the lag-in-time and cache-pruning floor.
Fetch them once in check(), gated on cluster monitoring, over all non-internal
topic partitions, and share the result with both the data-streams lag path and
the metadata collector. Removes the duplicate list_offsets(earliest) call and
the divergent internal-topic handling.
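The fetch-once flow can be sketched with injected callables so it runs without a broker. `run_check` and its parameters are hypothetical; in the check itself the fetch is `ClusterMetadataCollector.fetch_earliest_offsets`. Treating topics prefixed with `__` (such as `__consumer_offsets`) as internal is an assumption about the filter, not something this commit spells out.

```python
from typing import Callable, Dict, List, Optional, Tuple

TopicPartition = Tuple[str, int]
Offsets = Dict[TopicPartition, int]


def run_check(
    cluster_monitoring: bool,
    topic_partitions: Dict[str, List[int]],
    fetch_earliest_offsets: Callable[[Dict[str, List[int]]], Offsets],
    report_lag: Callable[[Optional[Offsets]], None],
    collect_metadata: Callable[[Offsets], None],
) -> None:
    """Fetch low watermarks at most once per run and hand the same dict to both consumers."""
    low_watermarks: Optional[Offsets] = None
    if cluster_monitoring:
        # Assumption: internal topics are those prefixed with "__".
        user_topics = {t: p for t, p in topic_partitions.items() if not t.startswith("__")}
        low_watermarks = fetch_earliest_offsets(user_topics)
    # Lag path: None means "no low-watermark floor available".
    report_lag(low_watermarks)
    if low_watermarks is not None:
        # Metadata path (partition.size, topic.size, throughput) reuses the same result.
        collect_metadata(low_watermarks)
```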
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: reuse _fetch_earliest_offsets instead of a parallel fetch
Drop the PR-added Client.get_low_watermark_offsets and the
_get_low_watermark_offsets wrapper, which duplicated the existing
ClusterMetadataCollector._fetch_earliest_offsets. The check now calls
_fetch_earliest_offsets once under cluster monitoring and shares the result
with both the data-streams lag/pruning path and the topic-metadata collection,
so the earliest offsets are still fetched only once per run.
This reverts client.py to master and keeps the cluster_metadata.py change to a
small signature tweak.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: use low_watermark_offsets directly in topic metadata
Drop the redundant earliest_offsets alias and reference the passed-in
low_watermark_offsets directly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: address review feedback on lag bounding
- Clarify that the left-extrapolation cap bounds lag-in-time regardless of
cluster monitoring or the low-watermark floor, and document why there is no
symmetric right-side clamp (the newest cached sample is the just-collected
highwater, which the consumer offset can never exceed).
- Promote ClusterMetadataCollector.fetch_earliest_offsets to a public method
since KafkaCheck now calls it across the class boundary.
- Log a debug line when the cache-prune floor falls back from the low watermark
to the earliest consumer offset.
- Extract the Visvalingam-Whyatt significance closure into a module-level
_interpolation_error helper.
- Parameterize the _visvalingam_whyatt tests; add direct tests for
_earliest_consumer_offsets, _prune_below_anchor, and the left-extrapolation
cap through report_consumer_offsets_and_lag without a low watermark.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: trim comments to a single note on the extrapolation cap
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: move extrapolation-cap comment to the clamp line
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: reuse fetched topic partitions in topic metadata collection
Pass the topic-partition map computed in check() through collect_all_metadata
into _collect_topic_metadata instead of fetching it again, so the cluster
monitoring path makes the same number of get_topic_partitions calls as before.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: satisfy ruff formatting for collect_all_metadata call
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* kafka_consumer: clear full timestamp cache on reset, test pruning end-to-end
When a reset is detected (any cached offset above the new highwater), clear
the entire cache instead of only dropping entries above the highwater. The
VW compactor always preserves the minimum cached offset as an endpoint, so
old-generation low-offset entries would never age out and would poison lag
interpolation indefinitely after a partial reset.
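A sketch of the reset rule, assuming the cache is a list of (offset, timestamp) pairs and one highwater sample is recorded per run. `update_cache` is a hypothetical name, and skipping an unchanged highwater (to keep offsets strictly increasing) is an assumption of this sketch rather than something the commit states.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (offset, timestamp)


def update_cache(cache: List[Sample], highwater: int, now: float) -> List[Sample]:
    """Record the latest highwater, clearing the whole cache if a reset is detected."""
    if any(offset > highwater for offset, _ in cache):
        # Some cached offset is above the new highwater, so the partition was reset.
        # Clear everything rather than only the entries above the highwater: the
        # compactor always keeps the minimum cached offset as an endpoint, so stale
        # low offsets from the old generation would never age out otherwise.
        cache.clear()
    # Assumption: only record advancing highwaters so offsets stay strictly increasing.
    if not cache or highwater > cache[-1][0]:
        cache.append((highwater, now))
    return cache
```

With a cache of [(100, 1.0), (200, 2.0)] and a new highwater of 150, dropping only the entries above the highwater would leave (100, 1.0) from the old generation in place; the full clear leaves just (150, 3.0).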
Also replace the direct private-method test for consumer-floor pruning with a
dd_run_check test that exercises the full check() path, and add tests for the
new clear-on-reset behaviour.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* kafka_consumer: satisfy ruff formatting for new unit tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* kafka_consumer: shorten reset-detection comment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* kafka_consumer: trim reset test comment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* kafka_consumer: test timestamp compaction via dd_run_check instead of private method
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* kafka_consumer: replace _prune_below_anchor direct tests with dd_run_check tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* kafka_consumer: satisfy ruff formatting for prune_below_anchor replacement tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* kafka_consumer: replace private method tests with public method tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* kafka_consumer: test that lag accuracy is preserved after VW compaction
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* kafka_consumer: parametrize VW compaction test with 4 cases
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

1 parent 8c05418, commit 77b7cda
4 files changed
Lines changed: 425 additions & 57 deletions
Changed directories:
- kafka_consumer/changelog.d
- kafka_consumer/datadog_checks/kafka_consumer
- kafka_consumer/tests