Adds resourceVersion support to k8sObject receiver #46543
dmitryax merged 26 commits into open-telemetry:main from
Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib. Important reminders:
A maintainer will review your pull request soon. Thank you for helping make OpenTelemetry better!
jmmcorreia left a comment
@dhruv-shah-sumo I did a first pass on the PR and had a few comments and questions. I will try to do a second pass some time this week to check if I missed something, but already left my initial comments so you can take a look.
Hopefully this will make the review easier for the codeowners when they take a look at the PR later.
Thanks a lot for the comprehensive review comments, @jmmcorreia. I'll address them as soon as possible.
jmmcorreia left a comment
Left a few extra comments. Code-wise, other than the remaining comments, I would say it looks good.
On the next pass I will check the unit tests in more detail.
…te deduplication

- checkpointer.Flush: continue processing the remaining keys on individual write failures instead of aborting early; log per-key errors and return a single aggregated error to avoid silent data loss.
- checkpointer.SetCheckpoint: apply high-watermark semantics, i.e. only update the in-memory pending value when the new resourceVersion is numerically greater than the existing one, guarding against out-of-order resourceVersions from List() responses.
- observer.sendInitialState: separate the strconv.ParseInt failure from the objRV <= persistedRV comparison; on parse failure, emit the event with a Warn log to avoid silent data loss (a duplicate is preferred over a missed event).
- observer.getResourceVersion: split SetCheckpoint/Flush into two independent if-blocks (previously else-if) for cleaner error handling.
- observer_test: add TestSendInitialStateUnparsableRVEmitsEvent to verify that events are emitted when an object's resourceVersion cannot be parsed.

Co-authored-by: Dhruv Shah <dhruv.shah@sumologic.com>
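The Flush and SetCheckpoint changes in this commit can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the receiver's actual code: the store interface, fakeStore, and the in-memory pending map are hypothetical, per-key logging is omitted, and resourceVersions are compared as base-10 integers to match the "numerically greater" wording above.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strconv"
)

// store is a hypothetical stand-in for the storage extension client.
type store interface {
	Set(key string, value []byte) error
}

// checkpointer buffers the newest resourceVersion per key until Flush.
type checkpointer struct {
	client  store
	pending map[string]string
}

func newCheckpointer(c store) *checkpointer {
	return &checkpointer{client: c, pending: map[string]string{}}
}

// SetCheckpoint applies high-watermark semantics: the pending value only
// moves forward, so an out-of-order resourceVersion cannot overwrite a newer one.
func (c *checkpointer) SetCheckpoint(key, rv string) error {
	next, err := strconv.ParseInt(rv, 10, 64)
	if err != nil {
		return fmt.Errorf("invalid resourceVersion %q: %w", rv, err)
	}
	if cur, ok := c.pending[key]; ok {
		if prev, err := strconv.ParseInt(cur, 10, 64); err == nil && next <= prev {
			return nil // not newer: keep the existing watermark
		}
	}
	c.pending[key] = rv
	return nil
}

// Flush writes every pending key, continuing past individual failures and
// returning them as one aggregated error (nil if all writes succeed).
func (c *checkpointer) Flush() error {
	keys := make([]string, 0, len(c.pending))
	for k := range c.pending {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for the example
	var errs []error
	for _, k := range keys {
		if err := c.client.Set(k, []byte(c.pending[k])); err != nil {
			errs = append(errs, fmt.Errorf("flush %s: %w", k, err))
		}
	}
	return errors.Join(errs...)
}

// fakeStore records writes and fails for keys listed in fail.
type fakeStore struct {
	data map[string][]byte
	fail map[string]bool
}

func (f *fakeStore) Set(key string, value []byte) error {
	if f.fail[key] {
		return errors.New("write failed")
	}
	f.data[key] = value
	return nil
}

func main() {
	fs := &fakeStore{data: map[string][]byte{}, fail: map[string]bool{"latestResourceVersion/nodes": true}}
	cp := newCheckpointer(fs)
	_ = cp.SetCheckpoint("latestResourceVersion/pods", "120")
	_ = cp.SetCheckpoint("latestResourceVersion/pods", "95") // out of order: ignored
	_ = cp.SetCheckpoint("latestResourceVersion/nodes", "7")
	err := cp.Flush()
	fmt.Println(string(fs.data["latestResourceVersion/pods"])) // 120
	fmt.Println(err)                                           // flush latestResourceVersion/nodes: write failed
}
```

The failing nodes key does not stop the pods key from being written, and the late "95" never replaces "120". errors.Join (Go 1.20+) returns nil when no write failed.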
…fig validation

Removed TestObserverResourceVersionPriority and the test cases in TestGetResourceVersion that combined resource_version with persist_resource_version, a combination that is now rejected by config validation.
…sist when storage is set

Remove the top-level persist_resource_version flag. When a storage extension is configured, the receiver now automatically persists the resourceVersion for all watch-mode objects. This simplifies the config and makes persistence the obvious default when storage is available.

Signed-off-by: Dhruv Shah <dhruv.shah@sumologic.com>
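With this change, "persistence enabled" simply means "a storage extension is configured". The sketch below shows how a watch stream's starting resourceVersion could then be resolved, using the checkpoint key formats from the PR description (latestResourceVersion/configmaps.kube-system per namespace, latestResourceVersion/pods cluster-wide). It is a hypothetical illustration: rvStore, mapStore, errNotFound, checkpointKey, and resolveResourceVersion are not the receiver's API, and falling back to the configured value when nothing has been persisted yet is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel returned when no checkpoint exists.
var errNotFound = errors.New("checkpoint not found")

// rvStore is a hypothetical, simplified read interface over the storage extension.
type rvStore interface {
	Get(key string) (string, error)
}

// mapStore is an in-memory rvStore used for illustration.
type mapStore map[string]string

func (m mapStore) Get(key string) (string, error) {
	if v, ok := m[key]; ok {
		return v, nil
	}
	return "", errNotFound
}

// checkpointKey builds the key for one watch stream: per-namespace streams
// append ".<namespace>", the cluster-wide stream uses the resource name alone.
func checkpointKey(resource, namespace string) string {
	if namespace == "" {
		return "latestResourceVersion/" + resource
	}
	return "latestResourceVersion/" + resource + "." + namespace
}

// resolveResourceVersion picks the version a watch stream starts from.
// A nil storage means no storage extension is configured, so persistence is off.
func resolveResourceVersion(storage rvStore, resource, namespace, configured string) (string, error) {
	if storage == nil {
		return configured, nil
	}
	key := checkpointKey(resource, namespace)
	persisted, err := storage.Get(key)
	switch {
	case err == nil && persisted != "":
		return persisted, nil // the persisted version wins over the configured one
	case err == nil || errors.Is(err, errNotFound):
		return configured, nil // assumption: fall back when nothing was persisted yet
	default:
		return "", fmt.Errorf("read checkpoint %s: %w", key, err)
	}
}

func main() {
	st := mapStore{"latestResourceVersion/configmaps.kube-system": "4821"}
	rv, _ := resolveResourceVersion(st, "configmaps", "kube-system", "100")
	fmt.Println(rv) // 4821
	rv, _ = resolveResourceVersion(st, "pods", "", "100")
	fmt.Println(rv) // 100
	rv, _ = resolveResourceVersion(nil, "configmaps", "kube-system", "100")
	fmt.Println(rv) // 100
}
```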
…aim for storage

The storage volume in the Deployment example used an emptyDir, which is ephemeral and lost on pod restart, defeating the purpose of resourceVersion persistence. Replace it with a PersistentVolumeClaim.
…endency
@dhruv-shah-sumo
#47759: I have tagged you on the PR.
Thank you for your contribution @dhruv-shah-sumo! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.
Description
Implements optional resourceVersion checkpointing to prevent duplicate events on collector restart.
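Per the commit notes, duplicates are avoided on restart by having sendInitialState skip objects whose resourceVersion is at or below the persisted one, while an unparsable resourceVersion is still emitted (a duplicate is preferred over a missed event). A minimal sketch of that rule, assuming numeric resourceVersions as the PR does; shouldEmitInitial and its signature are hypothetical, not the receiver's code, and the caller is assumed to log the returned error as a warning.

```go
package main

import (
	"fmt"
	"strconv"
)

// shouldEmitInitial reports whether an object returned by the initial List()
// should be emitted after a restart. Hypothetical helper: objects at or below
// the persisted resourceVersion were already sent before the restart.
func shouldEmitInitial(objRV string, persistedRV int64, havePersisted bool) (bool, error) {
	if !havePersisted {
		return true, nil // nothing persisted: emit everything
	}
	rv, err := strconv.ParseInt(objRV, 10, 64)
	if err != nil {
		// Parse failure is handled separately from the comparison: emit the
		// event and surface the error so the caller can log a warning.
		return true, fmt.Errorf("unparsable resourceVersion %q: %w", objRV, err)
	}
	return rv > persistedRV, nil
}

func main() {
	for _, rv := range []string{"90", "100", "101", "abc"} {
		emit, err := shouldEmitInitial(rv, 100, true)
		fmt.Println(rv, emit, err != nil)
	}
}
```

With a persisted version of 100, this prints "90 false false", "100 false false", "101 true false", and "abc true true".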
Resource version persistence is built into the watch mechanism used by both the k8sObjects receiver and the k8sEvents receiver. The feature is not enabled for pull mode in either receiver.

Features:
- Per-namespace watch streams use checkpoint keys such as latestResourceVersion/configmaps.kube-system. If no namespaces are specified, a single cluster-wide watch stream is created, which uses a different checkpoint key format: latestResourceVersion/pods, latestResourceVersion/nodes.
- The resource_version provided in the config will be ignored; the persisted version will be used to kick off the watch stream.

Configuration:
How it works:
getResourceVersion() is called, which checks whether persistence is enabled and, if so, retrieves the persisted version.

Link to tracking issue
Fixes #46017
Testing
Ran make docker-otelcontribcol and deployed locally. Tested various restart and 410 Gone handling scenarios for both global (cluster-wide) and per-namespace watch streams.

Documentation
Example: what the new config would look like