Adds resourceVersion support to k8sObject receiver #46543

Merged
dmitryax merged 26 commits into open-telemetry:main from dhruv-shah-sumo:add-resourceversion-docs
Apr 21, 2026

@dhruv-shah-sumo
Contributor

@dhruv-shah-sumo dhruv-shah-sumo commented Mar 2, 2026

Description

Implements optional resourceVersion checkpointing to prevent duplicate events on collector restart.
Resource version persistence is implemented in the shared watch mechanism used by the k8sobjects and k8sevents receivers. It is not enabled for pull mode in either receiver.

Features:

  • Opt-in by declaring a storage extension for persistence
  • Namespace-aware checkpointing: when namespaces are specified, each namespace gets its own watch stream and its own checkpoint key in storage, in the format latestResourceVersion/configmaps.kube-system. When no namespaces are specified, a single cluster-wide watch stream is created per resource, with keys such as latestResourceVersion/pods and latestResourceVersion/nodes.
  • Watch mode only (validated at config time)
  • If persistence is enabled, the resource_version provided in the config is ignored and the persisted version is used to start the watch stream.
  • If the persisted version is stale, the List() API is called to get the latest available resource version, which is how k8sinventory already handles stale resource versions.
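
The key format described in the list above can be sketched as follows. This is only an illustration: checkpointKey is a hypothetical helper name, not necessarily what the PR uses, and the layout is inferred from the two example keys in the description.

```go
package main

import "fmt"

// checkpointKeyPrefix is the prefix quoted in the PR description.
const checkpointKeyPrefix = "latestResourceVersion"

// checkpointKey is a hypothetical helper that builds the storage key for a
// watch stream. An empty namespace stands for a cluster-wide stream.
func checkpointKey(resource, namespace string) string {
	if namespace == "" {
		// Cluster-wide stream, e.g. latestResourceVersion/pods
		return checkpointKeyPrefix + "/" + resource
	}
	// Per-namespace stream, e.g. latestResourceVersion/configmaps.kube-system
	return checkpointKeyPrefix + "/" + resource + "." + namespace
}

func main() {
	fmt.Println(checkpointKey("configmaps", "kube-system"))
	fmt.Println(checkpointKey("pods", ""))
}
```

Keeping one key per stream means a restart of one namespace's watch does not affect the checkpoint of another.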

Configuration:

  receivers:
    k8sobjects:
      storage: file_storage
      objects:
        - name: pods
          mode: watch

How it works:

  1. At the start of each watch stream, getResourceVersion() is called. If persistence is enabled, it retrieves the persisted version.
    • If the persisted version is empty, it calls the List() API to get the available objects, picks the latest version, and persists it.
    • If the persisted version is non-empty, it is used to start the watch stream.
    • If the persisted version used to start the watch stream turns out to be stale, the checkpoint is deleted. This triggers another List() call to get the latest available resource version for that stream.
  2. After each watch event is processed, its resourceVersion is saved to storage.
  3. If a persistence operation fails for any reason, the watch stream continues.
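
A minimal sketch of the flow above, under stated assumptions: memStore, lister, watcher, runStream, and errGone are hypothetical stand-ins for the storage extension client, the List() call, the watch call, the stream loop, and the 410 Gone error. Only the name getResourceVersion comes from the description, and its signature here is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errGone is a hypothetical stand-in for the 410 Gone error returned when a
// resourceVersion is too old.
var errGone = errors.New("resource version too old")

// memStore is a hypothetical in-memory stand-in for the storage extension.
type memStore struct {
	data      map[string]string
	failWrite bool // simulates a failing persistence operation
}

func (m *memStore) set(key, value string) error {
	if m.failWrite {
		return errors.New("write failed")
	}
	m.data[key] = value
	return nil
}

// lister stands in for List(): it returns the latest resourceVersion.
type lister func() (string, error)

// watcher stands in for starting a watch from rv.
type watcher func(rv string) error

// getResourceVersion returns the persisted version if there is one;
// otherwise it lists, persists the result, and returns it.
func getResourceVersion(s *memStore, key string, list lister) (string, error) {
	if rv := s.data[key]; rv != "" {
		return rv, nil
	}
	rv, err := list()
	if err != nil {
		return "", err
	}
	if err := s.set(key, rv); err != nil {
		// Step 3: a persistence failure is logged, and the stream continues.
		fmt.Println("warn: could not persist resourceVersion:", err)
	}
	return rv, nil
}

// runStream starts the watch and, on a stale version, deletes the
// checkpoint so that the retry falls back to List().
func runStream(s *memStore, key string, list lister, watch watcher) (string, error) {
	for attempt := 0; attempt < 2; attempt++ {
		rv, err := getResourceVersion(s, key, list)
		if err != nil {
			return "", err
		}
		if err := watch(rv); errors.Is(err, errGone) {
			delete(s.data, key)
			continue
		} else if err != nil {
			return "", err
		}
		return rv, nil
	}
	return "", errGone
}

func main() {
	key := "latestResourceVersion/pods"
	s := &memStore{data: map[string]string{key: "100"}}
	list := func() (string, error) { return "250", nil }
	watch := func(rv string) error {
		if rv == "100" {
			return errGone // the persisted version is stale
		}
		return nil
	}
	rv, err := runStream(s, key, list, watch)
	fmt.Println(rv, err)
}
```

In this sketch the stale checkpoint "100" is dropped on the first attempt, and the second attempt starts the watch from the version returned by the list call.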

Link to tracking issue

Fixes
#46017

Testing

  • Unit tests
  • Manually built the updated image using make docker-otelcontribcol and deployed it locally. Tested various restart and 410 handling scenarios for global and per-namespace watch streams.

Documentation

  • Updated schema YAML files for the k8sobjects and k8sevents receivers.

Example: what the new config looks like

extensions:
  file_storage:
    directory: /var/lib/otelcol/storage
    timeout: 10s

receivers:
  k8sobjects:
    auth_type: serviceAccount
    storage: file_storage
    include_initial_state: true
    objects:
      - name: pods
        mode: watch
        namespaces: [default]
      - name: events
        mode: watch
        namespaces: [default]

exporters:
  nop:

service:
  telemetry:
    logs:
      level: debug
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [k8sobjects]
      exporters: [nop]

@github-actions github-actions Bot added the first-time contributor PRs made by new contributors label Mar 2, 2026
@github-actions
Contributor

github-actions Bot commented Mar 2, 2026

Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib.

Important reminders:

A maintainer will review your pull request soon. Thank you for helping make OpenTelemetry better!

@dhruv-shah-sumo dhruv-shah-sumo changed the title Adds resourceVersion support to k8sObject receiver [Draft: Work in progress] Adds resourceVersion support to k8sObject receiver Mar 2, 2026
@dhruv-shah-sumo dhruv-shah-sumo force-pushed the add-resourceversion-docs branch from 4e73dff to 01771ec Compare March 4, 2026 05:26
@linux-foundation-easycla

linux-foundation-easycla Bot commented Mar 9, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

Comment thread internal/k8sinventory/watch/checkpointer.go
Comment thread receiver/k8sobjectsreceiver/README.md Outdated
@dhruv-shah-sumo dhruv-shah-sumo force-pushed the add-resourceversion-docs branch from 0a9283c to fd52ea9 Compare March 11, 2026 10:22
@dhruv-shah-sumo dhruv-shah-sumo changed the title [Draft: Work in progress] Adds resourceVersion support to k8sObject receiver Adds resourceVersion support to k8sObject receiver Mar 12, 2026

@jmmcorreia jmmcorreia left a comment

@dhruv-shah-sumo I did a first pass on the PR and had a few comments and questions. I will try to do a second pass some time this week to check if I missed something, but already left my initial comments so you can take a look.

Hoping this will make review easier for codeowners if they can take a look at the PR at a later point.

Comment thread internal/k8sinventory/watch/observer.go Outdated
Comment thread internal/k8sinventory/watch/observer.go Outdated
Comment thread internal/k8sinventory/watch/observer.go Outdated
Comment thread internal/k8sinventory/watch/observer.go
Comment thread internal/k8sinventory/watch/observer.go Outdated
Comment thread internal/k8sinventory/watch/observer.go
Comment thread receiver/k8seventsreceiver/README.md Outdated
@dhruv-shah-sumo
Contributor Author

@dhruv-shah-sumo I did a first pass on the PR and had a few comments and questions. I will try to do a second pass some time this week to check if I missed something, but already left my initial comments so you can take a look.

Hoping this will make review easier for codeowners if they can take a look at the PR at a later point.

Thanks a lot for the comprehensive review comments, @jmmcorreia. I'll address them as soon as possible.

@dhruv-shah-sumo dhruv-shah-sumo force-pushed the add-resourceversion-docs branch 2 times, most recently from 2dbbe04 to 9fa79b6 Compare March 29, 2026 14:04

@jmmcorreia jmmcorreia left a comment

Left a few extra comments. Code-wise, other than the remaining comments, it looks good.

On the next pass I will check the unit tests in a bit more detail.

Comment thread internal/k8sinventory/watch/observer.go Outdated
Comment thread internal/k8sinventory/watch/observer.go Outdated
Comment thread internal/k8sinventory/watch/observer.go Outdated
Comment thread internal/k8sinventory/watch/observer.go
Comment thread receiver/k8seventsreceiver/README.md Outdated
Comment thread receiver/k8sobjectsreceiver/config.go Outdated
@dhruv-shah-sumo dhruv-shah-sumo force-pushed the add-resourceversion-docs branch from 341a86b to c34c278 Compare March 31, 2026 11:15
…te deduplication

- checkpointer.Flush: continue processing remaining keys on individual
  write failures instead of aborting early; log per-key errors and
  return a single aggregated error to avoid silent data loss
- checkpointer.SetCheckpoint: apply high-watermark semantics — only
  update the in-memory pending value when the new resourceVersion is
  numerically greater than the existing one, guarding against
  out-of-order resourceVersions from List() responses
- observer.sendInitialState: separate strconv.ParseInt failure from
  the objRV <= persistedRV comparison; emit the event on parse failure
  with a Warn log to avoid silent data loss (duplicate preferred over
  missed event)
- observer.getResourceVersion: split SetCheckpoint/Flush into two
  independent if-blocks (was else-if) for cleaner error handling
- observer_test: add TestSendInitialStateUnparsableRVEmitsEvent to
  verify events are emitted when object resourceVersion cannot be parsed

Co-authored-by: Dhruv Shah <dhruv.shah@sumologic.com>
…fig validation

Removed TestObserverResourceVersionPriority and test cases in
TestGetResourceVersion that combined resource_version with
persist_resource_version, which is now rejected by config validation.

Co-authored-by: Dhruv Shah <dhruv.shah@sumologic.com>
Co-authored-by: Dhruv Shah <dhruv.shah@sumologic.com>
Co-authored-by: Dhruv Shah <dhruv.shah@sumologic.com>
Co-Authored-By: Dhruv Shah <dhruv.shah@sumologic.com>
Co-Authored-By: Dhruv Shah <dhruv.shah@sumologic.com>
…sist when storage is set

Remove the top-level persist_resource_version flag. When a storage
extension is configured, the receiver now automatically persists the
resourceVersion for all watch-mode objects. This simplifies the config
and makes persistence the obvious default when storage is available.

Signed-off-by: Dhruv Shah <dhruv.shah@sumologic.com>
…aim for storage

The storage volume in the Deployment example was using emptyDir which is
ephemeral and lost on pod restart, defeating the purpose of resource
version persistence. Replace it with a PersistentVolumeClaim.

Signed-off-by: Dhruv Shah <dhruv.shah@sumologic.com>
Co-authored-by: Dhruv Shah <dhruv.shah@sumologic.com>
@dhruv-shah-sumo dhruv-shah-sumo force-pushed the add-resourceversion-docs branch from 8a57376 to a15da3a Compare April 13, 2026 02:54
…endency

Co-authored-by: Dhruv Shah <dhruv.shah@sumologic.com>
Co-authored-by: Dhruv Shah <dhruv.shah@sumologic.com>
Member

@ChrsMark ChrsMark left a comment

LGTM
Thanks @dhruv-shah-sumo!

** Needs a rebase :)

@dhruv-shah-sumo dhruv-shah-sumo force-pushed the add-resourceversion-docs branch from 9b3d66a to 4c19e4b Compare April 20, 2026 09:17
@dhruv-shah-sumo dhruv-shah-sumo force-pushed the add-resourceversion-docs branch from e8e9576 to 573b40c Compare April 20, 2026 11:32
@ChrsMark
Member

@dhruv-shah-sumo main is broken. Mind taking 573b40c to a standalone PR to fix it first?

@dhruv-shah-sumo
Contributor Author

@dhruv-shah-sumo main is broken. Mind taking 573b40c to a standalone PR to fix it first?

Opened #47759 and tagged you on it.

@ChrsMark ChrsMark requested a review from dmitryax April 20, 2026 13:17
@dmitryax dmitryax merged commit b23b74a into open-telemetry:main Apr 21, 2026
191 checks passed
@otelbot
Contributor

otelbot Bot commented Apr 21, 2026

Thank you for your contribution @dhruv-shah-sumo! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.
