
feat(nr-k8s-otel-collector): add initContainer to fix ATP storage per…#2199

Open
gmanandhar-nr wants to merge 2 commits into master from gaurab/atp-state-persistence

@gmanandhar-nr
Member

Problem

When ATP (Adaptive Telemetry Processor) is enabled, the collector fails to persist state with the following
error:

Error: open /var/lib/nrdot-collector/adaptiveprocess.db: permission denied

Root Cause: The hostPath volume at /var/lib/nrdot-collector is created by Kubernetes with root:root
ownership, but the main container runs as user 1001 (non-root), preventing ATP from writing the persistence
database.

Solution

Added a new fix-atp-storage-permissions initContainer that:

  • Runs only when enable_atp: true (conditional rendering)
  • Changes ownership of /var/lib/nrdot-collector to 1001:1001 before the main container starts
  • Reuses the existing bitnami/kubectl image (no new dependencies)
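
A rough sketch of what such an initContainer could look like in the DaemonSet template. The volume name `atp-storage`, the untagged image reference, the recursive `chown -R`, and the placement of the `initContainers:` key inside the conditional are assumptions for illustration, not copied from the diff:

```yaml
# Fragment of the pod spec (spec.template.spec) in the DaemonSet template.
# Assumes no other initContainers are defined; otherwise only the list item
# would sit inside the conditional.
{{- if .Values.enable_atp }}
initContainers:
  - name: fix-atp-storage-permissions
    # Placeholder reference; the chart reuses its existing bitnami/kubectl image.
    image: bitnami/kubectl
    # Hand the ATP state directory to the non-root collector user (1001).
    command: ["sh", "-c", "chown -R 1001:1001 /var/lib/nrdot-collector"]
    volumeMounts:
      - name: atp-storage            # hypothetical volume name
        mountPath: /var/lib/nrdot-collector
{{- end }}
```

With `enable_atp` unset or false, the block renders to nothing and the pod spec is unchanged.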

Security

The initContainer has to run as root in order to chown the directory, so everything else is locked down:

  • runAsUser: 0 (required for chown)
  • allowPrivilegeEscalation: false (prevents privilege escalation attacks)
  • capabilities: drop: [ALL], add: [CHOWN, DAC_OVERRIDE] (only necessary capabilities)
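
Written out as a container-level `securityContext`, the settings listed above would look roughly like this (a sketch of the fields named in this description, not the exact diff):

```yaml
securityContext:
  runAsUser: 0                      # root is needed to chown a root-owned directory
  allowPrivilegeEscalation: false   # the process cannot gain more privileges than it starts with
  capabilities:
    drop: ["ALL"]                   # start from an empty capability set
    add: ["CHOWN", "DAC_OVERRIDE"]  # re-add only what the chown step needs
```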

Backward Compatibility

No impact on existing installations:

  • Wrapped in {{- if .Values.enable_atp }} conditional
  • Only adds initContainer when ATP is explicitly enabled
  • Uses existing image (no new dependencies)
  • All existing unit tests pass (213/213)

@gmanandhar-nr gmanandhar-nr requested a review from a team as a code owner April 1, 2026 12:52
Contributor

@Philip-R-Beckwith Philip-R-Beckwith left a comment


This seems overly heavy-handed.
Why are we not using Kubernetes' built-in fsGroup to fix the permissions issue?

@gmanandhar-nr
Member Author

Using fsGroup would be cleaner. However, Kubernetes does not apply fsGroup ownership changes to hostPath volumes, which is what we're using here.

We use hostPath because:

  • This is a DaemonSet - one pod per node
  • ATP needs per-node persistent storage to track process metrics on each node
  • hostPath ensures data survives pod restarts but stays node-specific
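
For context, the volume in question would be declared along these lines. The volume name `atp-storage` and the `DirectoryOrCreate` type are assumptions for illustration; only the path comes from the error message above:

```yaml
# Fragment of the DaemonSet pod spec (spec.template.spec).
volumes:
  - name: atp-storage                # hypothetical volume name
    hostPath:
      path: /var/lib/nrdot-collector
      # Assumed type: the kubelet creates the directory on the node if it is
      # missing, owned by root, which is why the non-root collector cannot write to it.
      type: DirectoryOrCreate
```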


2 participants