Observability: add metrics and traces for git fetch and sops decrypt #63

@patrick-hermann-sva

Description

Problem

cloneAndReadFile() (internal/controller/remotecluster/remotecluster.go:417-446) is the hot path — every reconcile goes through it — but it emits no metrics or spans. Git fetch latency and decrypt latency are the first things operators want to see during incidents.

Suggested fix

Add Prometheus metrics:

  • provider_kubeconfig_git_fetch_duration_seconds{repo,branch,result} (histogram)
  • provider_kubeconfig_git_cache_hit_total{repo,branch} (counter) — distinguish clone vs pull
  • provider_kubeconfig_sops_decrypt_duration_seconds{format,result} (histogram)
  • provider_kubeconfig_reconcile_errors_total{stage} (counter; stage ∈ git|decrypt|secret|downstream)

Add OTel spans around EnsureCloned, ReadFile, and SOPSDecrypt so traces show which phase dominates.

Also verify error wrapping carries repo URL + file path for every error returned from this path.

Files

  • internal/controller/remotecluster/remotecluster.go:417-446
  • internal/git/git.go:79, 143-150
  • internal/decrypt/decrypt.go:30
