Problem
cloneAndReadFile() (internal/controller/remotecluster/remotecluster.go:417-446) is the hot path — every reconcile goes through it — but it emits no metrics or spans. Git fetch latency and decrypt latency are the first things operators want to see during incidents.
Suggested fix
Add Prometheus metrics:
provider_kubeconfig_git_fetch_duration_seconds{repo,branch,result} (histogram)
provider_kubeconfig_git_cache_hit_total{repo,branch} (counter), incremented when an existing checkout is reused (pull) rather than cloned fresh, so clone vs pull can be told apart
provider_kubeconfig_sops_decrypt_duration_seconds{format,result} (histogram)
provider_kubeconfig_reconcile_errors_total{stage} (counter) — where stage ∈ git|decrypt|secret|downstream
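A rough sketch of the instrumentation shape the metrics above imply: time each stage, derive the result label from the returned error, and count failures per stage. Everything here is hypothetical. stageRecorder is an in-memory stand-in for the Prometheus collectors, chosen so the sketch runs with the standard library only, and none of these names exist in the provider today.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// stageRecorder is a hypothetical in-memory stand-in for the proposed
// Prometheus histograms and error counter. It is not the provider's code.
type stageRecorder struct {
	mu        sync.Mutex
	durations map[string][]float64 // keyed by "stage/result", values in seconds
	errs      map[string]int       // keyed by stage
}

func newStageRecorder() *stageRecorder {
	return &stageRecorder{
		durations: make(map[string][]float64),
		errs:      make(map[string]int),
	}
}

// observeStage runs fn, records its duration under a result label derived
// from the returned error, and increments the per-stage error count on failure.
func (r *stageRecorder) observeStage(stage string, fn func() error) error {
	start := time.Now()
	err := fn()
	elapsed := time.Since(start).Seconds()

	result := "success"
	if err != nil {
		result = "error"
	}

	r.mu.Lock()
	defer r.mu.Unlock()
	key := stage + "/" + result
	r.durations[key] = append(r.durations[key], elapsed)
	if err != nil {
		r.errs[stage]++
	}
	return err
}

// sampleCount returns how many durations were recorded for a stage and result.
func (r *stageRecorder) sampleCount(stage, result string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.durations[stage+"/"+result])
}

// errorCount returns how many failures were recorded for a stage.
func (r *stageRecorder) errorCount(stage string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.errs[stage]
}

// exampleReconcile shows the call pattern with placeholder closures:
// a git stage that succeeds and a decrypt stage that fails.
func exampleReconcile() (gitSamples, decryptErrors int) {
	rec := newStageRecorder()
	_ = rec.observeStage("git", func() error { return nil })
	_ = rec.observeStage("decrypt", func() error {
		return errors.New("simulated decrypt failure")
	})
	return rec.sampleCount("git", "success"), rec.errorCount("decrypt")
}
```

In a real change the two maps would presumably become a histogram vector and a counter vector registered with the controller's metrics registry, and observeStage would wrap the calls inside cloneAndReadFile(). Deriving the result label in one place keeps the label values consistent across stages.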
Add OTel spans around EnsureCloned, ReadFile, and SOPSDecrypt so traces show which phase dominates.
Also verify error wrapping carries repo URL + file path for every error returned from this path.
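For the error-wrapping check, one possible shape is a small helper that attaches the stage, repo URL, and file path while keeping the underlying error reachable through errors.Is. This is a sketch only: wrapStage, errNotFound, and isNotFound are hypothetical names, and the example URL and path are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a placeholder sentinel used only to illustrate unwrapping.
var errNotFound = errors.New("file not found")

// wrapStage adds the stage, repo URL, and file path to err. The %w verb
// keeps the original error in the chain, so callers can still match it.
// A nil err is returned unchanged.
func wrapStage(stage, repoURL, path string, err error) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("%s: repo %q, file %q: %w", stage, repoURL, path, err)
}

// isNotFound reports whether err wraps the placeholder sentinel.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}
```

Calling wrapStage("git", "https://example.com/org/repo.git", "clusters/a.yaml", errNotFound) produces a message containing the stage, URL, and path, and isNotFound still returns true for the wrapped error. Verifying the existing code then comes down to checking that every return in this path passes through something equivalent.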
Files
internal/controller/remotecluster/remotecluster.go:417-446
internal/git/git.go:79, 143-150
internal/decrypt/decrypt.go:30