@vimalk78
Collaborator

Reduces API server load on large clusters by polling the local kubelet's /pods endpoint instead of maintaining a persistent watch connection to the API server from every node.

  • Add kubeletPodInformer that polls kubelet at NODE_IP:10250/pods
  • Use downward API (status.hostIP) to get node IP
  • Add nodes/proxy RBAC for kubelet webhook authorization
  • Keep apiserver mode as fallback via kube.podInformer.mode config
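
For readers unfamiliar with the approach, here is a minimal sketch of the polling idea, not the code in this PR. It assumes the node IP arrives in a `NODE_IP` environment variable (populated from `status.hostIP` via the downward API) and that the pod authenticates with its mounted service account token. The type and function names (`podList`, `kubeletPodsURL`, `parsePodList`, `fetchPods`), the 15s interval, and the decision to skip TLS verification are all illustrative assumptions; a real deployment would likely verify the kubelet serving certificate.

```go
package main

import (
	"context"
	"crypto/tls"
	"encoding/json"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"strings"
	"time"
)

// podList mirrors only the parts of the kubelet's PodList JSON this sketch reads.
type podList struct {
	Items []pod `json:"items"`
}

type pod struct {
	Metadata struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
		UID       string `json:"uid"`
	} `json:"metadata"`
}

// Standard in-cluster location of the service account token.
const tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// kubeletPodsURL builds the /pods URL on the kubelet's secure port (10250).
func kubeletPodsURL(nodeIP string) string {
	return "https://" + net.JoinHostPort(nodeIP, "10250") + "/pods"
}

// parsePodList decodes a kubelet /pods response body.
func parsePodList(data []byte) ([]pod, error) {
	var pl podList
	if err := json.Unmarshal(data, &pl); err != nil {
		return nil, fmt.Errorf("decode pod list: %w", err)
	}
	return pl.Items, nil
}

// fetchPods performs one authenticated GET against the kubelet.
func fetchPods(ctx context.Context, c *http.Client, url, token string) ([]pod, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := c.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("kubelet returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parsePodList(body)
}

func main() {
	nodeIP := os.Getenv("NODE_IP") // assumed to be set from status.hostIP
	if nodeIP == "" {
		fmt.Println("NODE_IP not set; nothing to poll")
		return
	}
	token, err := os.ReadFile(tokenPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read token:", err)
		os.Exit(1)
	}
	// Assumption for brevity only: TLS verification is skipped here.
	client := &http.Client{
		Timeout:   5 * time.Second,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	ticker := time.NewTicker(15 * time.Second) // illustrative poll interval
	defer ticker.Stop()
	for {
		pods, err := fetchPods(context.Background(), client,
			kubeletPodsURL(nodeIP), strings.TrimSpace(string(token)))
		if err != nil {
			fmt.Fprintln(os.Stderr, "poll failed:", err)
		} else {
			fmt.Printf("kubelet reports %d pods on this node\n", len(pods))
		}
		<-ticker.C
	}
}
```

With kubelet webhook authorization enabled, a request like this is authorized against the `nodes/proxy` resource, which is why the RBAC rule in the list above is needed. Each node then costs one local HTTP request per interval rather than a long-lived watch on the API server, at the price of pod changes being noticed only on the next poll.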

Signed-off-by: Vimal Kumar <[email protected]>
@github-actions github-actions bot added the feat A new feature or enhancement label Dec 12, 2025
@github-actions
Contributor

⚠️ Config changes detected in this PR
Please make sure that the config changes are updated in the following places as part of this PR:

  • docs/user/configuration/configuration.md
  • compose/dev/kepler-dev/etc/kepler/config.yaml
  • compose/default/kepler/etc/kepler/config.yaml
  • hack/config.yaml
  • manifests/helm/kepler/values.yaml

@github-actions
Contributor

📊 Profiling reports are ready to be viewed

⚠️ Variability in pprof CPU and Memory profiles
When comparing pprof profiles of Kepler versions, expect variability in CPU and memory. Focus only on significant, consistent differences.

💻 CPU Comparison with base Kepler
File: kepler
Build ID: 271a22a53b29f696ce2ffe30cfc66069f86e14c4
Type: cpu
Time: 2025-12-12 13:53:07 UTC
Duration: 120s, Total samples = 4.03s ( 3.36%)
Active filters:
   show=github.com/sustainable-computing-io
Showing nodes accounting for 0.05s, 1.24% of 4.03s total
Dropped 1 node (cum <= 0.02s)
      flat  flat%   sum%        cum   cum%
     0.06s  1.49%  1.49%      0.06s  1.49%  github.com/sustainable-computing-io/kepler/internal/resource.(*procWrapper).CPUTime
         0     0%  1.49%      0.06s  1.49%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).updateProcessCache
         0     0%  1.49%      0.06s  1.49%  github.com/sustainable-computing-io/kepler/internal/resource.populateProcessFields
    -0.05s  1.24%  0.25%     -0.05s  1.24%  github.com/sustainable-computing-io/kepler/internal/resource.(*procFSReader).AllProcs
     0.05s  1.24%  1.49%      0.05s  1.24%  github.com/sustainable-computing-io/kepler/internal/resource.(*procFSReader).CPUUsageRatio
         0     0%  1.49%      0.05s  1.24%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).Refresh.func3
         0     0%  1.49%      0.05s  1.24%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).refreshNode
         0     0%  1.49%     -0.03s  0.74%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).calculatePower
         0     0%  1.49%     -0.03s  0.74%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).refreshSnapshot
         0     0%  1.49%     -0.03s  0.74%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).synchronizedPowerRefresh
         0     0%  1.49%     -0.03s  0.74%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).synchronizedPowerRefresh.func1
         0     0%  1.49%     -0.02s   0.5%  github.com/sustainable-computing-io/kepler/internal/device.(*AggregatedZone).Energy
    -0.02s   0.5%  0.99%     -0.02s   0.5%  github.com/sustainable-computing-io/kepler/internal/device.sysfsRaplZone.Energy
         0     0%  0.99%      0.02s   0.5%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*PlatformCollector).Collect
     0.02s   0.5%  1.49%      0.02s   0.5%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*PowerCollector).collectProcessMetrics
         0     0%  1.49%     -0.02s   0.5%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).Snapshot
         0     0%  1.49%     -0.02s   0.5%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).calculateNodePower
         0     0%  1.49%     -0.02s   0.5%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).scheduleNextCollection.func1
         0     0%  1.49%      0.02s   0.5%  github.com/sustainable-computing-io/kepler/internal/platform/redfish.(*PowerReader).ReadAll
     0.02s   0.5%  1.99%      0.02s   0.5%  github.com/sustainable-computing-io/kepler/internal/platform/redfish.(*PowerReader).readPowerSubsystem
         0     0%  1.99%      0.02s   0.5%  github.com/sustainable-computing-io/kepler/internal/platform/redfish.(*Service).Power
    -0.01s  0.25%  1.74%     -0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/device.Energy.String
     0.01s  0.25%  1.99%      0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*PowerCollector).collectPodMetrics
    -0.01s  0.25%  1.74%     -0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*PowerCollector).collectVMMetrics
         0     0%  1.74%     -0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*cpuInfoCollector).Collect
    -0.01s  0.25%  1.49%     -0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*realProcFS).CPUInfo
         0     0%  1.49%     -0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).calculateProcessPower
         0     0%  1.49%     -0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).ensureFreshData
    -0.01s  0.25%  1.24%     -0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/monitor.(*Process).Clone (inline)
         0     0%  1.24%     -0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/monitor.(*Snapshot).Clone
         0     0%  1.24%     -0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/monitor.(*TerminatedResourceTracker[go.shape.*uint8]).Add
         0     0%  1.24%      0.01s  0.25%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).refreshProcesses
💾 Memory Comparison with base Kepler (Inuse)
File: kepler
Build ID: 271a22a53b29f696ce2ffe30cfc66069f86e14c4
Type: inuse_space
Time: 2025-12-12 13:55:07 UTC
Duration: 120.02s, Total samples = 8025.81kB 
Active filters:
   show=github.com/sustainable-computing-io
Showing nodes accounting for 1044.80kB, 13.02% of 8025.81kB total
      flat  flat%   sum%        cum   cum%
         0     0%     0%  1540.83kB 19.20%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*PowerCollector).Collect
         0     0%     0%  1540.83kB 19.20%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).Snapshot
         0     0%     0%  1540.83kB 19.20%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).ensureFreshData
         0     0%     0% -1024.20kB 12.76%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).scheduleNextCollection.func1
         0     0%     0%   528.17kB  6.58%  github.com/sustainable-computing-io/kepler/internal/resource.computeTypeInfoFromProc.func1
  528.17kB  6.58%  6.58%   528.17kB  6.58%  github.com/sustainable-computing-io/kepler/internal/resource.containerInfoFromCgroupPaths
         0     0%  6.58%   528.17kB  6.58%  github.com/sustainable-computing-io/kepler/internal/resource.containerInfoFromProc
         0     0%  6.58%   516.64kB  6.44%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).calculatePower
         0     0%  6.58%   516.64kB  6.44%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).refreshSnapshot
         0     0%  6.58%   516.64kB  6.44%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).synchronizedPowerRefresh
         0     0%  6.58%   516.64kB  6.44%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).synchronizedPowerRefresh.func1
  516.64kB  6.44% 13.02%   516.64kB  6.44%  github.com/sustainable-computing-io/kepler/internal/resource.(*procWrapper).CPUTime
         0     0% 13.02%   516.64kB  6.44%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).Refresh
         0     0% 13.02%   516.64kB  6.44%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).refreshProcesses
         0     0% 13.02%   516.64kB  6.44%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).updateProcessCache
         0     0% 13.02%   516.64kB  6.44%  github.com/sustainable-computing-io/kepler/internal/resource.newProcess
         0     0% 13.02%   516.64kB  6.44%  github.com/sustainable-computing-io/kepler/internal/resource.populateProcessFields
💾 Memory Comparison with base Kepler (Alloc)
File: kepler
Build ID: 271a22a53b29f696ce2ffe30cfc66069f86e14c4
Type: alloc_space
Time: 2025-12-12 13:55:07 UTC
Duration: 120.02s, Total samples = 174.89MB 
Active filters:
   show=github.com/sustainable-computing-io
Showing nodes accounting for 6.79MB, 3.88% of 174.89MB total
Dropped 1 node (cum <= 0.87MB)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     9.07MB  5.19%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).Refresh
         0     0%     0%     9.07MB  5.19%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).refreshProcesses
         0     0%     0%     7.07MB  4.04%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).calculatePower
         0     0%     0%     7.07MB  4.04%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).refreshSnapshot
         0     0%     0%     7.07MB  4.04%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).synchronizedPowerRefresh
         0     0%     0%     7.07MB  4.04%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).synchronizedPowerRefresh.func1
    6.50MB  3.72%  3.72%     6.50MB  3.72%  github.com/sustainable-computing-io/kepler/internal/resource.(*procWrapper).CPUTime
         0     0%  3.72%     6.50MB  3.72%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).updateProcessCache
         0     0%  3.72%     6.50MB  3.72%  github.com/sustainable-computing-io/kepler/internal/resource.populateProcessFields
         0     0%  3.72%     5.50MB  3.15%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).Snapshot
   -5.50MB  3.15%  0.57%    -5.50MB  3.15%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*PowerCollector).collectProcessMetrics
         0     0%  0.57%     4.51MB  2.58%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).ensureFreshData
    2.57MB  1.47%  2.04%     2.57MB  1.47%  github.com/sustainable-computing-io/kepler/internal/resource.(*procFSReader).AllProcs
         0     0%  2.04%     2.56MB  1.46%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).scheduleNextCollection.func1
       2MB  1.14%  3.19%        2MB  1.14%  maps.Copy[go.shape.map[github.com/sustainable-computing-io/kepler/internal/device.EnergyZone]github.com/sustainable-computing-io/kepler/internal/monitor.Usage,go.shape.map[github.com/sustainable-computing-io/kepler/internal/device.EnergyZone]github.com/sustainable-computing-io/kepler/internal/monitor.Usage,go.shape.interface { Energy ; Index int; MaxEnergy github.com/sustainable-computing-io/kepler/internal/device.Energy; Name string; Path string },go.shape.struct { EnergyTotal github.com/sustainable-computing-io/kepler/internal/device.Energy; Power github.com/sustainable-computing-io/kepler/internal/device.Power }] (inline)
   -0.50MB  0.29%  2.90%     1.50MB  0.86%  github.com/sustainable-computing-io/kepler/internal/monitor.(*Process).Clone (inline)
         0     0%  2.90%    -1.50MB  0.86%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).calculateProcessPower
   -1.50MB  0.86%  2.04%    -1.50MB  0.86%  github.com/sustainable-computing-io/kepler/internal/monitor.newProcess (inline)
         0     0%  2.04%     1.21MB  0.69%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*cpuInfoCollector).Collect
    1.21MB  0.69%  2.73%     1.21MB  0.69%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*realProcFS).CPUInfo
         0     0%  2.73%     1.02MB  0.58%  github.com/sustainable-computing-io/kepler/internal/resource.computeTypeInfoFromProc.func1
         0     0%  2.73%     1.02MB  0.58%  github.com/sustainable-computing-io/kepler/internal/resource.containerInfoFromProc
    0.99MB  0.57%  3.30%     0.99MB  0.57%  github.com/sustainable-computing-io/kepler/internal/resource.(*procFSReader).CPUUsageRatio
         0     0%  3.30%     0.99MB  0.57%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).Refresh.func3
         0     0%  3.30%     0.99MB  0.57%  github.com/sustainable-computing-io/kepler/internal/resource.(*resourceInformer).refreshNode
   -0.51MB  0.29%  3.00%     0.99MB  0.56%  github.com/sustainable-computing-io/kepler/internal/monitor.(*Snapshot).Clone
         0     0%  3.00%     0.52MB  0.29%  github.com/sustainable-computing-io/kepler/internal/resource.computeTypeInfoFromProc.func2
    0.52MB  0.29%  3.30%     0.52MB  0.29%  github.com/sustainable-computing-io/kepler/internal/resource.containerInfoFromCgroupPaths
    0.52MB  0.29%  3.59%     0.52MB  0.29%  github.com/sustainable-computing-io/kepler/internal/resource.vmInfoFromCmdLine
         0     0%  3.59%     0.52MB  0.29%  github.com/sustainable-computing-io/kepler/internal/resource.vmInfoFromProc
         0     0%  3.59%     0.50MB  0.29%  github.com/sustainable-computing-io/kepler/internal/resource.newProcess
         0     0%  3.59%    -0.50MB  0.29%  github.com/sustainable-computing-io/kepler/internal/device.(*AggregatedZone).Energy
   -0.50MB  0.29%  3.31%    -0.50MB  0.29%  github.com/sustainable-computing-io/kepler/internal/device.sysfsRaplZone.Energy
         0     0%  3.31%    -0.50MB  0.29%  github.com/sustainable-computing-io/kepler/internal/monitor.(*PowerMonitor).calculateNodePower
    0.50MB  0.29%  3.59%     0.50MB  0.29%  github.com/sustainable-computing-io/kepler/internal/resource.(*procWrapper).Cgroups
         0     0%  3.59%     0.50MB  0.29%  github.com/sustainable-computing-io/kepler/internal/exporter/prometheus/collector.(*PlatformCollector).Collect
         0     0%  3.59%     0.50MB  0.29%  github.com/sustainable-computing-io/kepler/internal/platform/redfish.(*PowerReader).ReadAll
    0.50MB  0.29%  3.88%     0.50MB  0.29%  github.com/sustainable-computing-io/kepler/internal/platform/redfish.(*PowerReader).readPowerSubsystem
         0     0%  3.88%     0.50MB  0.29%  github.com/sustainable-computing-io/kepler/internal/platform/redfish.(*Service).Power

⬇️ Download the Profiling artifacts from the Actions Summary page

📦 Artifact name: profile-artifacts-2369

🔧 Or use GitHub CLI to download artifacts:

gh run download 20168689157 -n profile-artifacts-2369
