Option to exclude per-CPU metrics from scraping #1876

@orloffv

Problem

The ClickHouse Operator collects metrics from ClickHouse nodes, including per-CPU system metrics such as:

metric.OSGuestTimeCPU{N}
metric.OSIOWaitTimeCPU{N}
metric.OSUserTimeCPU{N}
metric.CPUFrequencyMHz_{N}
etc.
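
One way to picture the requested option is a name-based filter. The sketch below is only an illustration, not the operator's actual code: `PER_CPU_PATTERN`, `is_per_cpu_metric`, and `drop_per_cpu` are hypothetical names, and the pattern assumes that `{N}` in the names above is a decimal CPU index.

```python
import re

# Hypothetical pattern: assumes {N} is a decimal CPU index appended to the name,
# and that the "metric." prefix may or may not be present.
PER_CPU_PATTERN = re.compile(r"(?:metric\.)?(?:OS\w+?CPU\d+|CPUFrequencyMHz_\d+)")


def is_per_cpu_metric(name: str) -> bool:
    """Return True if the whole metric name looks like a per-CPU metric."""
    return PER_CPU_PATTERN.fullmatch(name) is not None


def drop_per_cpu(names):
    """Keep only the metric names that are not per-CPU."""
    return [n for n in names if not is_per_cpu_metric(n)]
```

With this pattern, a name without the numeric suffix (for example `metric.OSUserTime`) would not match and would be kept, so host-wide totals would still be scraped.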

On machines with high core counts, this generates an enormous number of metrics that provide limited value.

Our Setup

15 shards × 2 replicas = 30 ClickHouse nodes
~380 CPU cores per node
Each node exports ~5,000 CPU metrics
Total: ~150,000 CPU metrics across the cluster
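
The totals above follow from simple multiplication. As a quick check, using only the approximate figures from this setup (the variable names are illustrative):

```python
shards = 15
replicas_per_shard = 2
cores_per_node = 380          # approximate, from the setup above
cpu_metrics_per_node = 5_000  # approximate, from the setup above

nodes = shards * replicas_per_shard
total_cpu_metrics = nodes * cpu_metrics_per_node
metrics_per_core = cpu_metrics_per_node / cores_per_node

print(nodes)                    # 30
print(total_cpu_metrics)        # 150000
print(round(metrics_per_core))  # 13
```

So each core contributes roughly 13 metric series, and the count grows linearly with the number of cores.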

Impact

/metrics endpoint response time: ~8 seconds
/metrics response size: ~40 MB
~95% of all metrics are these per-CPU metrics
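
For a rough sense of what excluding these metrics could save, here is a back-of-the-envelope estimate. It assumes, without measurement, that response size scales linearly with the number of metrics; the real saving would need to be verified.

```python
response_size_mb = 40   # approximate /metrics response size reported above
per_cpu_share = 0.95    # approximate share of per-CPU metrics reported above

# Assumption: size is proportional to metric count (not measured).
remaining_size_mb = round(response_size_mb * (1 - per_cpu_share), 1)

print(remaining_size_mb)  # 2.0
```

Under that assumption the response would shrink from ~40 MB to about 2 MB.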

Why These Metrics Are Less Useful

As the ClickHouse documentation states:

This is a system-wide metric, it includes all the processes on the host machine, not just clickhouse-server.

These metrics reflect the entire host, not ClickHouse specifically, making them less actionable for ClickHouse monitoring.

Labels: planned for review, research required