zookeeperreceiver: zookeeper.latency.avg not collected — strconv.ParseInt fails on float value from zk_avg_latency #47320

@yifeizhangsplunk

Description

Description

Component(s)

receiver/zookeeper

Describe the issue you're reporting

The zookeeper.latency.avg metric is never emitted by the Zookeeper receiver even when explicitly enabled in the config.

Root cause: In scraper/zookeeperscraper/scraper.go line 149, all mntr output values are parsed with strconv.ParseInt. Zookeeper 3.7+, however, reports zk_avg_latency as a float (e.g., 0.0989, 0.1097). strconv.ParseInt returns an error on the float string, and the scraper drops the metric via continue, logging only at DEBUG level.

int64Val, err := strconv.ParseInt(metricValue, 10, 64)
if err != nil {
    z.logger.Debug(
        "non-integer value from "+mntrCommand,
        zap.String("value", metricValue),
    )
    continue  // metric is silently dropped
}

Affected metric: zookeeper.latency.avg (mapped from zk_avg_latency)
Confirmed Zookeeper versions: 3.7.2, 3.8.4

Other latency metrics (zookeeper.latency.min, zookeeper.latency.max) flow correctly because they report as integers.

Steps to Reproduce

  1. Deploy Zookeeper 3.7+ with 4lw.commands.whitelist=*
  2. Configure the Zookeeper receiver with zookeeper.latency.avg: enabled: true
  3. Run echo mntr | nc <zk-host> 2181 — observe zk_avg_latency is a float (e.g., zk_avg_latency 0.0989)
  4. Query zookeeper.latency.avg in your metrics backend — no data returned

Expected Result

zookeeper.latency.avg is collected and reported with the float value from zk_avg_latency.

Actual Result

zookeeper.latency.avg is never emitted. strconv.ParseInt fails on the float string, logs a DEBUG-level message only, and skips the metric. All other Zookeeper metrics are collected correctly.

Proposed fix: Use strconv.ParseFloat instead:

floatVal, err := strconv.ParseFloat(metricValue, 64)
if err != nil {
    z.logger.Debug(
        "non-parseable value from "+mntrCommand,
        zap.String("value", metricValue),
    )
    continue
}
// int64 conversion truncates toward zero; a double-typed metric
// would preserve sub-millisecond averages.
recordDataPoints(now, int64(floatVal))

Collector version

v0.144.0 (Splunk distribution). Bug confirmed present in latest main branch (scraper/zookeeperscraper/scraper.go, last changed 2025-07-17).

Environment

OS: Linux (Kubernetes EKS nodes)
Zookeeper versions: 3.7.2 and 3.8.4 (both confirmed affected)
Deployment: OTel agent DaemonSet connecting to Zookeeper pods on port 2182

OpenTelemetry Collector configuration

receivers:
  zookeeper:
    endpoint: '<zk-host>:2181'
    collection_interval: 30s
    timeout: 10s
    metrics:
      zookeeper.latency.avg:
        enabled: true
      zookeeper.latency.min:
        enabled: true
      zookeeper.latency.max:
        enabled: true

exporters:
  signalfx:
    realm: test

service:
  pipelines:
    metrics:
      receivers: [zookeeper]
      exporters: [signalfx]
