Telemetry causes "duplicate metrics collector registration" error in multiplexer mode #6601

@tty47

Description

When running a validator node with multiplexer mode enabled and telemetry.enabled = true in app.toml, the node fails to start with the error:

Error: duplicate metrics collector registration attempted

The child process (ABCI app) becomes a zombie and the parent process hangs indefinitely.

celestia-app version

v6.4.10-mocha (also likely affects other versions using multiplexer mode)

Network

mocha-4

Steps to Reproduce

  1. Configure a validator node with multiplexer mode (default for validators)
  2. Set telemetry.enabled = true in app.toml
  3. Start the node with:
    celestia-appd start --home /home/celestia/.celestia-app --address=tcp://127.0.0.1:26658
  4. Observe the error in logs

Expected Behavior

The node should start normally with telemetry metrics available for Prometheus scraping.

Actual Behavior

  • Child process logs Error: duplicate metrics collector registration attempted and exits
  • Child process becomes a zombie (<defunct>)
  • Parent process hangs after service start impl=PubSub
  • Node never syncs or produces blocks

Logs

INF initializing multiplexer app_version=3 chain_id=mocha-4 module=server
INF initialized remote app client address=127.0.0.1:26658 module=server
INF starting comet node module=server
INF Since the chainID is mocha-4, configuring the default v2 upgrade height to 2585031
INF starting ABCI without Tendermint
Error: duplicate metrics collector registration attempted
INF service start impl=multiAppConn module=proxy msg={}
INF service start connection=query impl=localClient module=abci-client msg={}
INF service start connection=snapshot impl=localClient module=abci-client msg={}
INF service start connection=mempool impl=localClient module=abci-client msg={}
INF service start connection=consensus impl=localClient module=abci-client msg={}
INF service start impl=EventBus module=events msg={}
INF service start impl=PubSub module=pubsub msg={}
[hangs here - no further progress]

Process state showing zombie:

ps aux | grep celestia-appd
celestia 2269545 19.3  0.5 ... /usr/local/bin/celestia-appd start ...
celestia 2269569  8.9  0.0      0     0 ?  Z  [celestia-appd] <defunct>
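A `<defunct>` entry means the child has exited but its parent has not yet waited on it. The sketch below shows one way a parent could reap a child and learn that it exited instead of continuing to run. It assumes the parent launches the child through Go's `os/exec`, which this report does not confirm; `superviseCommand` is a hypothetical helper and `false` is only a stand-in for a child that exits with an error.

```go
package main

import (
	"fmt"
	"os/exec"
)

// superviseCommand starts the named program and returns a channel that
// receives the result of waiting on it. Calling Wait reaps the process,
// so it does not linger as <defunct> after it exits.
// Hypothetical helper: not taken from celestia-app.
func superviseCommand(name string, args ...string) (<-chan error, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start child %q: %w", name, err)
	}
	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() {
		done <- cmd.Wait()
	}()
	return done, nil
}

func main() {
	// "false" stands in for a child that fails during startup.
	done, err := superviseCommand("false")
	if err != nil {
		fmt.Println(err)
		return
	}
	// A real parent would select on this channel alongside its other work
	// and shut down (or restart the child) when it fires.
	if err := <-done; err != nil {
		fmt.Println("child exited:", err)
	}
}
```

With this shape, a child that dies on startup surfaces as an error in the parent rather than as a silent hang.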

Root Cause Analysis

In multiplexer mode, celestia-appd spawns two processes:

  1. Parent: Runs CometBFT/consensus
  2. Child: Runs ABCI application with --with-tendermint=false --transport=grpc

Both processes read the same app.toml, so both initialize the Cosmos SDK telemetry package, which calls prometheus.MustRegister(). Prometheus rejects a second registration of the same collector, so the child process fails with the error shown in the logs.
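The rejection itself can be modeled without the real libraries. The sketch below is a simplified stand-in, not Cosmos SDK or Prometheus client code: `registry`, `register`, and `initTelemetry` are hypothetical names, and the error value only copies the message seen in the logs. It shows that running the same telemetry setup twice against one registry fails on the second call.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyRegistered copies the message from the logs for illustration.
// The real error type lives in the Prometheus client library.
var errAlreadyRegistered = errors.New("duplicate metrics collector registration attempted")

// registry is a hypothetical, simplified stand-in for a metrics registry:
// it accepts each collector name exactly once.
type registry struct {
	collectors map[string]struct{}
}

func newRegistry() *registry {
	return &registry{collectors: make(map[string]struct{})}
}

// register fails if a collector with the same name is already present.
func (r *registry) register(name string) error {
	if _, ok := r.collectors[name]; ok {
		return fmt.Errorf("register %q: %w", name, errAlreadyRegistered)
	}
	r.collectors[name] = struct{}{}
	return nil
}

// initTelemetry stands in for a telemetry setup routine that registers
// a single sink under a fixed name.
func initTelemetry(r *registry) error {
	return r.register("telemetry_sink")
}

func main() {
	r := newRegistry()
	fmt.Println(initTelemetry(r)) // first setup on this registry
	fmt.Println(initTelemetry(r)) // second setup on the same registry
}
```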

Workaround

Disable telemetry in app.toml:

[telemetry]
enabled = false

This is not ideal as it prevents collecting application metrics for monitoring/Grafana dashboards.

Suggested Fix

Option A (Recommended): Skip telemetry initialization in the child process when --with-tendermint=false is passed.
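A minimal sketch of the Option A decision, under stated assumptions: the child is identified solely by `--with-tendermint=false`, and the flag defaults to true when absent. It uses the standard library `flag` package to stay self-contained, which is not how the real binary parses its flags. `shouldInitTelemetry` and `parseWithTendermint` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
)

// shouldInitTelemetry reports whether this process should set up telemetry.
// Assumption: the child ABCI process is the one started with
// --with-tendermint=false, as described in this report.
func shouldInitTelemetry(telemetryEnabled, withTendermint bool) bool {
	return telemetryEnabled && withTendermint
}

// parseWithTendermint reads --with-tendermint from args using a private
// FlagSet. The default of true is an assumption made for this sketch.
func parseWithTendermint(args []string) (bool, error) {
	fs := flag.NewFlagSet("start", flag.ContinueOnError)
	withTM := fs.Bool("with-tendermint", true, "run the consensus engine in-process")
	if err := fs.Parse(args); err != nil {
		return false, fmt.Errorf("parse flags: %w", err)
	}
	return *withTM, nil
}

func main() {
	// Simulates the child invocation from the root cause analysis.
	withTM, err := parseWithTendermint([]string{"--with-tendermint=false"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("init telemetry:", shouldInitTelemetry(true, withTM))
}
```

Under this gating the child exposes no telemetry of its own, so any metrics that only the child can produce would need another path.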

Option B: Use different metric namespaces for parent (celestia_consensus_*) and child (celestia_app_*).
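Option B could look roughly like the sketch below, which derives a metric name from the process role. The two prefixes come from the option text above; the `role` type, `namespaceFor`, `metricName`, and the `block_height` metric are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// role identifies which of the two multiplexer processes is running.
// Hypothetical type, introduced only for this sketch.
type role int

const (
	roleParent role = iota // runs CometBFT/consensus
	roleChild              // runs the ABCI application
)

// namespaceFor returns the metric namespace proposed in Option B.
func namespaceFor(r role) string {
	if r == roleChild {
		return "celestia_app"
	}
	return "celestia_consensus"
}

// metricName prefixes name with the namespace for the given role.
func metricName(r role, name string) string {
	return fmt.Sprintf("%s_%s", namespaceFor(r), strings.TrimPrefix(name, "_"))
}

func main() {
	fmt.Println(metricName(roleParent, "block_height"))
	fmt.Println(metricName(roleChild, "block_height"))
}
```

Distinct prefixes keep the two processes' series apart, but they do not help if the same collector is registered twice on a single registry.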

Option C: Only initialize telemetry in one of the processes and expose metrics via a shared endpoint.

Impact

  • Validators cannot have telemetry enabled
  • Operators cannot build Grafana dashboards with app-level metrics from validators
  • This only affects nodes using multiplexer mode (validators); consensus-full nodes work fine

Environment

  • OS: Ubuntu 24.04
  • Architecture: x86_64
  • Node type: Validator

Status

In Progress
