Description
When running a validator node with multiplexer mode enabled and telemetry.enabled = true in app.toml, the node fails to start with the error:
Error: duplicate metrics collector registration attempted
The child process (ABCI app) becomes a zombie and the parent process hangs indefinitely.
celestia-app version
v6.4.10-mocha (also likely affects other versions using multiplexer mode)
Network
mocha-4
Steps to Reproduce
- Configure a validator node with multiplexer mode (default for validators)
- Set telemetry.enabled = true in app.toml
- Start the node with:
celestia-appd start --home /home/celestia/.celestia-app --address=tcp://127.0.0.1:26658
- Observe the error in logs
Expected Behavior
The node should start normally with telemetry metrics available for Prometheus scraping.
Actual Behavior
- Child process logs Error: duplicate metrics collector registration attempted and exits
- Child process becomes a zombie (<defunct>)
- Parent process hangs after service start impl=PubSub
- Node never syncs or produces blocks
Logs
INF initializing multiplexer app_version=3 chain_id=mocha-4 module=server
INF initialized remote app client address=127.0.0.1:26658 module=server
INF starting comet node module=server
INF Since the chainID is mocha-4, configuring the default v2 upgrade height to 2585031
INF starting ABCI without Tendermint
Error: duplicate metrics collector registration attempted
INF service start impl=multiAppConn module=proxy msg={}
INF service start connection=query impl=localClient module=abci-client msg={}
INF service start connection=snapshot impl=localClient module=abci-client msg={}
INF service start connection=mempool impl=localClient module=abci-client msg={}
INF service start connection=consensus impl=localClient module=abci-client msg={}
INF service start impl=EventBus module=events msg={}
INF service start impl=PubSub module=pubsub msg={}
[hangs here - no further progress]
Process state showing zombie:
ps aux | grep celestia-appd
celestia 2269545 19.3 0.5 ... /usr/local/bin/celestia-appd start ...
celestia 2269569 8.9 0.0 0 0 ? Z [celestia-appd] <defunct>
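A Z state means the child has already exited but its parent has not yet reaped it. For anyone who wants to check for this condition from a monitoring script rather than by reading ps output, here is a minimal sketch that parses the state field of a /proc/<pid>/stat line on Linux. The function names and the sample line are illustrative and not part of celestia-app.

```go
package main

import (
	"fmt"
	"strings"
)

// procState extracts the state field from one line of /proc/<pid>/stat.
// The layout is "pid (comm) state ...". comm may itself contain spaces or
// parentheses, so the state is located relative to the last ')'.
func procState(statLine string) (byte, error) {
	end := strings.LastIndexByte(statLine, ')')
	if end < 0 {
		return 0, fmt.Errorf("malformed stat line: no closing parenthesis")
	}
	fields := strings.Fields(statLine[end+1:])
	if len(fields) == 0 || len(fields[0]) != 1 {
		return 0, fmt.Errorf("malformed stat line: missing state field")
	}
	return fields[0][0], nil
}

// isZombie reports whether the stat line describes a zombie ("Z") process.
func isZombie(statLine string) bool {
	state, err := procState(statLine)
	return err == nil && state == 'Z'
}

func main() {
	// Illustrative, shortened line in the shape of /proc/<pid>/stat,
	// reusing the PIDs from the ps output above.
	sample := "2269569 (celestia-appd) Z 2269545"
	fmt.Println(isZombie(sample)) // true
}
```

In practice the input would come from reading /proc/<pid>/stat for the child PID; the parsing is kept separate here so it can be exercised without a live process.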
Root Cause Analysis
In multiplexer mode, celestia-appd spawns two processes:
- Parent: Runs CometBFT/consensus
- Child: Runs ABCI application with
--with-tendermint=false --transport=grpc
Both processes read the same app.toml and both call prometheus.MustRegister() from the Cosmos SDK telemetry package. Since Prometheus doesn't allow duplicate metric registration, the child process fails.
Workaround
Disable telemetry in app.toml:
[telemetry]
enabled = false
This is not ideal as it prevents collecting application metrics for monitoring/Grafana dashboards.
Suggested Fix
Option A (Recommended): Skip telemetry initialization in the child process when --with-tendermint=false is passed.
Option B: Use different metric namespaces for parent (celestia_consensus_*) and child (celestia_app_*).
Option C: Only initialize telemetry in one of the processes and expose metrics via a shared endpoint.
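As a rough sketch of what Options A and B could look like, assuming the startup code can see both the telemetry.enabled setting and the --with-tendermint flag: the startFlags type and both functions below are hypothetical and do not exist in celestia-app or the Cosmos SDK. It also assumes the parent runs with --with-tendermint=true.

```go
package main

import "fmt"

// startFlags holds the two inputs the decision depends on. Both field names
// are hypothetical stand-ins for telemetry.enabled in app.toml and the
// --with-tendermint flag.
type startFlags struct {
	TelemetryEnabled bool
	WithTendermint   bool
}

// shouldInitTelemetry sketches Option A: telemetry is initialized only in
// the process started with Tendermint, so the child skips it.
func shouldInitTelemetry(f startFlags) bool {
	return f.TelemetryEnabled && f.WithTendermint
}

// metricNamespace sketches Option B: each process gets its own prefix,
// using the namespaces proposed in this issue.
func metricNamespace(f startFlags) string {
	if f.WithTendermint {
		return "celestia_consensus"
	}
	return "celestia_app"
}

func main() {
	parent := startFlags{TelemetryEnabled: true, WithTendermint: true}
	child := startFlags{TelemetryEnabled: true, WithTendermint: false}

	fmt.Println(shouldInitTelemetry(parent), metricNamespace(parent)) // true celestia_consensus
	fmt.Println(shouldInitTelemetry(child), metricNamespace(child))   // false celestia_app
}
```

One trade-off to keep in mind: with Option A as sketched, nothing is initialized in the child, so metrics emitted by the ABCI application would only be available if they are collected some other way. Option B keeps both sets.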
Impact
- Validators cannot have telemetry enabled
- Operators cannot build Grafana dashboards with app-level metrics from validators
- This only affects nodes using multiplexer mode (validators); consensus-full nodes work fine
Environment
- OS: Ubuntu 24.04
- Architecture: x86_64
- Node type: Validator