Summary
ENSApi crashes permanently (process.exit(1)) at startup if it initializes before ENSIndexer has published its ensnode.metadata, instead of waiting/retrying. This is a startup-ordering race: any deployment where ENSApi can come up before the indexer's first metadata write (fresh DB, co-located stacks, restarts) hard-fails the API.
Symptom
ERROR: Error initializing DI container
DI container initialization failed: could not connect to ENS Root Chain RPC
due to relation "ensnode.metadata" does not exist
ELIFECYCLE Command failed with exit code 1.
ENSApi exits and does not recover on its own; only a restart after the indexer has written metadata succeeds.
Root cause
apps/ensapi/src/index.ts runs di.init().catch(() => process.exit(1)) — no retry.
di.init() (apps/ensapi/src/di.ts) eagerly reads the indexer's published ensnode.metadata via stackInfoCache.read() / indexingStatusCache.read() and the root-chain RPC config derived from it. When ensnode.metadata doesn't exist yet, these throw and init rejects.
So a transient, expected startup-ordering condition (indexer hasn't published metadata yet) is treated as a fatal error.
Proposed fix
Treat "indexer metadata not published yet" as a retryable condition during DI init:
- Add a classifier isIndexerMetadataNotReadyError(err) matching the missing-relation / empty-metadata case.
- In di.init(), wrap the metadata-dependent initialization (the three cache reads + root-chain RPC config/getBlockNumber) in a bounded retry-with-backoff loop. On a "not ready" error: log and retry with backoff; on any other error: fail fast; give up after a configurable timeout (env, default ~10 min) so a genuine misconfig still surfaces. Ensure caches re-query on retry rather than caching the failure.
This makes ENSApi tolerant of being started before/alongside the indexer, which is the common case for fresh deployments and co-located dev/checkpoint stacks.
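A minimal TypeScript sketch of the bounded retry-with-backoff loop described above. It is an illustration, not existing ENSApi code: retryUntilReady, its option names, and the default delays are all hypothetical, and the isRetryable predicate stands in for the proposed classifier.

```typescript
// Hypothetical helper: every name and default here is an assumption for illustration.
interface RetryOptions {
  /** Returns true only for the "indexer metadata not ready" class of error. */
  isRetryable: (err: unknown) => boolean;
  /** Total time budget before giving up (the proposal suggests roughly 10 minutes). */
  timeoutMs: number;
  initialDelayMs?: number;
  maxDelayMs?: number;
  /** Optional logging hook, called before each wait. */
  onRetry?: (err: unknown, attempt: number, delayMs: number) => void;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryUntilReady<T>(task: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const { isRetryable, timeoutMs, initialDelayMs = 1_000, maxDelayMs = 30_000, onRetry } = opts;
  const deadline = Date.now() + timeoutMs;
  let delayMs = initialDelayMs;

  for (let attempt = 1; ; attempt++) {
    try {
      // task() is invoked afresh on every attempt, so nothing from a failed
      // attempt is reused by this helper.
      return await task();
    } catch (err) {
      // Any error outside the retryable class fails fast.
      if (!isRetryable(err)) throw err;

      // Time budget exhausted: surface the last "not ready" error.
      const remainingMs = deadline - Date.now();
      if (remainingMs <= 0) throw err;

      const waitMs = Math.min(delayMs, remainingMs);
      onRetry?.(err, attempt, waitMs);
      await sleep(waitMs);

      // Exponential backoff, capped at maxDelayMs.
      delayMs = Math.min(delayMs * 2, maxDelayMs);
    }
  }
}
```

In di.init(), the task would be the metadata-dependent initialization, and timeoutMs would come from the proposed environment setting. The helper only guarantees that task() is called again; whether the caches re-query depends on them not memoizing a rejected read, which this sketch does not address.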
Notes
- Keep genuine config/connectivity failures fail-fast — only the metadata-not-ready class should retry.
- Found while running co-located indexer+ENSApi checkpoint stacks, where ENSApi reliably loses the race against the indexer's first metadata write and dies.
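The fail-fast boundary in the first note depends on how narrowly the classifier matches. The sketch below is one possible shape, under stated assumptions: it matches only the missing-relation message shown in the Symptom log, and it walks the error's cause chain in case the database error is wrapped. The empty-metadata case is left out because the issue does not show what that error looks like.

```typescript
// Assumption: the "not ready" condition surfaces with the message text from the
// Symptom log. Matching on the PostgreSQL SQLSTATE for undefined_table ("42P01")
// would be an alternative if the driver error exposes a `code` field.
const MISSING_METADATA_RELATION = /relation "ensnode\.metadata" does not exist/;

function isIndexerMetadataNotReadyError(err: unknown): boolean {
  let current: unknown = err;

  // Bounded walk over the `cause` chain; the depth limit guards against cycles.
  for (let depth = 0; depth < 10 && current != null; depth++) {
    if (typeof current === "string") return MISSING_METADATA_RELATION.test(current);
    if (typeof current !== "object") return false;

    const { message, cause } = current as { message?: unknown; cause?: unknown };
    if (typeof message === "string" && MISSING_METADATA_RELATION.test(message)) return true;

    current = cause;
  }

  // Everything else (bad config, unreachable database or RPC) is not retryable.
  return false;
}
```

Keeping the match this narrow means a connection failure or an unrelated missing relation still exits immediately, as the note requires.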
Part of #1360 (tracking).