Skip to content

S0naliThakur/blockchain-indexer

Repository files navigation

Fee Collector Event Indexer

A small, production-minded service that indexes LI.FI FeesCollected events from the on-chain FeeCollector contract, stores them in MongoDB (via Typegoose), and exposes a REST API to query them by integrator (and more).

The codebase is structured as a focused blockchain event indexer: chain event fetching → parsing/normalization → idempotent persistence → sync-progress tracking → reorg-aware ingestion → query API. It starts with Polygon but is structured so additional EVM chains plug in trivially, with a clean path to non-EVM chains later.

What it does

  • Scrapes FeesCollected events for a chain in bounded block chunks.
  • Can be (re)started at any time and resumes from saved progress — completed ranges are never rescanned.
  • Writes events idempotently (unique {chainKey, txHash, logIndex} index), so retries and reorg overlaps never create duplicates.
  • Stays confirmations blocks behind the chain head as a reorg guard.
  • Serves a small REST API: GET /health, POST /sync/:chainKey, GET /fees.
  • Graceful shutdown: stops the poller, lets the HTTP server drain, waits for any in-flight sync to settle, then closes the Mongo connection.

Architecture summary

   HTTP client ─▶ Fastify (server.ts):  /health   /sync/:chainKey   /fees
                                              │            │
                                              ▼            ▼
                                        ┌───────────┐   queryFees()
                                        │  Indexer  │   (events/fee-collected)
                                        │  engine   │        │
                                        └─────┬─────┘        │
                              drives Collectors│             │
                                              ▼              ▼
                            ┌──────────────────────┐   ┌──────────────────────┐
                            │  Collector (chain ×  │   │       MongoDB         │
                            │  event): latestHeight│   │ fee_collected_events  │
                            │  + fetch + store     │   │ sync_cursors          │
                            └──────────┬───────────┘   └──────────────────────┘
                   transport (per family) │  decode + store (per event)
                                          ▼
                            ┌──────────────────────┐
                            │ evmTransport (ethers)│   ← non-EVM transport later
                            └──────────┬───────────┘
                                       ▼
                                   EVM RPC node

The engine never sees ethers, an ABI, or an address — only Collectors. That is what keeps it identical across EVM and any non-EVM family.

  • server.ts — Fastify app, three routes, one error envelope. No business logic. Every AppError is logged (warn for 4xx, error for 5xx).
  • indexer.ts — the chain-agnostic engine: reads the cursor, scans in chunks up to head − confirmations, persists, then advances the cursor. Owns per-chain locking, recursive-setTimeout polling (ticks never overlap), a drain() for graceful shutdown, and a status() for /health. Plus the pure range math (targetHeight, nextChunk).
  • chains/types.ts defines the Collector seam + defineCollector glue; evm.ts is the only ethers code (a transport for any EVM contract/event). JsonRpcProvider is cached per rpcUrl so multiple events on the same chain share a connection. A non-EVM family is a sibling file here.
  • events/index.ts is the registry; fee-collected.ts is one event end-to-end (normalized doc + EVM decode + idempotent bulkWrite store + zod-validated keyset query + deployments).
  • db/connection.ts (connect + index creation) and schemas/*.schema.ts (one Typegoose model per file).
  • config/env.ts (zod-validated env) and chains.ts (chains table).

Setup

Requirements: Node ≥ 18 and a MongoDB instance (or use Docker, below).

npm install
cp .env.example .env   # then edit values as needed

Environment variables

Variable Default Description
PORT 3000 HTTP port.
LOG_LEVEL info tracefatal.
LOG_PRETTY true Pretty logs locally; set false for JSON in prod.
MONGO_URI — (required) MongoDB connection string.
MONGO_DB_NAME fee_indexer Database name.
SYNC_ON_STARTUP false Run one catch-up for all chains on boot.
SYNC_POLL_INTERVAL_MS 0 If > 0, re-sync all chains every N ms; ticks are scheduled after each run, so they never overlap. 0 = manual only.
POLYGON_RPC_URL https://1rpc.io/matic RPC endpoint. Public RPCs are rate-limited and often disallow deep eth_getLogs; use a dedicated provider (Alchemy/Infura/QuickNode) for full back-fills.
POLYGON_CONFIRMATIONS 128 Blocks to stay behind head (reorg guard).
POLYGON_CHUNK_SIZE 50 Max block span per log query.
POLYGON_FEE_COLLECTOR_ADDRESS (defaults in code) Override the contract address — useful for testnet/fork/redeploy.
POLYGON_FEE_COLLECTOR_START_BLOCK (defaults in code) Override the start block — useful to shorten back-fills for quick verification.

The FeeCollector address (0xbD6C…0eA9) and default start block (78600000) live with the event in src/config/fee-collected.ts; the env vars above override them at runtime.

Running it

There are two supported ways to run the service. Pick one — they both bind port 3000, so they will conflict if started together.

A) Local dev (recommended while iterating on the code)

Run Mongo in Docker, run the indexer on the host with tsx watch.

cp .env.example .env       # one-time
docker compose up -d mongo  # Mongo on localhost:27017, data in the mongo-data volume
npm run dev                 # or: npm run build && npm start

The API is on http://localhost:3000. Code changes hot-reload via tsx watch. The Mongo container survives between sessions; docker compose down (without -v) stops it without dropping data, and docker compose up -d mongo brings it back.

B) Full stack in Docker (closer to prod)

cp .env.example .env       # one-time
docker compose up --build  # builds the indexer image and starts both services

The compose file overrides MONGO_URI to point at the mongo service inside the Docker network, so the value in .env is only used by path A.

To pick up an env change you need to recreate the container, not just rebuild — the env is baked in at container start:

docker compose up -d --force-recreate indexer

Stopping things

docker compose down         # stop services, keep the Mongo volume
docker compose down -v      # stop services AND wipe Mongo data

Endpoints

GET /health

{
  "status": "ok",
  "uptimeSeconds": 273,
  "dependencies": { "mongo": "connected" },
  "sync": [
    {
      "key": "polygon:fee-collected",
      "lastSyncedAt": "2026-05-28T11:13:04.512Z",
      "lastScannedBlock": 87459_999,
      "lastObservedHead": 87460_127
    }
  ]
}

Returns 503 (still with the body above) when Mongo is not connected. An empty sync array means no cursor has advanced yet — either nothing was triggered, or the first run errored before any chunk was persisted.

POST /sync/:chainKey

Runs every event configured for that chain, in sequence.

  • 200 with one summary per event indexed on the chain.
  • 404 for an unknown chain.
  • 409 if a sync for that chain is already in progress (per-chain lock).
  • 502 (UPSTREAM_ERROR) when the upstream RPC fails; the original cause is preserved in details.cause.
{ "data": [ { "key": "polygon:fee-collected", "chainKey": "polygon",
  "eventKey": "fee-collected", "scanned": true, "fromBlock": 87440000,
  "toBlock": 87459872, "chunks": 10, "inserted": 42, "duplicates": 0 } ] }

Re-running continues from where it stopped. For unattended operation set SYNC_ON_STARTUP=true and/or SYNC_POLL_INTERVAL_MS=60000.

The very first Polygon sync from 78600000 covers a large range and will take a while against a public RPC (and most free tiers reject deep eth_getLogs outright). For a quick verification, point POLYGON_FEE_COLLECTOR_START_BLOCK close to head (e.g. ~10–20k blocks back). For real back-fills, use a dedicated provider.

GET /fees

# all events for an integrator
curl "http://localhost:3000/fees?integrator=0x1234...abcd"

# with chain + block range + pagination
curl "http://localhost:3000/fees?chainKey=polygon&fromBlock=87400000&toBlock=87460000&limit=50"

# next page — pass the returned cursor back
curl "http://localhost:3000/fees?integrator=0x1234...abcd&limit=50&cursor=Nzg2MDAxMjM6MA"

Filters: integrator (validated 20-byte hex), chainKey, fromBlock, toBlock, limit (≤200, default 50), cursor. Keyset pagination on (blockNumber, logIndex); limit + 1 is fetched to detect a further page without a count query. The cursor is opaque base64url. 400 on bad params.

{
  "data": [
    {
      "chainKey": "polygon",
      "contractAddress": "0xbd6c…0ea9",
      "token": "0x…",
      "integrator": "0x…",
      "integratorFee": "1000000000000000000",
      "lifiFee": "250000000000000000",
      "blockNumber": 87440123,
      "blockHash": "0x…",
      "txHash": "0x…",
      "logIndex": 0
    }
  ],
  "pageInfo": { "limit": 50, "hasMore": true, "nextCursor": "Nzg2MDAxMjM6MA" }
}

Error envelope

{ "error": { "code": "VALIDATION_ERROR", "message": "Invalid query parameters", "details": [ ] } }

Codes: VALIDATION_ERROR (400), NOT_FOUND (404), CONFLICT (409), UPSTREAM_ERROR (502), INTERNAL_ERROR (500).

Extending

Add an EVM chain — append an entry to the chains table in src/config/chains.ts, then add a deployment (address + start block) to each event in src/config/fee-collected.ts that runs there. The generic evmTransport works for any EVM chain.

Add an event — create src/events/<name>.ts exporting a buildCollectors (normalized doc + decode + idempotent store + deployments), add its Typegoose schema under src/db/schemas/, and register it in src/events/index.ts. The engine, cursor tracking, /sync and index creation pick it up automatically.

Because each event declares its own deployments, different events can run on different chains: an event is indexed only on the chains it lists (and that are configured). Event A → [polygon], event B → [optimism] is just two deployment lists.

Add a non-EVM chain — add a new family to ChainFamily, a transport file under src/chains/ implementing ChainTransport, a per-event decode for that family, and a matching case in the event's buildCollectors. The engine, storage, query, and cursor are untouched — the engine drives Collectors and never sees a chain SDK.

Tests

npm test            # run everything once
  • UnitdecodeEvm (ABI decoding, address lower-casing, uint256 precision); pure range math (targetHeight, nextChunk). No mocks.
  • Integrationstore idempotency + queryFees filters/pagination against a real in-memory Mongo; the engine driven by a fake Collector (only the fetch boundary is mocked — store, cursor, Mongo, unique-index dedup, resume across runs are all real); HTTP via Fastify inject() against the same real Mongo.

The only thing mocked is the fetch boundary (external, non-deterministic). Mongo stays real because the guarantees under test — unique-index idempotency, keyset pagination behaviour, resume-from-cursor — only hold against a real DB.

Tradeoffs & future improvements

Intentionally not added

  • No queue / Kafka / Redis — a single deployable with an in-process loop is simpler and sufficient at this scope.
  • No multi-replica coordination — the per-chain lock is in-process.
  • No active reorg rewind — confirmation depth + idempotent upserts cover realistic cases.
  • No block-timestamp enrichment (would add an extra RPC per block); trivial to add inside the decode if needed.
  • No API auth, CORS, or rate limiting — out of scope; a real deployment would sit behind a gateway that handles those.

What I'd add next in production

  • A distributed lock (Mongo lease) so multiple replicas can run safely.
  • Per-chunk retry with exponential backoff + jitter on transient RPC failures, with a small re-scan overlap to absorb shallow reorgs.
  • Active reorg handling: track block hashes and rewind the cursor on a parent-hash mismatch inside the confirmation window.
  • Metrics (Prometheus): blocks behind head, events/min, RPC error rate, sync duration per chain.
  • A background-job model for very large back-fills (so POST /sync returns 202 Accepted and the run progresses asynchronously).
  • A non-EVM transport + decode actually wired through, to exercise the extension point on a real chain.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors