feat: fetch chain config from remote URL and remove HTML scraping #2709

Open
kuzdogan wants to merge 2 commits into staging from feat/remote-chain-config-v2

@kuzdogan
Member

Summary

This PR makes two related changes to how Sourcify manages chain configuration:

1. Remote chain config

Removes the bundled chains.json, sourcify-chains-default.json, and extra-chains.json
from the server, replacing them with a remote fetch at startup from
sourcifyeth/sourcify-chains.

  • Server now fetches sourcify-chains-default.json from the remote URL on startup
  • Local sourcify-chains.json override still supported for self-hosted instances
  • 3-attempt retry with 3s delay if the fetch fails
  • chains.remoteUrl in default.js points to the published chain list
  • Chain configuration is now maintained and auto-generated in a dedicated repo with
    nightly CI that pulls from QuickNode, dRPC, Etherscan, and Blockscout APIs
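
The retry behavior described above (3 attempts, 3s delay) could look roughly like the following. This is a minimal sketch, not the PR's code: `fetchWithRetry`, `JsonFetcher`, and `sleep` are hypothetical names, and the fetcher is injected so the retry loop can be exercised without network access.

```typescript
// Hypothetical helper: names and signature are illustrative, not taken from the PR.
type JsonFetcher = (url: string) => Promise<unknown>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(
  url: string,
  fetcher: JsonFetcher,
  attempts = 3,
  delayMs = 3000,
): Promise<unknown> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fetcher(url);
    } catch (err) {
      lastError = err;
      // Wait between attempts, but not after the final one.
      if (attempt < attempts) {
        await sleep(delayMs);
      }
    }
  }
  throw new Error(
    `Failed to fetch ${url} after ${attempts} attempts: ${String(lastError)}`,
  );
}
```

In a Node 18+ runtime the injected fetcher could simply wrap the global `fetch` and reject on non-2xx responses, so that HTTP errors also trigger a retry.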

2. Remove blockscoutScrape and etherscanScrape

Removes both HTML-scraping methods for contract creation transaction lookup:

  • blockscoutScrape: all chains migrated to blockscoutApi, deprecated, or moved to
    additional-chains.json (Bitkub Chain, Velas, Ozone Chain — low activity, no v2 API)
  • etherscanScrape: defined in the type but never used in any chain configuration

With both removed, the entire scraping infrastructure is also deleted:
getScrapeContractCreationFetcher, getCreatorTxByScraping, regex constants.

Test changes

  • chain-tests.spec.ts: replaced static sourcify-chains-default.json import with async
    initializeSourcifyChains() + sourcifyChainsMap; uses Mocha --delay flag
  • ServerFixture: uses LOCAL_CHAINS as default chains so integration tests don't need a remote URL
  • contract-creation-util.spec.ts: removed two scrape tests (Blockscout old/new regex)

🤖 Generated with Claude Code

Removes bundled chain files (chains.json, extra-chains.json,
sourcify-chains-default.json) and replaces them with a remote fetch
from a URL configured via config.chains.remoteUrl. This decouples the
server from chain data, enabling the separate sourcifyeth/sourcify-chains
repo to own chain generation and trigger server restarts on updates.

Loading priority:
  1. src/sourcify-chains.json (self-hosted override, unchanged)
  2. fetch from config.chains.remoteUrl at startup (3 retries, 3s delay)
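
The loading priority above can be sketched as follows. This is an illustration under assumptions rather than the actual sourcify-chains.ts logic: `loadChainsConfig`, `ChainsConfig`, and the injected `fetchRemote` callback are hypothetical names.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical shape: the real extension objects carry more fields.
type ChainsConfig = Record<string, unknown>;

// Illustrative loader, not the PR's implementation.
async function loadChainsConfig(
  localPath: string,
  remoteUrl: string,
  fetchRemote: (url: string) => Promise<ChainsConfig>,
): Promise<{ source: "local" | "remote"; config: ChainsConfig }> {
  // 1. A local override wins whenever the file exists.
  if (existsSync(localPath)) {
    const config = JSON.parse(readFileSync(localPath, "utf8")) as ChainsConfig;
    return { source: "local", config };
  }
  // 2. Otherwise fall back to the remote chain list.
  return { source: "remote", config: await fetchRemote(remoteUrl) };
}
```

With this ordering, a self-hosted instance that ships its own file never touches the network at startup.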

Changes:
- Remove chains.json, extra-chains.json, sourcify-chains-default.json
- Add config.chains.remoteUrl setting pointing to sourcifyeth/sourcify-chains
- Refactor sourcify-chains.ts: export async initializeSourcifyChains()
  and a mutable sourcifyChainsMap populated at startup; pass etherscanApi
  to SourcifyChain constructor so /chains and Etherscan key lookup work
- Wrap CLI startup in async IIFE; await initializeSourcifyChains() so
  getEtherscanApiKeyForEachChain() sees the populated chain map
- ServerFixture: use LOCAL_CHAINS as default so tests don't need remote URL
- chain-tests.spec.ts: replace static JSON import with async
  initializeSourcifyChains() + sourcifyChainsMap; use Mocha --delay
- Add --delay to test:chains mocha invocation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After switching to async chain loading, sourcifyChainsMap is empty until
initializeSourcifyChains() is called. Tests that access the map during
registration (it() description strings) or during test execution now
fail with "Cannot read properties of undefined".

Fixes:
- ServerFixture.before(): call initializeSourcifyChains() so all
  integration tests have the full chain map populated at runtime
- etherscan.spec.ts: replace sourcifyChainsMap[id].name in it()
  description strings with testChainId (map is empty at registration time)
- contract-creation-util.spec.ts: add before() calling
  initializeSourcifyChains() (test doesn't use ServerFixture)
- verificationWorker.spec.ts: same — add initializeSourcifyChains()
  before the sourcifyChainsMap-based stub setup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
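
The registration-time problem described in this commit can be reproduced without Mocha. The sketch below is a simplified imitation: `chainsMap`, `initializeChains`, and `registerTest` are hypothetical stand-ins for `sourcifyChainsMap`, `initializeSourcifyChains()`, and `it()`.

```typescript
// Hypothetical stand-ins for the real map and initializer.
type Chain = { name: string };
const chainsMap: Record<string, Chain> = {};

async function initializeChains(): Promise<void> {
  chainsMap["1"] = { name: "Ethereum Mainnet" };
}

type TestCase = { title: string; body: () => string };
const registered: TestCase[] = [];

// Minimal imitation of it(): the title is evaluated right away, the body later.
function registerTest(title: string, body: () => string): void {
  registered.push({ title, body });
}

const testChainId = "1";

// Registration happens before initialization, so only the id is safe in the title.
// Using chainsMap[testChainId].name in the title would throw at this point.
registerTest(`verifies on chain ${testChainId}`, () => chainsMap[testChainId].name);

async function runAll(): Promise<string[]> {
  await initializeChains(); // plays the role of a before() hook
  return registered.map((t) => `${t.title}: ${t.body()}`);
}
```

The test body can read the map because it only runs after the hook has populated it, which is the same reasoning behind moving the lookups out of the `it()` description strings.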
@kuzdogan
Member Author

@claude review

@claude

claude bot commented Mar 26, 2026

Claude finished @kuzdogan's task in 3m 4s


PR Review

  • Gather context and understand the PR
  • Review sourcify-chains.ts (core logic)
  • Review cli.ts (startup initialization)
  • Review contract-creation-util.ts (scraping removal)
  • Review test changes
  • Review type changes
  • Post final review

Overall this is a well-motivated change — removing 57k lines of bundled JSON and the fragile HTML-scraping infrastructure is a clear win. The implementation is mostly clean but there are a few items worth addressing.

Issues

1. sourcifyChainsMap key type inconsistency (sourcify-chains.ts:293)

sourcifyChainsMap[chainId] = new SourcifyChain({ ... });  // chainId is number

vs. line 282:

sourcifyChainsMap[chain.chainId.toString()] = chain;  // explicit string

sourcifyChainsMap is typed as SourcifyChainMap with string keys. JS silently coerces the number, but this is inconsistent. The line should use sourcifyChainsMap[chainId.toString()] = ... to match the other path and be explicit.
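
The coercion mentioned here is standard JavaScript behavior for plain objects and can be checked in isolation. In this small sketch, `ChainMap` and `map` are hypothetical stand-ins for `SourcifyChainMap` and `sourcifyChainsMap`.

```typescript
// Hypothetical stand-in for the real map type.
type ChainMap = Record<string, { name: string }>;

const map: ChainMap = {};
const chainId = 1;

// Number key: JavaScript stores the property under the string "1".
map[chainId] = { name: "written via number" };

// Explicit string key: this addresses the same property and overwrites it.
map[chainId.toString()] = { name: "written via string" };

// Only one property exists, and its key is a string.
const keys = Object.keys(map);
```

Both writes land on the same entry, so the inconsistency is stylistic rather than a functional bug, which matches the reviewer's framing.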

2. Unusual TypeScript non-null assertion pattern (sourcify-chains.ts:267)

if (!chainsExtensions!) {

The ! suffix is a non-null assertion that does nothing at runtime — it's a TypeScript-ism to suppress "used before assignment". This is valid but non-obvious. A cleaner pattern:

let chainsExtensions: SourcifyChainsExtensionsObjectWithHeaderEnvName | undefined;
// ... fetch loop ...
if (!chainsExtensions) {
  throw new Error(...);
}

3. Lost chain metadata from chains.json (potential breaking change)

The old code merged chains.json (ethereum-lists, ~1400 chains) as base data with sourcify-chains-default.json as extensions. The new SourcifyChain constructor call only passes:

new SourcifyChain({
  name: extension.sourcifyName,
  chainId,
  supported,
  rpcs,
  etherscanApi,
  fetchContractCreationTxUsing,
});

Fields like nativeCurrency, faucets, infoURL, shortName, network, networkId — previously sourced from chains.json — are no longer passed. If the new sourcify-chains-default.json in the remote repo doesn't include these, the /chains API response will return chains with missing metadata. Worth confirming the remote config includes all these fields, or verifying these fields aren't exposed in any API response.

4. PR description vs. actual ServerFixture behavior

The PR description says "ServerFixture uses LOCAL_CHAINS as default chains so integration tests don't need a remote URL" — but ServerFixture.ts:88 uses sourcifyChainsMap as the default, not LOCAL_CHAINS:

chains: fixtureOptions_?.chains || sourcifyChainsMap,

And initializeSourcifyChains() is called unconditionally in before() (line 69), which will attempt a remote fetch for every integration test suite that doesn't have a local sourcify-chains.json. This means integration tests require network access to raw.githubusercontent.com at startup — a change from the old statically-bundled behavior. Either the PR description needs updating or the default should actually be LOCAL_CHAINS.

5. No startup resilience / offline fallback

Previously the server could start offline (chains were bundled). Now if the remote URL is unreachable after 3 retries, the server exits with a fatal error. This changes the operational profile significantly. Worth documenting as a known trade-off, and potentially worth adding a config option like chains.fallbackToLocalOnError or caching the last-known-good config.
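
One way the suggested last-known-good cache could work is sketched below. This is a hypothetical design, not something the PR implements: `loadWithLastKnownGood`, the cache path, and the injected `fetchRemote` callback are all assumptions.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

type ChainsConfig = Record<string, unknown>;

// Hypothetical fallback strategy; the PR itself exits when the fetch fails.
async function loadWithLastKnownGood(
  fetchRemote: () => Promise<ChainsConfig>,
  cachePath: string,
): Promise<{ config: ChainsConfig; fromCache: boolean }> {
  try {
    const config = await fetchRemote();
    // Persist the fresh copy so a later offline start can reuse it.
    writeFileSync(cachePath, JSON.stringify(config));
    return { config, fromCache: false };
  } catch (err) {
    if (!existsSync(cachePath)) {
      throw err; // No cached copy: keep the fail-fast behavior.
    }
    const config = JSON.parse(readFileSync(cachePath, "utf8")) as ChainsConfig;
    return { config, fromCache: true };
  }
}
```

The trade-off is that a server started from the cache may run with a stale chain list, so logging the `fromCache` flag at startup would be advisable.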

6. No integrity check on remote config

The fetched JSON is parsed and used directly. If the GitHub raw URL is ever served with unexpected content (CDN misconfiguration, compromised file, etc.), there's no schema validation before new SourcifyChain() is called for each entry. A basic sanity check (e.g., verify the parsed object is non-empty and has expected shape) would be defensive.
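
A basic sanity check of the kind suggested here might look like the following sketch. The `ChainExtension` shape is an assumption: only `sourcifyName` and `supported` appear in this review, and the real schema in the sourcify-chains repo may require more or different fields.

```typescript
// Assumed minimal shape; the actual extension fields may differ.
interface ChainExtension {
  sourcifyName: string;
  supported: boolean;
}

function isChainExtension(value: unknown): value is ChainExtension {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v["sourcifyName"] === "string" && typeof v["supported"] === "boolean";
}

// Hypothetical validator, run on the parsed JSON before any chain is constructed.
function assertChainsConfig(parsed: unknown): Record<string, ChainExtension> {
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Chain config must be a JSON object keyed by chain ID");
  }
  const entries = Object.entries(parsed as Record<string, unknown>);
  if (entries.length === 0) {
    throw new Error("Chain config is empty");
  }
  for (const [chainId, extension] of entries) {
    if (!/^\d+$/.test(chainId)) {
      throw new Error(`Invalid chain ID key: ${chainId}`);
    }
    if (!isChainExtension(extension)) {
      throw new Error(`Chain ${chainId} has an unexpected shape`);
    }
  }
  return parsed as Record<string, ChainExtension>;
}
```

This only guards against obviously malformed content. It does not prove the file is authentic, which would need something like a pinned commit hash or a checksum.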


Minor notes

  • contract-creation-util.ts:182: if (fetcher === undefined). fetcher is typed as ContractCreationFetcher (non-optional), so this guard is now dead code after the callers were cleaned up.
  • The chainsExtensions variable is used in the logger.info at line 303 after the block that populates it. TypeScript should be fine but the scope is a bit wide — it's declared before an if/else with two very different branches. A refactor to return from each branch early and have a single populate step would be cleaner, but this is non-blocking.
  • package.json: Removing --exit from test:chains could let the test process hang if chains have open connections. Worth verifying Mocha's --delay and the fixture's after() hook cleanly close everything.

What's working well

  • The retry logic (3 attempts, 3s delay) is clean and appropriate for a startup network call.
  • Re-initialization support (for (const key of Object.keys(sourcifyChainsMap)) delete sourcifyChainsMap[key]) is good for testing.
  • LOCAL_CHAINS in non-production is a thoughtful touch.
  • Removing blockscoutScrape and etherscanScrape is the right call — the API-based approaches are far more reliable.
  • The --delay + run() pattern in chain-tests.spec.ts is the correct Mocha idiom for async test registration.
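
One likely reason the in-place clearing matters (an inference, not something stated in the review): modules that imported the map hold a reference to the same object, so deleting keys keeps every holder in sync, whereas building a replacement object would not. The sketch below uses hypothetical names `chainsMap` and `reinitialize`.

```typescript
// Hypothetical stand-in for the exported, mutable chain map.
const chainsMap: Record<string, string> = { "1": "old entry" };

// What another module effectively holds after importing the map.
const importedReference = chainsMap;

function reinitialize(fresh: Record<string, string>): void {
  // Clear in place so existing references observe the change.
  for (const key of Object.keys(chainsMap)) {
    delete chainsMap[key];
  }
  Object.assign(chainsMap, fresh);
}

reinitialize({ "10": "new entry" });
```

After the call, `importedReference` no longer contains key `"1"` and does contain key `"10"`, because it is the same object that was cleared and refilled.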
