| 1 | +# Sourcify Database |
| 2 | + |
| 3 | +Sourcify Database is the main storage backend for Sourcify. It is a PostgreSQL database that uses the [Verifier Alliance Schema](https://github.com/verifier-alliance/database-specs) as its base, with a few modifications. |
| 4 | + |
| 5 | +At a high level, the modifications are: |
| 6 | +- Accepts contracts without deployment details such as `block_number` and `transaction_hash`, and without onchain creation bytecode (`contracts.creation_code_hash`). |
| 7 | +- Stores the Solidity metadata separately, in the `sourcify_matches` table. |
| 8 | +- Introduces additional tables for Sourcify-specific purposes. |
| 9 | + |
| 10 | +See the [`services/database/migrations`](https://github.com/ethereum/sourcify/tree/staging/services/database/migrations) folder for the initial schema and every change made to it since. The migrations record all schema changes over time; they do not specifically list the differences between Sourcify DB and the Verifier Alliance Schema. |
| 11 | + |
| 12 | +## Schema |
| 13 | + |
| 14 | +The live schema of the database is available on [dbdiagram.io](https://dbdiagram.io/d/Sourcify-DB-66e1a0076dde7f4149c77e3a) and in the embedded frame below. |
| 15 | + |
| 16 | +<iframe src='https://dbdiagram.io/e/66e1a0076dde7f4149c77e3a/66e1a0196dde7f4149c78072' style={{width: "100%", height: "500px"}}> </iframe> |
| 17 | + |
| 18 | +In short: |
| 19 | +- Every verified contract is a coupling between a deployed contract (`contract_deployments`) and a compilation (`compiled_contracts`). |
| 20 | +- "Transformations" are applied to the bytecode from a compilation to reach the matching onchain bytecode. |
| 21 | +- Contract bytecodes are "normalized" for deduplication: each distinct bytecode is stored only once, so the bytecode of a popular contract like `ERC20.sol` is not duplicated across its many deployments. |
| 22 | + |
| 23 | +For the schemas of the JSON fields listed below, see the [Verifier Alliance repo](https://github.com/verifier-alliance/database-specs/tree/master/json-schemas). |
| 24 | + |
| 25 | +JSON fields of the `verified_contracts` table: |
| 26 | +- `creation_values` |
| 27 | +- `creation_transformations` |
| 28 | +- `runtime_values` |
| 29 | +- `runtime_transformations` |
| 30 | + |
| 31 | +The transformations describe the operations applied to the bytecode from a compilation to reach the matching onchain bytecode, and the values hold the data those operations use, such as constructor arguments or immutable values. |
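Roughly, there are two kinds of transformation:

- A "replace" overwrites bytes at an offset of the compiled bytecode, for example with an immutable value or a library address.
- An "insert" adds bytes, for example constructor arguments appended to the creation code.

The hypothetical bash helpers below apply both kinds to 0x-prefixed hex strings, assuming offsets are counted in bytes. The exact transformation objects are defined in the Verifier Alliance JSON schemas, not here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating bytecode transformations on 0x-prefixed hex
# strings. Offsets are assumed to be byte offsets (two hex characters per byte).

# Overwrite the bytes starting at <offset> with <value>, keeping the length.
apply_replace() {
  local code="${1#0x}" offset="$2" value="${3#0x}"
  local start=$(( offset * 2 ))
  local end=$(( start + ${#value} ))
  echo "0x${code:0:start}${value}${code:end}"
}

# Insert <value> at <offset>, growing the bytecode.
apply_insert() {
  local code="${1#0x}" offset="$2" value="${3#0x}"
  local start=$(( offset * 2 ))
  echo "0x${code:0:start}${value}${code:start}"
}
```

For example, `apply_replace 0xaabbccdd 1 0x1122` prints `0xaa1122dd`, and `apply_insert 0x6080 2 0x0001` prints `0x60800001`.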
| 32 | + |
| 33 | +JSON fields of the `compiled_contracts` table: |
| 34 | +- `sources`: the source code files of the contract |
| 35 | +- `compiler_settings` |
| 36 | +- `compilation_artifacts`: fields from the compiler's output JSON: `abi`, `userdoc`, `devdoc`, `sources` (AST identifiers), and `storageLayout` |
| 37 | +- `creation_code_artifacts`: artifacts of the creation bytecode (`evm.bytecode`): `sourceMap`, `linkReferences`, and `cborAuxdata` |
| 38 | +- `runtime_code_artifacts`: artifacts of the runtime bytecode (`evm.deployedBytecode`): `sourceMap`, `linkReferences`, `cborAuxdata`, and `immutableReferences` |
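For instance, once you have a row's `compilation_artifacts` value as a JSON string, `jq` can extract function signatures from its `abi` field. The sample JSON and the `function_signatures` helper below are hypothetical. Tuple parameters are printed as `tuple` rather than expanded.

```bash
#!/usr/bin/env bash
# Hypothetical sample of a compilation_artifacts value, trimmed to the abi field.
artifacts='{"abi":[{"type":"function","name":"transfer","inputs":[{"name":"to","type":"address"},{"name":"amount","type":"uint256"}]},{"type":"event","name":"Transfer","inputs":[]}]}'

# Print one signature per ABI function, e.g. transfer(address,uint256).
# Events and other entry types are filtered out.
function_signatures() {
  jq -r '.abi[] | select(.type == "function") | "\(.name)(\([.inputs[].type] | join(",")))"'
}

printf '%s' "$artifacts" | function_signatures
```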
| 39 | + |
| 40 | +## Download |
| 41 | + |
| 42 | +We export the whole database daily in [Parquet](https://en.wikipedia.org/wiki/Apache_Parquet) format and upload it to Cloudflare R2 storage. The manifest file is available at https://export.sourcify.dev (the `.dev` domain redirects to the `.app` domain, which also belongs to Sourcify). The export script is in [sourcifyeth/parquet-export](https://github.com/sourcifyeth/parquet-export). |
| 43 | + |
| 44 | + |
| 45 | +[export.sourcify.dev](https://export.sourcify.dev) redirects to a `manifest.json` file that lists the Parquet files of each table (abbreviated below): |
| 46 | + |
| 47 | +<details> |
| 48 | +<summary>manifest.json</summary> |
| 49 | + |
| 50 | +```json |
| 51 | +{ |
| 52 | + "timestamp": 1726030203254, |
| 53 | + "dateStr": "2024-09-11T04:50:03.254904Z", |
| 54 | + "files": { |
| 55 | + "code": [ |
| 56 | + "code/code_0_100000.parquet", |
| 57 | + "code/code_100000_200000.parquet", |
| 58 | + ... |
| 59 | + "code/code_2700000_2800000.parquet" |
| 60 | + ], |
| 61 | + "contracts": [ |
| 62 | + "contracts/contracts_0_1000000.parquet", |
| 63 | + ... |
| 64 | + "contracts/contracts_4000000_5000000.parquet" |
| 65 | + ], |
| 66 | + "contract_deployments": [ |
| 67 | + "contract_deployments/contract_deployments_0_1000000.parquet", |
| 68 | + ... |
| 69 | + "contract_deployments/contract_deployments_5000000_6000000.parquet" |
| 70 | + ], |
| 71 | + "compiled_contracts": [ |
| 72 | + "compiled_contracts/compiled_contracts_0_5000.parquet", |
| 73 | + ... |
| 74 | + "compiled_contracts/compiled_contracts_815000_820000.parquet" |
| 75 | + ], |
| 76 | + "verified_contracts": [ |
| 77 | + "verified_contracts/verified_contracts_0_1000000.parquet", |
| 78 | + ... |
| 79 | + "verified_contracts/verified_contracts_5000000_6000000.parquet" |
| 80 | + ], |
| 81 | + "sourcify_matches": [ |
| 82 | + "sourcify_matches/sourcify_matches_0_100000.parquet", |
| 83 | + ... |
| 84 | + "sourcify_matches/sourcify_matches_5300000_5400000.parquet" |
| 85 | + ] |
| 86 | + } |
| 87 | +} |
| 88 | +``` |
| 89 | +</details> |
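A quick way to see what a manifest contains is to print its export time and the number of files per table. This sketch assumes `jq` is installed. It reads `timestamp` as milliseconds since the Unix epoch, which matches the `dateStr` in the example manifest. `summarize_manifest` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the export time and file count per table of a manifest.
summarize_manifest() {
  local manifest="$1"
  # timestamp is in milliseconds; jq's todate expects whole seconds.
  jq -r '"exported: \(.timestamp / 1000 | floor | todate)"' "$manifest"
  # One line per table: <table><TAB><number of files> files
  jq -r '.files | to_entries[] | "\(.key)\t\(.value | length) files"' "$manifest"
}
```

Run it as `summarize_manifest manifest.json` after downloading the manifest.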
| 90 | + |
| 91 | +You can download all the files and use any Parquet client to query, inspect, or process the data. |
| 92 | + |
| 93 | +1. Download the manifest file (`-L` to follow redirects): |
| 94 | + ```bash |
| 95 | + curl -L -O https://export.sourcify.dev/manifest.json |
| 96 | + ``` |
| 97 | + |
| 98 | +2. Download all the tables listed in the manifest: |
| 99 | + ```bash |
| 100 | + jq -r '.files | keys[] as $k | .[$k][]' manifest.json | xargs -I {} curl -L -O https://export.sourcify.dev/{} |
| 101 | + ``` |
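The two steps above download everything. If you only need some tables, or want to pick up where an interrupted run stopped, a small wrapper helps. The sketch below makes these assumptions:

- `jq` and `curl` are installed.
- `list_files` and `download_files` are hypothetical names.
- Files are saved flat in the current directory, as `curl -O` does.

Each file is written to a `.part` file first, so that a partial transfer is not mistaken for a finished one.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for selective, restartable downloads of the export.
BASE_URL="https://export.sourcify.dev"

# Print the file paths listed in a manifest, optionally only for one table.
list_files() {
  local manifest="$1" table="${2:-}"
  if [ -n "$table" ]; then
    jq -r --arg t "$table" '.files[$t][]' "$manifest"
  else
    jq -r '.files[][]' "$manifest"
  fi
}

# Read paths from stdin and download each one into the current directory
# (flat, like `curl -O`), skipping files that are already complete.
download_files() {
  local path file
  while IFS= read -r path; do
    file="$(basename "$path")"
    if [ -f "$file" ]; then
      echo "skipping $file" >&2
      continue
    fi
    # -f fails on HTTP errors; writing to .part first means an interrupted
    # transfer is retried on the next run instead of being skipped.
    curl -fL -o "$file.part" "$BASE_URL/$path" && mv "$file.part" "$file"
  done
}
```

For example, `list_files manifest.json compiled_contracts | download_files` fetches only the `compiled_contracts` table.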
| 102 | + |
| 103 | +For example, you can install [`parquet-cli`](https://github.com/apache/parquet-java/blob/master/parquet-cli/README.md) for basic inspection: |
| 104 | + |
| 105 | +```bash |
| 106 | +brew install parquet-cli |
| 107 | + |
| 108 | +parquet meta compiled_contracts_0_5000.parquet |
| 109 | +``` |
| 110 | + |
| 111 | +alternatively use your favorite data processing tool or import this data into a database. |
0 commit comments