Commit 34d39a1

Add detailed repo docs and download instructions (#25)
* Add detailed repo docs and download instructions
* Fix timestamp
* Name the fields inside JSON objects of the DB
* Incorporate PR comments
* Fix typo

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent bcb8b12 commit 34d39a1

9 files changed

+356
-197
lines changed
+111
@@ -0,0 +1,111 @@
1+
# Sourcify Database
2+
3+
Sourcify Database is the main storage backend for Sourcify. It is a PostgreSQL database that follows the [Verifier Alliance Schema](https://github.com/verifier-alliance/database-specs) as its base, with a few modifications.
4+
5+
On a high level, these modifications are:
6+
- Sourcify DB accepts contracts without deployment details such as `block_number` and `transaction_hash`, as well as contracts without an onchain creation bytecode (`contracts.creation_code_hash`).
- It stores the Solidity metadata separately in the `sourcify_matches` table.
- It introduces additional tables for other purposes.
9+
10+
You can follow the [`services/database/migrations`](https://github.com/ethereum/sourcify/tree/staging/services/database/migrations) folder for the initial schema and the changes made to it. Note that these migrations are not necessarily the differences between Sourcify DB and the Verifier Alliance Schema; they record every change made to the schema over time.
11+
12+
## Schema
13+
14+
You can access the live schema of the database [here](https://dbdiagram.io/d/Sourcify-DB-66e1a0076dde7f4149c77e3a) or in the embedded frame below.
15+
16+
<iframe src='https://dbdiagram.io/e/66e1a0076dde7f4149c77e3a/66e1a0196dde7f4149c78072' style={{width: "100%", height: "500px"}}> </iframe>
17+
18+
In short:
19+
- Every verified contract is a coupling between a deployed contract (`contract_deployments`) and a compilation (`compiled_contracts`)
20+
- "Transformations" are applied to the bytecode produced by a compilation to reach the final matching onchain bytecode.
21+
- Contract bytecodes are "normalized" for deduplication. A bytecode of a popular contract like `ERC20.sol` will only be stored once.
22+
23+
For more information about the schemas of the JSON fields below, check the [Verifier Alliance repo](https://github.com/verifier-alliance/database-specs/tree/master/json-schemas).
24+
25+
JSON fields of `verified_contracts` table:
26+
- `creation_values`
27+
- `creation_transformations`
28+
- `runtime_values`
29+
- `runtime_transformations`
30+
31+
The transformations and values are the operations done on a bytecode from a compilation to reach the final matching onchain bytecode.
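
As a rough illustration of this idea, the sketch below applies a list of transformations to a compiled bytecode. The object shapes used here (`type`, `reason`, `offset`, `id`, and the layout of the values object) are assumptions based on a reading of the Verifier Alliance JSON schemas linked above, and `applyTransformations` is a hypothetical helper, not part of Sourcify. The bytecode is made up. Check the schemas for the authoritative format.

```javascript
// Hypothetical helper: applies Verifier Alliance style transformations to a
// compiled bytecode. Field names and the values layout are assumptions.
function applyTransformations(compiledHex, transformations, values) {
  // Work on the hex string without the 0x prefix; offsets are assumed to be in bytes.
  let code = compiledHex.replace(/^0x/, "");
  for (const t of transformations) {
    // Assumption: values are keyed by the transformation reason, then optionally by id.
    // In real data the key may differ from the reason name.
    const entry = values[t.reason];
    const raw = typeof entry === "string" ? entry : entry[t.id];
    const value = raw.replace(/^0x/, "");
    const start = t.offset * 2; // two hex characters per byte
    if (t.type === "insert") {
      code = code.slice(0, start) + value + code.slice(start);
    } else if (t.type === "replace") {
      code = code.slice(0, start) + value + code.slice(start + value.length);
    } else {
      throw new Error(`Unknown transformation type: ${t.type}`);
    }
  }
  return "0x" + code;
}

// Toy example: overwrite a 2-byte placeholder, then append constructor arguments.
const result = applyTransformations(
  "0x60806040000060aa",
  [
    { type: "replace", reason: "cborAuxdata", offset: 4, id: "1" },
    { type: "insert", reason: "constructorArguments", offset: 8 },
  ],
  { cborAuxdata: { "1": "0xbeef" }, constructorArguments: "0x01" }
);
console.log(result);
```

The transformations say where and how the bytecode changes, while the values carry the actual bytes, which is why the two are stored as separate fields.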
32+
33+
JSON fields of `compiled_contracts` table:
34+
- `sources`: Source code files of a contract
35+
- `compiler_settings`
36+
- `compilation_artifacts`: Fields from the compilation output JSON. Fields: `abi`, `userdoc`, `devdoc`, `sources` (AST identifiers), `storageLayout`
37+
- `creation_code_artifacts`: Fields under `evm.bytecode` field. Fields: `sourceMap`, `linkReferences`, `cborAuxdata`
38+
- `runtime_code_artifacts`: Fields under `evm.deployedBytecode` field. Fields: `sourceMap`, `linkReferences`, `cborAuxdata`, `immutableReferences`
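
To make the mapping from compiler output to these fields more concrete, here is a minimal sketch that picks the listed fields out of one contract entry of a solc standard JSON output. `pickArtifacts` is a hypothetical helper and not how Sourcify itself builds these columns. It leaves out `sources` (AST identifiers) and `cborAuxdata`, on the assumption that they are derived elsewhere and are not read directly from the per-contract output.

```javascript
// Hypothetical helper: picks the documented fields from one contract entry
// of a solc standard JSON output, i.e. output.contracts[file][name].
function pickArtifacts(contractOutput) {
  const { abi, userdoc, devdoc, storageLayout, evm = {} } = contractOutput;
  const { bytecode = {}, deployedBytecode = {} } = evm;
  return {
    compilation_artifacts: { abi, userdoc, devdoc, storageLayout },
    // Fields under evm.bytecode
    creation_code_artifacts: {
      sourceMap: bytecode.sourceMap,
      linkReferences: bytecode.linkReferences,
    },
    // Fields under evm.deployedBytecode
    runtime_code_artifacts: {
      sourceMap: deployedBytecode.sourceMap,
      linkReferences: deployedBytecode.linkReferences,
      immutableReferences: deployedBytecode.immutableReferences,
    },
  };
}

// Made-up compiler output for illustration only.
const sampleOutput = {
  abi: [],
  userdoc: {},
  devdoc: {},
  storageLayout: { storage: [], types: null },
  evm: {
    bytecode: { object: "6080", sourceMap: "0:10:0:-:0", linkReferences: {} },
    deployedBytecode: {
      object: "6080",
      sourceMap: "0:10:0:-:0",
      linkReferences: {},
      immutableReferences: {},
    },
  },
};
const artifacts = pickArtifacts(sampleOutput);
console.log(Object.keys(artifacts.runtime_code_artifacts));
```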
39+
40+
## Download
41+
42+
We dump the whole database daily in [Parquet](https://en.wikipedia.org/wiki/Apache_Parquet) format and upload it to Cloudflare R2 storage. You can access the manifest file at https://export.sourcify.dev (`.dev` redirects to the `.app` domain, which also belongs to Sourcify). The script that does the dump is at [sourcifyeth/parquet-export](https://github.com/sourcifyeth/parquet-export).
43+
44+
45+
[export.sourcify.dev](https://export.sourcify.dev) will redirect to a `manifest.json` file:
46+
47+
<details>
48+
<summary>manifest.json</summary>
49+
50+
```json
{
  "timestamp": 1726030203254,
  "dateStr": "2024-09-11T04:50:03.254904Z",
  "files": {
    "code": [
      "code/code_0_100000.parquet",
      "code/code_100000_200000.parquet",
      ...
      "code/code_2700000_2800000.parquet"
    ],
    "contracts": [
      "contracts/contracts_0_1000000.parquet",
      ...
      "contracts/contracts_4000000_5000000.parquet"
    ],
    "contract_deployments": [
      "contract_deployments/contract_deployments_0_1000000.parquet",
      ...
      "contract_deployments/contract_deployments_5000000_6000000.parquet"
    ],
    "compiled_contracts": [
      "compiled_contracts/compiled_contracts_0_5000.parquet",
      ...
      "compiled_contracts/compiled_contracts_815000_820000.parquet"
    ],
    "verified_contracts": [
      "verified_contracts/verified_contracts_0_1000000.parquet",
      ...
      "verified_contracts/verified_contracts_5000000_6000000.parquet"
    ],
    "sourcify_matches": [
      "sourcify_matches/sourcify_matches_0_100000.parquet",
      ...
      "sourcify_matches/sourcify_matches_5300000_5400000.parquet"
    ]
  }
}
```
89+
</details>
90+
91+
You can download all the files and use a parquet client to query, inspect, or process the data.
92+
93+
1. Download the manifest file (`-L` to follow redirects):
94+
```bash
curl -L -O https://export.sourcify.dev/manifest.json
```
97+
98+
2. Download all the tables listed in the manifest:
99+
```bash
jq -r '.files | keys[] as $k | .[$k][]' manifest.json | xargs -I {} curl -L -O https://export.sourcify.dev/{}
```
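
If you prefer to script the download, the same expansion of the manifest into download URLs can be sketched in JavaScript. `exportUrlsByTable` is a hypothetical helper; it only assumes the manifest layout shown above (a `files` object that maps table names to arrays of relative paths) and uses a shortened sample manifest.

```javascript
// Hypothetical helper: turns the export manifest into download URLs grouped by table.
function exportUrlsByTable(manifest, baseUrl = "https://export.sourcify.dev") {
  const urls = {};
  for (const [table, paths] of Object.entries(manifest.files)) {
    urls[table] = paths.map((p) => `${baseUrl}/${p}`);
  }
  return urls;
}

// Shortened sample manifest following the layout shown above.
const sampleManifest = {
  timestamp: 1726030203254,
  files: {
    code: ["code/code_0_100000.parquet"],
    compiled_contracts: ["compiled_contracts/compiled_contracts_0_5000.parquet"],
  },
};
const urls = exportUrlsByTable(sampleManifest);
console.log(urls.compiled_contracts[0]);
```

Grouping by table makes it easy to fetch only the tables you need instead of the whole dump.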
102+
103+
For example, you can install [`parquet-cli`](https://github.com/apache/parquet-java/blob/master/parquet-cli/README.md) to do basic inspection:
104+
105+
```bash
brew install parquet-cli

parquet meta compiled_contracts_0_5000.parquet
```
110+
111+
Alternatively, use your favorite data processing tool or import this data into a database.
+131
@@ -0,0 +1,131 @@
1+
import TotalRepoSize from "./TotalRepoSize"
2+
3+
# File Repositories
4+
5+
This page describes `RepositoryV1` and `RepositoryV2`, both of which are file systems. See [All Repositories](/docs/repository) for details.
6+
7+
## Table of Contents
8+
9+
- [RepositoryV1 vs RepositoryV2](#repositoryv1-vs-repositoryv2)
10+
- [RepositoryV1](#repositoryv1)
11+
- [RepositoryV2](#repositoryv2)
12+
- [Download](#download)
13+
14+
15+
## RepositoryV1 vs RepositoryV2
16+
17+
### RepositoryV1
18+
RepositoryV1 is the legacy storage backend for files. It is simply a file system laid out according to the file paths given in the [Solidity metadata](/docs/metadata) file.
19+
20+
For example, the source file paths in the [metadata](https://repo.sourcify.dev/contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/metadata.json) of the "full_match" contract `0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D` on Ethereum Mainnet (1) look like this:
21+
```json
{
  "sources": {
    "erc20/IERC20.sol": {
      "keccak256": "0xa38ec4e151e4d397d05bdfb94e6e4eb91e57a9fca3bc1c655289a4adf31a58fa",
      "license": "MIT",
      "urls": [
        "bzz-raw://312e850e36efbf0f2450896c213b23dc0a28150e051bcbf933a8b9211627c44b",
        "dweb:/ipfs/QmWsyisPjDwTJrTMhsGZa4JHiCS63mWfsyVQKbaijWGdmK"
      ]
    },
    "erc20/airdrop.sol": {
      "keccak256": "0xea27a3e2c4179a064caf9fe9a198addd526fd1d1ea467ea474a0c069e6eac957",
      "urls": [
        "bzz-raw://6a86bc69b99876768bdbddba504410cf60b33681e1203a36d98840bf2ab8a42b",
        "dweb:/ipfs/QmRZSqNfAPduoPoUJ6BM4NpBTbTKBqg5Mz5YBNpaUz4TfQ"
      ]
    }
  }
}
```
42+
43+
These files are stored as shown below ([see in repo](https://repo.sourcify.dev/contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/)):
44+
```
contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/
├── metadata.json
└── sources/
    └── erc20/
        ├── IERC20.sol
        └── airdrop.sol
```
52+
The problem is that `"erc20/airdrop.sol"` is not necessarily a valid file path but a ["source unit name"](https://docs.soliditylang.org/en/v0.8.27/path-resolution.html#virtual-filesystem:~:text=assigned%20a%20unique-,source%20unit%20name,-which%20is%20an) in Solidity, which can be an arbitrary string. This may cause issues on file systems as well as when pinning to IPFS.
53+
54+
### RepositoryV2
55+
56+
RepositoryV2 is the format where we normalize the file names with their keccak256 hashes (source files must have a `keccak256` field in the metadata). So the example above would look like this:
57+
58+
```
contracts/full_match/1/0x801f3983c7baBF5E6ae192c84E1257844aDb4b4D/
├── metadata.json
└── sources/
    ├── 0xa38ec4e151e4d397d05bdfb94e6e4eb91e57a9fca3bc1c655289a4adf31a58fa.sol
    └── 0xea27a3e2c4179a064caf9fe9a198addd526fd1d1ea467ea474a0c069e6eac957.sol
```
65+
66+
The file contents are exactly the same, so their IPFS hashes do not change, and you can look up the metadata file to find the original path-like source unit names.
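
The lookup from source unit names to RepositoryV2 file names can be sketched as follows. `v2SourcePaths` is a hypothetical helper; the `sources/<keccak256>.sol` naming, including the `0x` prefix and the `.sol` extension, is taken from the example above and is an assumption about the general case.

```javascript
// Hypothetical helper: maps each source unit name in a metadata file to the
// hashed file name it would have in RepositoryV2 (naming assumed from the example).
function v2SourcePaths(metadata) {
  const mapping = {};
  for (const [unitName, source] of Object.entries(metadata.sources)) {
    if (!source.keccak256) {
      throw new Error(`Missing keccak256 for source unit: ${unitName}`);
    }
    mapping[unitName] = `sources/${source.keccak256}.sol`;
  }
  return mapping;
}

// Metadata excerpt from the example contract above.
const metadata = {
  sources: {
    "erc20/IERC20.sol": {
      keccak256: "0xa38ec4e151e4d397d05bdfb94e6e4eb91e57a9fca3bc1c655289a4adf31a58fa",
    },
    "erc20/airdrop.sol": {
      keccak256: "0xea27a3e2c4179a064caf9fe9a198addd526fd1d1ea467ea474a0c069e6eac957",
    },
  },
};
const paths = v2SourcePaths(metadata);
console.log(paths["erc20/airdrop.sol"]);
```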
67+
68+
## IPFS
69+
70+
Unfortunately, publishing under IPNS is temporarily disabled. This is because of the difficulty of managing the whole filesystem over IPFS (with MFS etc.) and updating the IPNS regularly.
71+
72+
We still pin all the files on IPFS so you can access them over their individual CIDs (e.g. [`QmVij3h9z536ZG5cRpUmTfdoN9KR1Xp4ix2P7to9dPHgE5`](https://ipfs.io/ipfs/QmVij3h9z536ZG5cRpUmTfdoN9KR1Xp4ix2P7to9dPHgE5)).
73+
74+
Look at the [Download section](#download) to learn how to download the whole repository.
75+
76+
## Web
77+
78+
Moved to [repo.sourcify.dev](/docs/repository/repo.sourcify.dev).
79+
80+
## Download
81+
82+
We compress **RepositoryV2** weekly and publish it on Cloudflare R2 under https://repo-backup.sourcify.dev (`.dev` redirects to the `.app` domain, which also belongs to Sourcify).
83+
84+
<TotalRepoSize/>
85+
86+
[repo-backup.sourcify.dev](https://repo-backup.sourcify.dev) will redirect to a `manifest.json` file:
87+
88+
<details>
89+
<summary>manifest.json</summary>
90+
91+
```json
{
  "description": "Manifest file for when the Sourcify file repository was uploaded",
  "timestamp": 1726030203254,
  "dateStr": "2024-09-11T04:50:03.254904Z",
  "files": [
    {
      "path": "sourcify-repository-2024-09-10T13-36-47/sourcify-repository-2024-09-10T13-36-47.part.gz.aa",
      "sizeInBytes": 2097152000
    },
    ...
    {
      "path": "sourcify-repository-2024-09-10T13-36-47/sourcify-repository-2024-09-10T13-36-47.part.gz.ap",
      "sizeInBytes": 800472503
    }
  ]
}
```
109+
</details>
110+
111+
You can download all files in the `files` array and unzip them:
112+
113+
1. Download the manifest file (`-L` to follow redirects):
114+
```bash
curl -L -O https://repo-backup.sourcify.dev/manifest.json
```
117+
118+
2. Extract file paths and download each file:
119+
```bash
jq -r '.files[].path' manifest.json | xargs -I {} curl -L -O https://repo-backup.sourcify.dev/{}
```
122+
123+
3. Concatenate the downloaded parts:
124+
```bash
cat sourcify-repository-*.part.gz.* > sourcify-repository.gz
```
127+
128+
4. Unzip the concatenated file:
129+
```bash
tar -xvzf sourcify-repository.gz
```
@@ -0,0 +1,46 @@
1+
2+
# repo.sourcify.dev
3+
4+
[repo.sourcify.dev](https://repo.sourcify.dev) is an interface to the Sourcify contract file repository `RepositoryV1`.
5+
6+
The code is available at [sourcifyeth/h5ai-nginx](https://github.com/sourcifyeth/h5ai-nginx). For performance reasons, it is not possible to navigate the folders above the contract level. You need to know in advance which contract you are looking for.
7+
:::tip Lookup
8+
9+
Instead of entering the chain, you can check an address over all chains at https://sourcify.dev/#/lookup
10+
:::
11+
12+
The contracts are accessible under the following path format:
13+
14+
```
https://repo.sourcify.dev/contracts/:match/:chainId/:contractAddress
```
17+
18+
- `:match`: either `full_match` or `partial_match`
19+
- `:chainId`: the EVM chain ID, e.g. `1` for Ethereum Mainnet or `5` for the Ethereum testnet Görli. See [chainlist.org](https://chainlist.org)
20+
- `:contractAddress`: e.g. `0x5ed4a410A612F2fe625a8F3cB4d70f197fF8C8be`
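
A minimal sketch of building such a URL from its three parts is shown below. `repoContractUrl` is a hypothetical helper that only follows the path format described above; it does not check that the contract actually exists in the repository.

```javascript
// Hypothetical helper: builds a repo.sourcify.dev URL from the path format above.
function repoContractUrl(match, chainId, contractAddress) {
  if (match !== "full_match" && match !== "partial_match") {
    throw new Error(`match must be "full_match" or "partial_match", got: ${match}`);
  }
  return `https://repo.sourcify.dev/contracts/${match}/${chainId}/${contractAddress}`;
}

const url = repoContractUrl("full_match", 1, "0x5ed4a410A612F2fe625a8F3cB4d70f197fF8C8be");
console.log(url);
```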
21+
22+
### Examples
23+
24+
Here are some example contracts:
25+
26+
- https://repo.sourcify.dev/contracts/full_match/1/0x5ed4a410A612F2fe625a8F3cB4d70f197fF8C8be
27+
- https://repo.sourcify.dev/contracts/full_match/1/0xca2ad74003502af6B727e846Fab40D6cb8Da0035
28+
- https://repo.sourcify.dev/contracts/full_match/100/0x4f15a6e74CFC2F80D5967a8aB75F3c83D8043cF4
29+
- https://repo.sourcify.dev/contracts/partial_match/1/0xb857F1f4014A0C45C287667148417b6799Fe594E/
30+
- (staging) https://repo.staging.sourcify.dev/contracts/partial_match/69/0xb50cBeeFBCE78cDe83F184B275b5E80c4f01006A/sources/
31+
32+
### View Source Code in Remix IDE
33+
34+
It is possible to view the contract folder in the Remix IDE by clicking "View in Remix".
35+
36+
Allow the Sourcify plugin on the next screen in Remix IDE (might take a while to load). The contract folder will be available under `verified-sources/<contract-address>` in the Remix file explorer.
37+
38+
![Sourcify repository screenshot](/img/sourcify-repo.png)
39+
40+
### Download folders
41+
42+
You can download the whole folder by clicking the download icon at the top left.
43+
44+
Alternatively, you can select which files/folders to download by ticking their checkboxes and then clicking the download icon.
45+
46+
![Sourcify repository screenshot](/img/sourcify-repo-download.png)

docs/4. repository/TotalRepoSize.jsx

+52
@@ -0,0 +1,52 @@
import React, { useState, useEffect } from "react";
import LoadingOverlay from "../../src/components/LoadingOverlay";

// Fetches the repository backup manifest and shows the total compressed size.
const RepositoryStats = () => {
  const [isLoading, setIsLoading] = useState(true);
  const [totalSize, setTotalSize] = useState(0);
  const [timestamp, setTimestamp] = useState("");
  const [hasError, setHasError] = useState(false);

  useEffect(() => {
    const fetchStats = async () => {
      const manifestUrl = "https://repo-backup.sourcify.app/manifest.json";

      try {
        const manifestResponse = await fetch(manifestUrl);
        // fetch() does not reject on HTTP errors, so check the status explicitly
        if (!manifestResponse.ok) {
          throw new Error(`Unexpected response status: ${manifestResponse.status}`);
        }
        const manifestData = await manifestResponse.json();

        // Sum the sizes of all archive parts listed in the manifest
        const totalSizeBytes = manifestData.files.reduce((acc, file) => acc + file.sizeInBytes, 0);
        const totalSizeGB = totalSizeBytes / (1024 * 1024 * 1024); // Convert bytes to GB (1024^3 bytes)

        setTotalSize(totalSizeGB);

        // "Wed, 11 Sep 2024 04:50:03 GMT" becomes "11 Sep 2024 04:50 GMT"
        const date = new Date(manifestData.timestamp);
        const formattedDate = date
          .toUTCString()
          .replace(/^[A-Za-z]+, /, "")
          .replace(/:\d{2} /, " ");
        setTimestamp(formattedDate);
      } catch (error) {
        console.error("Error fetching manifest:", error);
        setHasError(true);
      } finally {
        setIsLoading(false);
      }
    };

    fetchStats();
  }, []);

  if (isLoading) {
    return <LoadingOverlay message="Calculating the repository size..." />;
  }

  // Avoid rendering an empty date and "0.00 GB" when the manifest could not be loaded
  if (hasError) {
    return <p>Could not load the repository size.</p>;
  }

  return (
    <div>
      <p>
        As of {timestamp} the <strong>compressed</strong> size of the repository files is:{" "}
        <strong>{totalSize.toFixed(2)} GB</strong>
      </p>
    </div>
  );
};

export default RepositoryStats;

docs/4. repository/index.mdx

+16
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
# Contract Repository
2+
3+
Sourcify stores the contracts in multiple storage backends and gives the option to choose which one to use. In short, the following options are available:
4+
5+
- `RepositoryV1`
6+
- `RepositoryV2`
7+
- `SourcifyDatabase`
8+
- `AllianceDatabase`
9+
10+
For details see [Choosing the storage backend](https://github.com/ethereum/sourcify/tree/staging/services/server#choosing-the-storage-backend).
11+
12+
## Download
13+
14+
You can download the whole contract file repository as compressed archives or the Sourcify database in Parquet format. Follow the guide on each page:
15+
- [Download RepositoryV2](/docs/repository/file-repositories/#download)
16+
- [Download SourcifyDatabase](/docs/repository/sourcify-database/#download)
File renamed without changes.
