Commit eda5758

readme
1 parent 152079d commit eda5758

2 files changed

+99
-43
lines changed


README.md

Lines changed: 97 additions & 37 deletions
@@ -1,63 +1,123 @@
1-
# 1. Set up the environment
1+
# Ethereum Blockchain Data Analytics Platform
22

3-
## Postgres with Docker
3+
Capstone project for the [Foundry AI Academy](https://www.foundry.academy/) Data & AI Engineering program. An ELT pipeline for extracting, loading, and transforming Ethereum blockchain data, with a focus on stablecoin analytics.
44

5+
Inspired by [Visa on Chain Analytics](https://visaonchainanalytics.com/).
6+
7+
## Quick Start
8+
9+
### Prerequisites
510
```bash
11+
# Create Docker network
612
docker network create fa-dae2-capstone_kafka_network
13+
14+
# Start PostgreSQL
715
docker-compose up -d
8-
```
9-
## Python environment
10-
The project structured as a package in `src/capstone_package` directory. Runnable scripts are in `scripts` directory only.
1116

12-
Install dependencies using uv:
13-
```bash
17+
# Install dependencies
1418
uv sync
15-
```
1619

17-
## Initialize the database
20+
# Setup environment
21+
cp .env.example .env
22+
export $(cat .env | xargs)
23+
```
1824

19-
### Set environment variables
25+
### Extract Data
26+
```bash
27+
# Extract logs and transactions from Etherscan
28+
uv run python scripts/el/extract_etherscan.py \
29+
-c ethereum \
30+
-a 0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72 \
31+
--logs --transactions \
32+
--from_block 18.5M --to_block 20M \
33+
-v
34+
```
2035

21-
- Copy `.env.example` to `.env`
36+
### Load Data
2237
```bash
23-
cp .env.example .env
38+
# Load Parquet to PostgreSQL
39+
uv run python scripts/el/load.py \
40+
-f .data/raw/ethereum_0xaddress_logs_18500000_20000000.parquet \
41+
-c postgres \
42+
-s raw \
43+
-t logs \
44+
-w append
2445
```
25-
- Set environment variables
46+
47+
### Transform Data
2648
```bash
27-
export $(cat .env | xargs)
49+
# Run dbt models
50+
./scripts/dbt.sh run
51+
52+
# Run specific model
53+
./scripts/dbt.sh run --select stg_logs_decoded
2854
```
2955

30-
### The data
31-
- Log and transaction data of a smart contract [0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72](https://etherscan.io/address/0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72) on Ethereum.
32-
- Whole loading data in is parquet format
33-
- Example in json format:
34-
- [logs.json](data/ethereum_0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72/logs.json)
35-
- [transactions.json](data/ethereum_0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72/transactions.json)
56+
## Architecture
3657

37-
### Load data to Postgres
58+
**Extract** → **Load** → **Transform**
3859

39-
There are two ways to load data to Postgres:
60+
1. **Extract** (`scripts/el/extract_etherscan.py`): Pulls blockchain data from Etherscan API to `.data/raw/*.parquet`
61+
2. **Load** (`scripts/el/load.py`): Loads Parquet files into PostgreSQL/Snowflake `raw` schema
62+
3. **Transform** (`dbt_project/`): dbt models transform raw data into analytics-ready tables
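
The Key Features list says failed extractions are retried with chunks ten times smaller. The sketch below shows one way such a retry could work. It is not the code in `scripts/el/extract_etherscan.py`: the function name `fetch_range`, the `fetch` callback, and the `min_chunk_size` floor are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]  # placeholder row type, for illustration only


def fetch_range(
    fetch: Callable[[int, int], List[Row]],
    from_block: int,
    to_block: int,
    chunk_size: int,
    min_chunk_size: int = 1,
    shrink_factor: int = 10,
) -> List[Row]:
    """Fetch [from_block, to_block] in chunks, shrinking the chunk on failure.

    `fetch` is a hypothetical callback standing in for an API call; it takes an
    inclusive block range and returns rows, or raises when the request fails.
    """
    results: List[Row] = []
    start = from_block
    while start <= to_block:
        end = min(start + chunk_size - 1, to_block)
        try:
            results.extend(fetch(start, end))
        except Exception:
            # A real client would catch only its own request errors.
            if chunk_size <= min_chunk_size:
                raise  # cannot shrink any further, so surface the error
            smaller = max(chunk_size // shrink_factor, min_chunk_size)
            # Retry only the failed chunk, split into pieces 10x smaller.
            results.extend(
                fetch_range(fetch, start, end, smaller, min_chunk_size, shrink_factor)
            )
        start = end + 1
    return results
```

Retrying only the failed chunk, rather than restarting the whole range, keeps the work already completed for earlier chunks.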
4063

41-
1. Using DLT
42-
dlt will automatically create the table and load data to it.
43-
```bash
44-
python scripts/data_loading/postgres_loader.py
64+
### Project Structure
65+
```
66+
├── scripts/el/ # Extract & Load scripts
67+
├── src/onchaindata/ # Reusable Python package
68+
│ ├── data_extraction/ # Etherscan/GraphQL clients
69+
│ ├── data_pipeline/ # Loader classes
70+
│ └── utils/ # Database clients
71+
├── dbt_project/ # dbt transformation layer
72+
│ ├── models/01_staging/ # Raw data cleanup (views)
73+
│ ├── models/intermediate/# Business logic (ephemeral)
74+
│ └── models/marts/ # Analytics tables (tables)
75+
└── .data/raw/ # Extracted Parquet files
4576
```
46-
**Note**: for non-standard data types e.g. json, use [apply_hints](scripts/data_loading/postgres_loader.py#L28) to define the data type.
4777

48-
1. Without DLT, using psycopg to load data.
78+
## Key Features
79+
80+
- **Multi-chain support**: Ethereum, Polygon, BSC via chainid mapping
81+
- **Automatic retry**: Failed extractions retry with smaller chunks (10x reduction)
82+
- **Flexible loading**: PostgreSQL and Snowflake support
83+
- **Block number shortcuts**: Use `18.5M` instead of `18500000`
84+
- **dbt transformations**: Staging → Intermediate → Marts layers
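
To illustrate the block number shortcut, here is a minimal sketch of how a value such as `18.5M` could be expanded to `18500000`. The README only shows the `M` form; the function name `parse_block_number` and the `K` and `B` suffixes are assumptions, and the project's actual parser may differ.

```python
from decimal import Decimal, InvalidOperation

# Assumed suffix table: only "M" appears in the README.
_SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9}


def parse_block_number(text: str) -> int:
    """Convert '18.5M' or '18500000' into an integer block number."""
    cleaned = text.strip().upper()
    multiplier = 1
    if cleaned and cleaned[-1] in _SUFFIXES:
        multiplier = _SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    try:
        # Decimal avoids float rounding surprises for inputs like '18.5'.
        value = Decimal(cleaned) * multiplier
    except InvalidOperation:
        raise ValueError(f"not a block number: {text!r}") from None
    if not value.is_finite() or value < 0 or value != value.to_integral_value():
        raise ValueError(f"not a whole, non-negative block number: {text!r}")
    return int(value)
```

With this sketch, `parse_block_number("18.5M")` and `parse_block_number("18500000")` give the same integer, so both spellings can be accepted on the command line.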
85+
86+
## Environment Variables
87+
88+
Required (see `.env.example`):
89+
- `POSTGRES_*`: Database connection
90+
- `ETHERSCAN_API_KEY`: API access
91+
- `DB_SCHEMA`: Default schema
92+
93+
Optional (for Snowflake):
94+
- `SNOWFLAKE_*`: Snowflake connection details
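
A script can check the required settings listed above before doing any work. The following is a small sketch, not part of the repository: `missing_settings` is a hypothetical helper, and because the README does not list the individual `POSTGRES_*` names, the check only asks for at least one non-empty variable with that prefix.

```python
from typing import List, Mapping

# Exact names taken from the README's "Required" list.
REQUIRED_NAMES = ("ETHERSCAN_API_KEY", "DB_SCHEMA")
# The README gives only the prefix, so the individual names are not assumed.
REQUIRED_PREFIXES = ("POSTGRES_",)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required settings that are absent or empty in `env`."""
    missing = [name for name in REQUIRED_NAMES if not env.get(name)]
    for prefix in REQUIRED_PREFIXES:
        has_any = any(key.startswith(prefix) and value for key, value in env.items())
        if not has_any:
            missing.append(prefix + "*")
    return missing
```

Calling `missing_settings(os.environ)` at the start of a script (after `import os`) and exiting when the list is non-empty gives a clearer error than a failed database connection later on.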
95+
96+
## Common Commands
4997

50-
- Initialize the table manually
5198
```bash
52-
./scripts/sql/run_sql.sh ./scripts/sql/init.sql;
99+
# SQL operations
100+
./scripts/sql_pg.sh ./scripts/sql/init.sql
101+
102+
# dbt operations
103+
./scripts/dbt.sh test # Run tests
104+
./scripts/dbt.sh docs generate # Generate docs
105+
./scripts/dbt.sh run --select staging.* # Run staging models
106+
107+
# Extract with time range
108+
uv run python scripts/el/extract_etherscan.py \
109+
-a 0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72 \
110+
--logs --transactions \
111+
--last_n_days 7
53112
```
54113

55-
- Use `load_parquet_to_postgres_wo_dlt` function in [postgres_loader.py](scripts/data_loading/postgres_loader.py)
114+
## Database Schema
56115

57-
### Load data to Snowflake
116+
- `raw.logs`: Raw event logs with JSONB topics
117+
- `raw.transactions`: Transaction data
118+
- `staging.stg_logs_decoded`: Decoded logs with parsed topics (topic0-topic3)
119+
- Marts: Analytics tables created by dbt
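
The step from a JSONB `topics` array in `raw.logs` to the `topic0`-`topic3` columns in `staging.stg_logs_decoded` can be pictured with the sketch below. In the project this is done by a dbt model in SQL; the Python here only illustrates the shape of the data, and `split_topics` is a hypothetical name. An Ethereum log carries at most four topics, so shorter arrays are padded with `None`.

```python
import json
from typing import Dict, List, Optional, Union

MAX_TOPICS = 4  # an Ethereum log has at most four topics


def split_topics(topics: Union[str, List[str]]) -> Dict[str, Optional[str]]:
    """Map a topics array (list or JSON text) to topic0..topic3 columns."""
    if isinstance(topics, str):
        topics = json.loads(topics)
    if len(topics) > MAX_TOPICS:
        raise ValueError(f"expected at most {MAX_TOPICS} topics, got {len(topics)}")
    # Pad with None so every row has the same four columns.
    padded = list(topics) + [None] * (MAX_TOPICS - len(topics))
    return {f"topic{i}": value for i, value in enumerate(padded)}
```

By convention `topic0` holds the event signature hash for non-anonymous events, and the remaining topics hold indexed event parameters.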
58120

59-
1. raw data stored in `database/RAW_DATA.JSON_STAGE`
60-
Use `upload_file_to_stage` function in [snowflake_loader.py](scripts/data_loading/snowflake_loader.py) to upload the data to Snowflake.
61-
```bash
62-
python scripts/data_loading/snowflake_loading.py
63-
```
121+
## Documentation
122+
123+
For detailed documentation, see [CLAUDE.md](CLAUDE.md) or the [docs/](docs/) directory.

docs/02_data_pipeline/022_modeling.md

Lines changed: 2 additions & 6 deletions
@@ -1,9 +1,7 @@
11
## Conceptual Model
22

33
### Entity Relationship Diagram
4-
<img src="https://github.com/newgnart/fa-dae2-stables-analytics/tree/main/docs/imgs/entities.drawio.svg" alt="Entity Relationship Diagram" width="600">
5-
6-
4+
<iframe src="../../assets/erd01.html" frameborder="0" width="70%" height="250px"></iframe>
75

86
### Entity Descriptions and Relationships
97
**STABLECOIN**
@@ -21,9 +19,7 @@
2119
- Wallet or contract address that holds/transacts stablecoins
2220
- Relationship: Address HOLDS Stablecoin
2321

24-
**LOAN**
25-
- Lending/borrowing activity involving stablecoins
26-
- Relationship: Address BORROWS/LENDS Stablecoin
22+
2723

2824

2925
## Logical Model
