-# Ethereum Blockchain Data Analytics Platform
+# Stables Analytics
 
-Capstone project for [Foundry AI Academy](https://www.foundry.academy/) Data & AI Engineering program. An ELT pipeline for extracting, loading, and transforming Ethereum blockchain data with focus on stablecoin analytics.
+Production-grade ELT pipeline for on-chain stablecoin analytics. Built for the [Foundry AI Academy](https://www.foundry.academy/) Data & AI Engineering program.
 
-Inspired by [Visa on Chain Analytics](https://visaonchainanalytics.com/).
+**Pipeline**: `HyperSync GraphQL → PostgreSQL/Snowflake → dbt → Analytics`
 
 ## Quick Start
 
-### Prerequisites
 ```bash
-# Create Docker network
-docker network create fa-dae2-capstone_kafka_network
-
-# Start PostgreSQL
+# Setup
 docker-compose up -d
-
-# Install dependencies
 uv sync
-
-# Setup environment
 cp .env.example .env
-export $(cat .env | xargs)
-```
 
-### Extract Data
-```bash
-# Extract logs and transactions from Etherscan
-uv run python scripts/el/extract_etherscan.py \
-    -c ethereum \
-    -a 0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72 \
-    --logs --transactions \
-    --from_block 18.5M --to_block 20M \
-    -v
-```
-
-### Load Data
-```bash
-# Load Parquet to PostgreSQL
-uv run python scripts/el/load.py \
-    -f .data/raw/ethereum_0xaddress_logs_18500000_20000000.parquet \
-    -c postgres \
-    -s raw \
-    -t logs \
-    -w append
-```
+# Extract blockchain data via GraphQL
+# Run the indexer separately: https://github.com/newgnart/envio-stablecoins
+uv run python scripts/el/extract_graphql.py --from_block 18500000 --to_block 20000000 -v
 
-### Transform Data
-```bash
-# Run dbt models
-./scripts/dbt.sh run
+# Load to database
+uv run python scripts/el/load.py -f .data/raw/data_*.parquet -c postgres -s raw -t raw_transfer -w append
 
-# Run specific model
-./scripts/dbt.sh run --select stg_logs_decoded
+# Transform with dbt
+cd dbt_project && dbt run
 ```
 
 ## Architecture
 
-**Extract** → **Load** → **Transform**
+```
+HyperSync GraphQL API → Parquet Files → PostgreSQL/Snowflake → dbt → Analytics Tables
+```
 
-1. **Extract** (`scripts/el/extract_etherscan.py`): Pulls blockchain data from Etherscan API to `.data/raw/*.parquet`
-2. **Load** (`scripts/el/load.py`): Loads Parquet files into PostgreSQL/Snowflake `raw` schema
-3. **Transform** (`dbt_project/`): dbt models transform raw data into analytics-ready tables
+**Key Components:**
+
+- **Extract**: High-performance GraphQL API (HyperSync) for blockchain data with block-range filtering
+- **Load**: Pluggable loaders for PostgreSQL (dev) and Snowflake (prod) with `append`/`replace`/`merge` modes
+- **Transform**: Three-tier dbt modeling (Staging → Intermediate → Marts) with SCD Type 2 support
+- **Migrate**: PostgreSQL-to-Snowflake data transfer via `pg2sf_raw_transfer.py`
 
 ### Project Structure
+
 ```
-├── scripts/el/             # Extract & Load scripts
-├── src/onchaindata/        # Reusable Python package
-│   ├── data_extraction/    # Etherscan/GraphQL clients
-│   ├── data_pipeline/      # Loader classes
-│   └── utils/              # Database clients
-├── dbt_project/            # dbt transformation layer
-│   ├── models/01_staging/  # Raw data cleanup (views)
-│   ├── models/intermediate/# Business logic (ephemeral)
-│   └── models/marts/       # Analytics tables (tables)
-└── .data/raw/              # Extracted Parquet files
+scripts/el/       # Extract & Load scripts
+src/onchaindata/  # Python package (extraction, loading, database clients)
+dbt_project/      # dbt models, snapshots, macros
+docs/             # MkDocs documentation
 ```
 
-## Key Features
+## Features
 
-- **Multi-chain support**: Ethereum, Polygon, BSC via chainid mapping
-- **Automatic retry**: Failed extractions retry with smaller chunks (10x reduction)
-- **Flexible loading**: PostgreSQL and Snowflake support
-- **Block number shortcuts**: Use `18.5M` instead of `18500000`
-- **dbt transformations**: Staging → Intermediate → Marts layers
+- **High-Performance Extraction**: HyperSync GraphQL API for fast blockchain data retrieval
+- **Flexible Loading**: PostgreSQL & Snowflake support with multiple write modes
+- **Multi-Chain**: Ethereum, Polygon, BSC via configurable endpoints
+- **SCD Type 2**: Historical tracking for stablecoin metadata via dbt snapshots
+- **Cross-Database Migration**: Seamless PostgreSQL → Snowflake transfers
 
-## Environment Variables
+## Tech Stack
 
-Required (see `.env.example`):
-- `POSTGRES_*`: Database connection
-- `ETHERSCAN_API_KEY`: API access
-- `DB_SCHEMA`: Default schema
+Python 3.11+ • SQL • Polars • dlt • PostgreSQL • Snowflake • dbt Core • Docker • uv
 
-Optional (for Snowflake):
-- `SNOWFLAKE_*`: Snowflake connection details
+## Documentation
 
-## Common Commands
+- **Full Docs**: [https://newgnart.github.io/stables-analytics/](https://newgnart.github.io/stables-analytics/)
+- **Dev Guide**: [CLAUDE.md](CLAUDE.md)
 
-```bash
-# SQL operations
-./scripts/sql_pg.sh ./scripts/sql/init.sql
-
-# dbt operations
-./scripts/dbt.sh test                    # Run tests
-./scripts/dbt.sh docs generate           # Generate docs
-./scripts/dbt.sh run --select staging.*  # Run staging models
-
-# Extract with time range
-uv run python scripts/el/extract_etherscan.py \
-    -a 0x02950460e2b9529d0e00284a5fa2d7bdf3fa4d72 \
-    --logs --transactions \
-    --last_n_days 7
-```
+## Environment Setup
 
-## Database Schema
+Create a `.env` file with database credentials:
+
+- `POSTGRES_*`: PostgreSQL connection details
+- `SNOWFLAKE_*`: (Optional) Snowflake connection details
 
-- `raw.logs`: Raw event logs with JSONB topics
-- `raw.transactions`: Transaction data
-- `staging.stg_logs_decoded`: Decoded logs with parsed topics (topic0-topic3)
-- Marts: Analytics tables created by dbt
+See `.env.example` for full configuration.
 
-## Documentation
+---
 
-For detailed documentation, see [CLAUDE.md](CLAUDE.md) or the [docs/](docs/) directory.
+**License**: MIT • Educational capstone project
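The new Quick Start drives extraction with `--from_block`/`--to_block`; pulling a 1.5M-block range is typically done in request-sized chunks. A minimal sketch of that idea — `block_chunks` is a hypothetical helper for illustration, not part of the project's code:

```python
def block_chunks(from_block: int, to_block: int, chunk_size: int = 100_000):
    """Split the half-open block range [from_block, to_block) into
    consecutive chunks of at most chunk_size blocks each."""
    chunks = []
    start = from_block
    while start < to_block:
        end = min(start + chunk_size, to_block)  # last chunk may be shorter
        chunks.append((start, end))
        start = end
    return chunks

# The Quick Start range 18_500_000..20_000_000 splits into 15 chunks of 100k blocks.
print(len(block_chunks(18_500_000, 20_000_000)))  # 15
```

Each `(start, end)` pair can then be issued as an independent query, which bounds response sizes and makes failed ranges retryable in isolation.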
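The Transform bullet and Features list mention SCD Type 2 tracking of stablecoin metadata via dbt snapshots. The mechanics — close the superseded row, open a new version — can be sketched in plain Python; `scd2_merge` and its column names are illustrative only, not the project's actual snapshot configuration:

```python
from datetime import datetime, timezone

def scd2_merge(snapshot, source, key="address", tracked=("symbol", "decimals")):
    """Minimal SCD Type 2 merge.

    snapshot: list of dicts with key, tracked columns, valid_from, valid_to
              (valid_to is None for the currently open version).
    source:   list of dicts with key and tracked columns (latest state).
    Returns the updated snapshot list.
    """
    now = datetime.now(timezone.utc)
    current = {r[key]: r for r in snapshot if r["valid_to"] is None}
    out = list(snapshot)
    for row in source:
        live = current.get(row[key])
        if live is not None and all(live[c] == row[c] for c in tracked):
            continue  # unchanged: keep the open version as-is
        if live is not None:
            live["valid_to"] = now  # close the superseded version
        out.append({key: row[key],
                    **{c: row[c] for c in tracked},
                    "valid_from": now, "valid_to": None})  # open new version
    return out
```

A dbt snapshot with the `check` strategy does the equivalent in SQL, so a metadata change (say, a token symbol rename) produces a new row instead of overwriting history.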